##########################################################################################
MISC-26: Ubuntu Dmesg being spammed with rtl_counters_cond == 1
##########################################################################################

Issue Type: Bug
-----------------------------------------------------------------------------------------

Issue Information
====================

Priority: Major
Status: Closed
Resolution: Fixed (2019-03-19 13:21:26)

Project: Miscellaneous (MISC)
Reported By: btasker
Assigned To: btasker
Environment: Ubuntu 16.04.5 LTS (Xenial) Linux 4.4.0-72-generic #93-Ubuntu SMP

Time Estimate: 0 minutes
Time Logged: 0 minutes

-----------------------------------------------------------------------------------------

Issue Description
==================

My dmesg is getting spammed with lots of identical entries

-- BEGIN SNIPPET --
[213308.745502] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.756755] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.767991] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.779243] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.790516] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.801655] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.812869] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213308.824014] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213309.704324] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213309.715648] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
[213310.941632] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
-- END SNIPPET --

Since it started, I'm also regularly getting apport popping up to claim Slack and Skype have crashed (they're still open, but you can see a thread segfaulting in dmesg).

As r8169 is a NIC driver, my guess is those Electron apps are picking up on a "change" in network state and failing to handle it.

-----------------------------------------------------------------------------------------

Activity
==========

-----------------------------------------------------------------------------------------
2019-03-19 08:21:47 btasker
-----------------------------------------------------------------------------------------

We know r8169 is a NIC driver (painful memories of troubleshooting it in the past). The log line also gives us the bus address and the interface name:

-- BEGIN SNIPPET --
[213310.941632] r8169 0000:1c:00.0 p3p1: rtl_counters_cond == 1 (loop: 1000, delay: 10).
-- END SNIPPET --

So it's interface p3p1 at address 0000:1c:00.0

That interface isn't actually cabled up, or in use:

-- BEGIN SNIPPET --
ben@milleniumfalcon:~$ ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p1p1: mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 00:e0:ff:68:04:c4 brd ff:ff:ff:ff:ff:ff
3: p3p1: mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 00:e0:4c:68:11:be brd ff:ff:ff:ff:ff:ff
4: eth0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 80:c1:6e:f1:cb:3c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.70/24 brd 192.168.1.255 scope global dynamic eth0
       valid_lft 532sec preferred_lft 532sec
    inet6 fe80::c2b6:54a1:bf5a:65db/64 scope link
       valid_lft forever preferred_lft forever
-- END SNIPPET --

It's also curious, because this box has just 2 NICs - 1 Broadcom (eth0) and the Realtek. Broadcom is on-board, Realtek is a PCI card.
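
As an aside, the bus address and interface name can be tallied across the whole log rather than read off a single line, which would show straight away whether more than one device is involved. A minimal sketch, assuming the kernel log format shown in the snippets above; the function name summarise_counters_cond is made up for illustration and is not part of any existing tool:

```shell
#!/bin/sh
# Hypothetical helper (not from the original ticket): reads kernel log text on
# stdin and prints "<count> <pci-address> <interface>" for each distinct
# device that logged an rtl_counters_cond message.
summarise_counters_cond() {
    # Drop the "[timestamp] r8169 " prefix and keep only address + interface;
    # lines that do not match the pattern are discarded by sed -n.
    sed -n 's/^\[[^]]*] r8169 \([^ ]*\) \([^ :]*\): rtl_counters_cond.*/\1 \2/p' |
        awk '{ n[$0]++ } END { for (k in n) print n[k], k }'
}
```

Usage would be something like `dmesg | summarise_counters_cond`, which prints one line per (address, interface) pair.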

Curious that the "two" Realtek interfaces have different MACs too....

For avoidance of doubt, I've just been on my knees behind the machine double-checking that there are in fact only 2 NICs.

Rather than just ifdowning it, https://askubuntu.com/questions/909213/ethernet-fails-to-initialize suggests we can remove the device and rescan the bus

-- BEGIN SNIPPET --
# Find the sysfs net entry for p3p1, then remove its parent PCI device
dev=$(find /sys/devices -name p3p1)
echo 1 > "$(dirname "$dev")/../remove"
-- END SNIPPET --

Now, that page suggests we can then re-scan, but that doesn't work for me:

-- BEGIN SNIPPET --
root@milleniumfalcon:~# echo 1 >/sys/devices/pci0000:00/0000:1c:00.0/rescan
bash: /sys/devices/pci0000:00/0000:1c:00.0/rescan: No such file or directory
-- END SNIPPET --

Which makes sense given the thing doesn't actually exist.

The loglines have stopped hitting dmesg but it feels like an incomplete fix as all we've actually done is remove the interface.

So, let's try pulling the module out and putting it back to make sure it stays fixed

-- BEGIN SNIPPET --
root@milleniumfalcon:~# rmmod r8169; modprobe r8169
-- END SNIPPET --

dmesg looks happy, and the phantom interface no longer shows up for ip

-- BEGIN SNIPPET --
root@milleniumfalcon:~# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
4: eth0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 80:c1:6e:f1:cb:3c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.70/24 brd 192.168.1.255 scope global dynamic eth0
       valid_lft 481sec preferred_lft 481sec
    inet6 fe80::c2b6:54a1:bf5a:65db/64 scope link
       valid_lft forever preferred_lft forever
5: p1p1: mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 00:e0:ff:68:04:c4 brd ff:ff:ff:ff:ff:ff
-- END SNIPPET --

Which resolves the issue, but doesn't really explain why r8169 decided to show a phantom interface with a hardware address that doesn't
actually exist on the system. The OUI is definitely Realtek though.

-----------------------------------------------------------------------------------------
2019-03-19 08:32:06 btasker
-----------------------------------------------------------------------------------------

I don't know where that other MAC came from, but there is an old, old bug with the Realtek driver that I've been hit by before which might be relevant.

The bug itself only applies to systems using bonded interfaces (which this isn't), but the underlying fuck-up probably has other applications.

When 2 (or more) interfaces are in a bond, they share a MAC address (generally taken from one of the slaves). The new - shared - MAC is written into the NIC's on-chip memory.

However, if (at least) one of those interfaces is Realtek (and it was specifically the r8169 driver I was hit with this on), following reboot the bond may fail to come up, or may come up with just one slave. A look in system logs will generally show complaints that the bond's MAC address is already in use.

This is because, on boot, when the r8169 driver polls the NIC for its MAC address, it fetches it from the NIC's memory, rather than from the PHY. So, it gets the bond's MAC back rather than the true MAC.

The "fix" is to isolate the system from power for 5 mins so that the NIC loses any residual charge.

There's a mailing list post somewhere from around 2010 where this was identified, and it remained unfixed as of 2 years ago (don't know about now, but I can take an educated guess).

Now, this system doesn't have a bond, and never has, but I'm wondering if something similar is going on - I do occasionally run some VMs on this machine which get bridged to an adaptor (though I don't see _any_ with that MAC).
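
One way to back up the "I don't see any with that MAC" check is to compare the address against every interface the kernel currently exposes, bridges and VM taps included. A minimal sketch, assuming the standard sysfs layout under /sys/class/net; find_mac and its optional second argument are hypothetical names used for illustration, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every interface whose hardware
# address equals the MAC given as $1 (compared case-insensitively).
# $2 optionally overrides the sysfs directory, which is handy for testing.
find_mac() {
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    netdir=${2:-/sys/class/net}
    for f in "$netdir"/*/address; do
        # Skips the literal glob pattern if nothing matched
        [ -r "$f" ] || continue
        have=$(tr 'A-F' 'a-f' < "$f")
        if [ "$have" = "$want" ]; then
            # The interface name is the directory holding the address file
            basename "$(dirname "$f")"
        fi
    done
}
```

For example, `find_mac 00:e0:4c:68:11:be` would list any interface currently carrying the phantom address; no output means nothing visible to the kernel is using it.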

There was a (very brief) power interruption the other night as a result of the trip switch being thrown (cause of that is unrelated), so the system powered down suddenly and uncleanly, but didn't stay isolated from power long enough for the NIC to lose any charge.

What that wouldn't explain though is the phantom bus address... clearly I'm missing something

-----------------------------------------------------------------------------------------
2019-03-19 12:33:14 btasker
-----------------------------------------------------------------------------------------

Recording for posterity's sake

-- BEGIN SNIPPET --
# modinfo r8169
filename:       /lib/modules/4.4.0-72-generic/kernel/drivers/net/ethernet/realtek/r8169.ko
firmware:       rtl_nic/rtl8107e-2.fw
firmware:       rtl_nic/rtl8107e-1.fw
firmware:       rtl_nic/rtl8168h-2.fw
firmware:       rtl_nic/rtl8168h-1.fw
firmware:       rtl_nic/rtl8168g-3.fw
firmware:       rtl_nic/rtl8168g-2.fw
firmware:       rtl_nic/rtl8106e-2.fw
firmware:       rtl_nic/rtl8106e-1.fw
firmware:       rtl_nic/rtl8411-2.fw
firmware:       rtl_nic/rtl8411-1.fw
firmware:       rtl_nic/rtl8402-1.fw
firmware:       rtl_nic/rtl8168f-2.fw
firmware:       rtl_nic/rtl8168f-1.fw
firmware:       rtl_nic/rtl8105e-1.fw
firmware:       rtl_nic/rtl8168e-3.fw
firmware:       rtl_nic/rtl8168e-2.fw
firmware:       rtl_nic/rtl8168e-1.fw
firmware:       rtl_nic/rtl8168d-2.fw
firmware:       rtl_nic/rtl8168d-1.fw
version:        2.3LK-NAPI
license:        GPL
description:    RealTek RTL-8169 Gigabit Ethernet driver
author:         Realtek and the Linux r8169 crew