I've been having some issues with lockups on my wired network adaptor. Originally flagged in LAN#93 but raising dedicated ticket for better tracking.
Hardware
ben@ratchett:~$ lsusb
Bus 004 Device 009: ID 0bda:0411 Realtek Semiconductor Corp. Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 5986:2145 Acer, Inc Integrated RGB Camera
Bus 003 Device 107: ID 0d8c:0014 C-Media Electronics, Inc. Audio Adapter (Unitek Y-247A)
Bus 003 Device 106: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub
Bus 003 Device 002: ID 06cb:00fc Synaptics, Inc.
Bus 003 Device 004: ID 8087:0033 Intel Corp.
Bus 003 Device 089: ID 291a:8338 Anker Anker USB-C Hub Device
Bus 003 Device 085: ID 0c45:6366 Microdia Webcam Vitade AF
Bus 003 Device 032: ID 1050:0407 Yubico.com Yubikey 4/5 OTP+U2F+CCID
Bus 003 Device 031: ID 258a:002a SINO WEALTH Gaming KB
Bus 003 Device 030: ID 1bcf:0005 Sunplus Innovation Technology Inc. Optical Mouse
Bus 003 Device 029: ID 05e3:0610 Genesys Logic, Inc. Hub
Bus 003 Device 081: ID 291a:b817 Anker USB2.0 Hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 021: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
Bus 002 Device 037: ID 05e3:0626 Genesys Logic, Inc. USB3.1 Hub
Bus 002 Device 017: ID 291a:a817 Anker USB3.0 Hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Software
ben@ratchett:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
ben@ratchett:~$ uname -a
Linux ratchett 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
(it's actually Xubuntu rather than ubuntu, but that shouldn't matter here)
When the lockup occurs, the following is logged in dmesg
Edit: it seems this isn't consistently logged; the kernel warning was not seen on subsequent lockups
[150012.703270] ------------[ cut here ]------------
[150012.703276] NETDEV WATCHDOG: enxa0cec86958e9 (ax88179_178a): transmit queue 0 timed out
[150012.703299] WARNING: CPU: 9 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x21f/0x230
[150012.703311] Modules linked in: xt_mark cdc_mbim cdc_wdm cdc_ncm cdc_ether ax88179_178a usbnet mii typec_displayport xt_nat veth nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay hid_cmedia snd_usb_audio usbhid snd_usbmidi_lib xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c tls nfnetlink ccm rfcomm cmac algif_hash algif_skcipher af_alg bnep binfmt_misc snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda joydev snd_hda_ext_core snd_soc_acpi_intel_match mei_hdcp mei_pxp snd_soc_acpi intel_tcc_cooling soundwire_bus x86_pkg_temp_thermal pmt_telemetry
[150012.703389] intel_powerclamp pmt_class snd_soc_core coretemp iwlmvm intel_rapl_msr snd_compress ac97_bus mac80211 btusb snd_pcm_dmaengine kvm_intel btrtl libarc4 snd_hda_intel btbcm kvm uvcvideo snd_intel_dspcfg btintel videobuf2_vmalloc snd_intel_sdw_acpi processor_thermal_device_pci btmtk videobuf2_memops snd_seq_midi rapl iwlwifi snd_hda_codec processor_thermal_device videobuf2_v4l2 nls_iso8859_1 thinkpad_acpi processor_thermal_rfim intel_cstate bluetooth snd_seq_midi_event input_leds videobuf2_common snd_hda_core think_lmi processor_thermal_mbox ucsi_acpi serio_raw snd_hwdep spi_nor nvram mei_me wmi_bmof firmware_attributes_class snd_rawmidi videodev typec_ucsi processor_thermal_rapl ecdh_generic snd_pcm mtd cfg80211 ledtrig_audio mc hid_multitouch mei ecc intel_vsec igen6_edac intel_rapl_common typec snd_seq snd_seq_device snd_timer snd soc_button_array soundcore int3403_thermal platform_profile int340x_thermal_zone mac_hid intel_hid int3400_thermal acpi_thermal_rel sparse_keymap
[150012.703467] acpi_pad acpi_tad sch_fq_codel msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 dm_crypt i915 drm_buddy i2c_algo_bit ttm drm_display_helper cec rc_core hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops crypto_simd spi_intel_pci cryptd psmouse i2c_i801 intel_lpss_pci nvme drm spi_intel i2c_smbus intel_lpss thunderbolt nvme_core idma64 xhci_pci xhci_pci_renesas wmi i2c_hid_acpi i2c_hid hid video pinctrl_tigerlake
[150012.703522] CPU: 9 PID: 0 Comm: swapper/9 Not tainted 5.19.0-35-generic #36~22.04.1-Ubuntu
[150012.703527] Hardware name: LENOVO 21CBCTO1WW/21CBCTO1WW, BIOS N3AET71W (1.36 ) 01/31/2023
[150012.703529] RIP: 0010:dev_watchdog+0x21f/0x230
[150012.703536] Code: 00 e9 31 ff ff ff 4c 89 e7 c6 05 ef ac 70 01 01 e8 c6 9d f8 ff 44 89 f1 4c 89 e6 48 c7 c7 a8 da 30 ad 48 89 c2 e8 89 e3 1b 00 <0f> 0b e9 22 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[150012.703540] RSP: 0018:ffffaa618038ce70 EFLAGS: 00010246
[150012.703543] RAX: 0000000000000000 RBX: ffff9ef2d15944c8 RCX: 0000000000000000
[150012.703546] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[150012.703548] RBP: ffffaa618038ce98 R08: 0000000000000000 R09: 0000000000000000
[150012.703550] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ef2d1594000
[150012.703552] R13: ffff9ef2d159441c R14: 0000000000000000 R15: 0000000000000000
[150012.703554] FS: 0000000000000000(0000) GS:ffff9ef9ff640000(0000) knlGS:0000000000000000
[150012.703557] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[150012.703560] CR2: 00007f87448a9000 CR3: 0000000117810004 CR4: 0000000000770ee0
[150012.703563] PKRU: 55555554
[150012.703565] Call Trace:
[150012.703567] <IRQ>
[150012.703573] ? pfifo_fast_reset+0x170/0x170
[150012.703579] call_timer_fn+0x29/0x160
[150012.703584] ? pfifo_fast_reset+0x170/0x170
[150012.703589] __run_timers.part.0+0x1e9/0x290
[150012.703592] ? ktime_get+0x43/0xc0
[150012.703597] ? lapic_next_deadline+0x2c/0x50
[150012.703602] ? clockevents_program_event+0xb2/0x140
[150012.703607] run_timer_softirq+0x2a/0x60
[150012.703610] __do_softirq+0xd5/0x32a
[150012.703615] ? hrtimer_interrupt+0x12b/0x240
[150012.703619] __irq_exit_rcu+0x8d/0xd0
[150012.703624] irq_exit_rcu+0xe/0x20
[150012.703629] sysvec_apic_timer_interrupt+0x96/0xb0
[150012.703645] </IRQ>
[150012.703647] <TASK>
[150012.703648] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[150012.703654] RIP: 0010:cpuidle_enter_state+0xea/0x640
[150012.703660] Code: 00 31 ff e8 48 7e 59 ff 80 7d d0 00 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 4d 03 00 00 31 ff e8 9c 18 61 ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 4d 63 ee 49 83 fd 09 0f 87 22 04 00 00
[150012.703663] RSP: 0018:ffffaa61801a3e18 EFLAGS: 00000246
[150012.703666] RAX: 0000000000000000 RBX: ffffca617fc40d00 RCX: 0000000000000000
[150012.703668] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[150012.703669] RBP: ffffaa61801a3e68 R08: 0000000000000000 R09: 0000000000000000
[150012.703671] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffadcb18c0
[150012.703673] R13: 0000000000000004 R14: 0000000000000004 R15: 0000886f8de36b34
[150012.703677] ? cpuidle_enter_state+0xc8/0x640
[150012.703681] ? raw_spin_rq_unlock+0x10/0x40
[150012.703687] cpuidle_enter+0x2e/0x50
[150012.703690] call_cpuidle+0x23/0x60
[150012.703695] cpuidle_idle_call+0x119/0x190
[150012.703699] do_idle+0x82/0x110
[150012.703702] cpu_startup_entry+0x20/0x30
[150012.703704] start_secondary+0x122/0x160
[150012.703709] secondary_startup_64_no_verify+0xe5/0xeb
[150012.703716] </TASK>
[150012.703717] ---[ end trace 0000000000000000 ]---
At that point, the interface cannot be interacted with.
On the first lockup, due to time pressures, I simply switched to a Wifi connection and got on with what I needed to do.
On a later lockup though, that wasn't possible. A timeout was logged, as above, but the adaptor continued to be used for the default route.
ifdown wasn't installed, so I couldn't test with that.
In order to restore connectivity, the USB-C hub had to be unplugged and then plugged back in.
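With ifdown unavailable, one thing that could be tried before pulling the hub is bouncing the link with ip (from iproute2). The following is only a sketch: the function names are made up for illustration, and there is no evidence in this ticket that a down/up cycle would actually recover the adaptor once the watchdog has fired. It pulls the interface name out of the NETDEV WATCHDOG line and, by default, just prints the commands it would run.

```shell
# Read dmesg-style text on stdin and print the interface named in the
# most recent "NETDEV WATCHDOG" line (hypothetical helper).
watchdog_iface() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (.*/\1/p' | tail -n 1
}

# Print the commands needed to take an interface down and bring it back up.
# Set RUN=1 to execute them instead (needs root).
bounce_iface() {
    iface="$1"
    for state in down up; do
        if [ "${RUN:-0}" = "1" ]; then
            ip link set dev "$iface" "$state"
        else
            echo "ip link set dev $iface $state"
        fi
    done
}

# Example (dmesg may need sudo):
#   bounce_iface "$(dmesg | watchdog_iface)"
```

Keeping it as a dry run by default means it can be checked against a saved dmesg capture without touching the live interface.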
Activity
24-Mar-23 08:55
assigned to @btasker
24-Mar-23 08:55
moved from LAN#94
24-Mar-23 08:55
assigned to @btasker
24-Mar-23 08:55
Full lsusb for the hub is
24-Mar-23 09:10
Reviews for the hub are generally positive, and I've always found Anker to be quite good, so I don't think the issue is a product issue (in the sense of them having used cheap chips rather than manufacturing defects).
My current working theory is that it's the result of excess demands on either power or bandwidth.
In terms of power, my previous laptop used a similar hub, but that only had 2 USB-A ports on it so some stuff was plugged directly into the laptop. In this case, the following is plugged in
It's more than was plugged into the old one, but it still really isn't that much; it's not like I'm charging devices off it.
So it's probably not power (although it might still be some form of power-saving).
Bandwidth is possible: we've got that HDMI connection, and my monitor has been running at a higher resolution (4K) than on the old laptop.
The adapter probably uses DisplayPort to communicate back, so there'll be compression between laptop and hub. But it does seem like a reasonable bet that we might be stretching the bandwidth available.
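To put a rough number on the bandwidth theory, the raw pixel rate for a 4K display can be worked out quickly. This is only back-of-envelope arithmetic: the ticket just says "4K", so 3840x2160, 24 bits per pixel and the 30/60 Hz refresh rates are all assumptions, and blanking overhead and any compression are ignored.

```shell
# Raw pixel data rate in bits per second.
# Args: width height refresh_hz bits_per_pixel (all assumed values here).
raw_video_bps() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Convert bits per second to Gbit/s with two decimal places.
to_gbit() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'
}

for hz in 30 60; do
    bps=$(raw_video_bps 3840 2160 "$hz" 24)
    echo "4K @ ${hz} Hz: $(to_gbit "$bps") Gbit/s raw"
done
```

Either figure is above the 5 Gbit/s nominal signalling rate of a USB 3.0 link, so the video can't simply be sharing that link uncompressed. Whether it competes with the NIC at all depends on how the hub carries the display signal, which isn't established here.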
24-Mar-23 09:11
I need stability, so I've started by unplugging most things from the hub
Will have to see how that gets on.
24-Mar-23 09:11
mentioned in issue LAN#93
24-Mar-23 09:39
Just for avoidance of doubt, in yesterday's lockup, the NIC really wasn't having to do much
24-Mar-23 10:25
Just had a drop again, with only the webcam, keyboard etc. plugged into that hub.
24-Mar-23 14:48
Might be time to start being suspicious of the hardware (and/or its driver) then.
Adapter info:
24-Mar-23 16:28
Well... looky-looky, there's a kernel bug report: USB Ethernet adapter ASIX AX88179 disconnects under heavy load.
Sounds familiar.
It looks like there are some out of band fixes.
Also looks like there might be a manufacturer fix: https://bugzilla.kernel.org/show_bug.cgi?id=212731#c10
Hmmm
But
24-Mar-23 16:33
One of the linked GH repos (https://github.com/nothingstopsme/AX88179_178A_Linux_Driver) makes an interesting note
I am using an extension cable (although it's only about a foot long). Might be worth taking that out of the loop.
Others have noted, though, that using that driver helped so I think that's the next port of call
24-Mar-23 16:36
Building and installing
That's installed to /lib/modules/5.19.0-35-generic/kernel/drivers/net/usb/ax88179_178a.ko, so it's reboot time.
24-Mar-23 16:44
Modules loaded:
I guess we wait and see.
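After a driver swap like this, it can be worth confirming that the module is loaded and which file modinfo resolves for it. A minimal sketch follows, with hypothetical function names. Note that modinfo reports the file on disk for the running kernel, which is a strong hint about what was loaded at boot but not proof of it.

```shell
# Succeeds if the named module appears in /proc/modules-format text on stdin.
is_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Print the on-disk path and version modinfo reports, plus loaded state.
module_status() {
    mod="$1"
    echo "file:    $(modinfo -F filename "$mod" 2>/dev/null)"
    echo "version: $(modinfo -F version "$mod" 2>/dev/null)"
    if is_loaded "$mod" < /proc/modules; then
        echo "loaded:  yes"
    else
        echo "loaded:  no"
    fi
}

# Example:
#   module_status ax88179_178a
```

The version field may be empty if the module doesn't declare one.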
24-Mar-23 17:50
Just went on a Zoom call to test and didn't have any issues.
Usage was a bit more constrained than yesterday
I've set a largish file copy going to try and stretch it a bit more - currently coming across at 111MB/s. The test file is 2.8GB so the speed is sustained for about 25 seconds at a time (I've run multiple back to back)
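As a sanity check on those copy figures, the arithmetic can be sketched as below. It assumes decimal units (1 GB = 1000 MB), and the helper names are just for illustration.

```shell
# Seconds needed to copy a file of the given size (MB) at a given rate (MB/s).
copy_seconds() {
    awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", mb / rate }'
}

# Convert MB/s to Mbit/s.
mbytes_to_mbits() {
    echo $(( $1 * 8 ))
}

echo "copy time: $(copy_seconds 2800 111) s"
echo "line rate: $(mbytes_to_mbits 111) Mbit/s"
```

That gives roughly 25 seconds per copy at 888 Mbit/s, which is close to what a gigabit link can realistically carry, so each copy is a short burst near line rate rather than a long soak.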
Ideally, I'd use iperf to run a longer test, but I'm on-call and can't risk coming back to the laptop to find it's crapped itself. Still, I'm fairly sure I wouldn't have been able to do those copies earlier today.
I'll hold off further testing, but will keep the ticket open for the time being.
24-Mar-23 18:02
changed the description
25-Mar-23 17:37
Has remained stable so far, including whilst I pushed a backup to achieve a fairly sustained period of demand.
25-Mar-23 17:51
Found a related bug report in launchpad, with relatively little activity on it, so have provided some details there: https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.19/+bug/2012520?comments=all
28-Mar-23 08:59
Just had a dropout whilst on a call, looks exactly the same
That sucks.
This time I was able to hit "Disconnect" in the network gadget to fail-over to wifi. There's nothing relevant before or after in dmesg, we just suddenly get the watchdog warning.
28-Mar-23 12:53
At that time, we were receiving at about 1.25Mib/s. We'd been receiving at that rate without issue just half an hour earlier.
However, being a video call, we were also sending at a little over 1Mib/s
Which suggests there might be a couple of possibilities
It's odd though, after I originally swapped the driver I went on a call and did screensharing etc to really try and stress it with no issues. There must be something we're missing here.
Unfortunately, I need this to just work, so I've ordered a different dongle based on a different chipset.
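For future dropouts, the send and receive rates quoted above could be captured straight from the interface's byte counters under /sys/class/net. The following is a rough sketch with hypothetical function names. It reports KiB/s, so the figures need converting before comparing them with rates quoted in other units.

```shell
# Average rate in KiB/s from two byte-counter readings taken secs apart.
# Args: first_reading second_reading secs
rate_kib_s() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / 1024 / s }'
}

# Sample an interface's rx/tx byte counters over an interval (default 5s).
sample_iface() {
    iface="$1"
    secs="${2:-5}"
    stats="/sys/class/net/$iface/statistics"
    rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
    sleep "$secs"
    rx2=$(cat "$stats/rx_bytes"); tx2=$(cat "$stats/tx_bytes")
    echo "rx $(rate_kib_s "$rx1" "$rx2" "$secs") KiB/s, tx $(rate_kib_s "$tx1" "$tx2" "$secs") KiB/s"
}

# Example:
#   sample_iface enxa0cec86958e9 10
```

Logging this periodically alongside dmesg timestamps would show what the NIC was doing in the run-up to a lockup.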
26-Jun-23 10:35
There's a suggestion here that it's an issue with the order that modules are loaded in
With the following used to force the order on reboot