Hello,
We have set up a PC running Ubuntu 14.04.3, with all updates installed, as a border router:
Linux hellnat 3.19.0-47-generic # 53~14.04.1-Ubuntu SMP Mon Jan 18 16:09:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CPU: 2 x E5-2690 v3 with hyper-threading enabled (48 logical "cores" in total in the OS)
NIC: Intel XL710 quad port; every "channel" of every p1p* interface is bound to its own core
It is used as a border router, so it runs BGP. We use p1p1 and p1p3 to connect to internal routers, and p1p2 and p1p3 to connect to the uplinks.
Traffic suddenly stopped, and it was NOT during rush hour.
After a reboot (via IPMI) I found the following lines in the syslog file:
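For readers unfamiliar with binding a queue to a core: the post does not say how it was done here. One common approach on Linux (an assumption, not confirmed by the poster) is to write a CPU bitmask to /proc/irq/<N>/smp_affinity for each queue's interrupt. The sketch below only computes that mask for a single CPU; the helper name `cpu_to_affinity_mask` is hypothetical.

```python
def cpu_to_affinity_mask(cpu: int) -> str:
    """Return a hex bitmask selecting one CPU, written as comma-separated
    32-bit groups (the form accepted by /proc/irq/<N>/smp_affinity)."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    hex_digits = format(1 << cpu, "x")
    groups = []
    # Split into 8-hex-digit (32-bit) groups, starting from the low end.
    while hex_digits:
        groups.append(hex_digits[-8:])
        hex_digits = hex_digits[:-8]
    # Most significant group first; only the leading group may be short.
    return ",".join(reversed(groups))


# CPU 45 is the core that reported the watchdog warning in the log.
print(cpu_to_affinity_mask(45))  # prints 2000,00000000
```

On a 48-thread machine the mask needs more than 32 bits, which is why the comma-separated form matters for the higher-numbered cores.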
Jan 31 02:33:33 hellnat kernel: [220504.793680] ------------[ cut here ]------------
Jan 31 02:33:33 hellnat kernel: [220504.793701] WARNING: CPU: 45 PID: 0 at /build/linux-lts-vivid-Yt59dr/linux-lts-vivid-3.19.0/net/sched/sch_generic.c:303 dev_watchdog+0x24f/0x260()
Jan 31 02:33:33 hellnat kernel: [220504.793705] NETDEV WATCHDOG: p1p1 (i40e): transmit queue 8 timed out
Jan 31 02:33:33 hellnat kernel: [220504.793707] Modules linked in: nf_conntrack_netlink nfnetlink xt_tcpudp xt_multiport iptable_filter xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_mangle xt_CT iptable_raw ast ttm joydev intel_rapl iosf_mbi drm_kms_helper x86_pkg_temp_thermal intel_powerclamp drm syscopyarea sysfillrect sysimgblt coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel ipmi_ssif aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd lpc_ich mei_me sb_edac edac_core mei ipmi_si 8250_fintek ipmi_msghandler lp wmi acpi_pad parport ioatdma mac_hid shpchp nf_conntrack_ftp acpi_power_meter nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc tcp_htcp hid_generic i40e(OE) igb vxlan ip6_udp_tunnel i2c_algo_bit udp_tunnel usbhid dca uas configfs ahci ptp usb_storage hid megaraid_sas libahci pps_core
Jan 31 02:33:33 hellnat kernel: [220504.793817] CPU: 45 PID: 0 Comm: swapper/45 Tainted: G OE 3.19.0-47-generic # 53~14.04.1-Ubuntu
Jan 31 02:33:33 hellnat kernel: [220504.793820] Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 1.1 08/13/2015
Jan 31 02:33:33 hellnat kernel: [220504.793822] ffffffff81b3fcc0 ffff88105f4a3d58 ffffffff817afcd5 0000000000000000
Jan 31 02:33:33 hellnat kernel: [220504.793827] ffff88105f4a3da8 ffff88105f4a3d98 ffffffff81074dea 0000000000000286
Jan 31 02:33:33 hellnat kernel: [220504.793830] 0000000000000008 ffff88105b65a000 0000000000000040 ffff88105748cf40
Jan 31 02:33:33 hellnat kernel: [220504.793835] Call Trace:
Jan 31 02:33:33 hellnat kernel: [220504.793837] [] dump_stack+0x45/0x57
Jan 31 02:33:33 hellnat kernel: [220504.793857] [] warn_slowpath_common+0x8a/0xc0
Jan 31 02:33:33 hellnat kernel: [220504.793860] [] warn_slowpath_fmt+0x46/0x50
Jan 31 02:33:33 hellnat kernel: [220504.793869] [] dev_watchdog+0x24f/0x260
Jan 31 02:33:33 hellnat kernel: [220504.793874] [] ? dev_graft_qdisc+0x80/0x80
Jan 31 02:33:33 hellnat kernel: [220504.793879] [] call_timer_fn+0x39/0x110
Jan 31 02:33:33 hellnat kernel: [220504.793883] [] ? dev_graft_qdisc+0x80/0x80
Jan 31 02:33:33 hellnat kernel: [220504.793888] [] run_timer_softirq+0x220/0x320
Jan 31 02:33:33 hellnat kernel: [220504.793898] [] ? lapic_next_deadline+0x33/0x40
Jan 31 02:33:33 hellnat kernel: [220504.793905] [] __do_softirq+0xe4/0x270
Jan 31 02:33:33 hellnat kernel: [220504.793909] [] irq_exit+0x9d/0xb0
Jan 31 02:33:33 hellnat kernel: [220504.793916] [] smp_apic_timer_interrupt+0x4a/0x60
Jan 31 02:33:33 hellnat kernel: [220504.793924] [] apic_timer_interrupt+0x6d/0x80
Jan 31 02:33:33 hellnat kernel: [220504.793926] [] ? cpuidle_enter_state+0x70/0x170
Jan 31 02:33:33 hellnat kernel: [220504.793938] [] ? cpuidle_enter_state+0x5d/0x170
Jan 31 02:33:33 hellnat kernel: [220504.793943] [] cpuidle_enter+0x17/0x20
Jan 31 02:33:33 hellnat kernel: [220504.793949] [] cpu_startup_entry+0x334/0x3d0
Jan 31 02:33:33 hellnat kernel: [220504.793955] [] ? clockevents_register_device+0xe3/0x140
Jan 31 02:33:33 hellnat kernel: [220504.793960] [] start_secondary+0x197/0x1c0
Jan 31 02:33:33 hellnat kernel: [220504.793963] ---[ end trace 43e1a051ade0289e ]---
Jan 31 02:33:33 hellnat kernel: [220504.793973] i40e 0000:81:00.0 p1p1: tx_timeout: VSI_seid: 399, Q 8, NTC: 0xd36, HWB: 0xa1, NTU: 0xa1, TAIL: 0xa1, INT: 0x0
Jan 31 02:33:33 hellnat kernel: [220504.793976] i40e 0000:81:00.0 p1p1: tx_timeout recovery level 1, hung_queue 8
Jan 31 02:33:43 hellnat watchquagga[2972]: zebra state -> unresponsive : no response yet to ping sent 10 seconds ago
Jan 31 02:33:49 hellnat watchquagga[2972]: bgpd state -> unresponsive : no response yet to ping sent 10 seconds ago
Jan 31 02:33:50 hellnat kernel: [220521.908228] NMI watchdog: BUG: soft lockup - CPU# 13 stuck for 23s! [kworker/13:1:536]
Jan 31 02:33:50 hellnat kernel: [220521.908306] Modules linked in: nf_conntrack_netlink nfnetlink xt_tcpudp xt_multiport iptable_filter xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_mangle xt_CT iptable_raw ast ttm joydev intel_rapl iosf_mbi drm_kms_helper x86_pkg_temp_thermal intel_powerclamp drm syscopyarea sysfillrect sysimgblt coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel ipmi_ssif aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd lpc_ich mei_me sb_edac edac_core mei ipmi_si 8250_fintek ipmi_msghandler lp wmi acpi_pad parport ioatdma mac_hid shpchp nf_conntrack_ftp acpi_power_meter nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc tcp_htcp hid_generic i40e(OE) igb vxlan ip6_udp_tunnel i2c_algo_bit udp_tunnel usbhid dca uas configfs ahci ptp usb_storage hid megaraid_sas libahci pps_core
Jan 31 02:33:50 hellnat kernel: [220521.908396] CPU: 13 PID: 536 Comm: kworker/13:1 Tainted: G W OE 3.19.0-47-generic # 53~14.04.1-Ubuntu
Jan 31 02:33:50 hellnat kernel: [220521.908399] Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 1.1 08/13/2015
Jan 31 02:33:50 hellnat kernel: [220521.908408] Workqueue: events inet_frag_worker
The main lines, I think, are:
Jan 31 02:33:33 hellnat kernel: [220504.793705] NETDEV WATCHDOG: p1p1 (i40e): transmit queue 8 timed out
Jan 31 02:33:33 hellnat kernel: [220504.793973] i40e 0000:81:00.0 p1p1: tx_timeout: VSI_seid: 399, Q 8, NTC: 0xd36, HWB: 0xa1, NTU: 0xa1, TAIL: 0xa1, INT: 0x0
Jan 31 02:33:33 hellnat kernel: [220504.793976] i40e 0000:81:00.0 p1p1: tx_timeout recovery level 1, hung_queue 8
We can see that TX queue 8 hung. Why can this happen? I think it is a problem with the network adapter or the driver. Can you explain it and tell me how to fix it? It is a big problem when it happens because all ...
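To compare several such events, it can help to pull the fields out of the tx_timeout line. The sketch below is a minimal parser, assuming the line format shown in this log. The function name `parse_tx_timeout` is hypothetical. The field meanings in the comments (NTC = next to clean, NTU = next to use, HWB = head write-back, TAIL = tail register) are my reading of the i40e driver's names and should be treated as an assumption.

```python
import re

# Matches the i40e tx_timeout line as it appears in the syslog excerpt above.
TX_TIMEOUT_RE = re.compile(
    r"tx_timeout: VSI_seid: (?P<vsi_seid>\d+), Q (?P<queue>\d+), "
    r"NTC: (?P<ntc>0x[0-9a-fA-F]+), HWB: (?P<hwb>0x[0-9a-fA-F]+), "
    r"NTU: (?P<ntu>0x[0-9a-fA-F]+), TAIL: (?P<tail>0x[0-9a-fA-F]+), "
    r"INT: (?P<intr>0x[0-9a-fA-F]+)"
)


def parse_tx_timeout(line):
    """Return the tx_timeout fields as integers, or None if no match."""
    match = TX_TIMEOUT_RE.search(line)
    if match is None:
        return None
    fields = {}
    for name, text in match.groupdict().items():
        # Ring indices and INT are hex; VSI_seid and Q are decimal.
        fields[name] = int(text, 16) if text.startswith("0x") else int(text)
    return fields


line = ("Jan 31 02:33:33 hellnat kernel: [220504.793973] i40e 0000:81:00.0 "
        "p1p1: tx_timeout: VSI_seid: 399, Q 8, NTC: 0xd36, HWB: 0xa1, "
        "NTU: 0xa1, TAIL: 0xa1, INT: 0x0")
info = parse_tx_timeout(line)
print(info["queue"], hex(info["ntc"]), hex(info["hwb"]))  # prints 8 0xd36 0xa1
```

Running this over the whole syslog would show whether the same queue hangs each time or whether it varies between events.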
Hi Evgeny,
Thank you for contacting Intel Customer Support.
How often do you encounter the timeout? What do you do to reconnect?
Please provide the adapter's details:
1. Specific Card Model:
2. Serial number (if available)
3. Modules installed
Sincerely,
Sandy
Hi Sandy,
That was the first time; yesterday was the second. Each time I rebooted the server via IPMI.
1. Ethernet controller: Intel Corporation Ethernet 10G 2P X710 Adapter (rev 01)
   Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-4
2. Device Serial Number: f0-24-30-ff-ff-ca-05-68
3. Moduletech SFP+ modules.
After yesterday's hang I looked through the syslog messages and found lines with "events inet_frag_worker". I then found that newer kernels (4.2) include patches that fix some bugs in inet_fragment.c. Today I updated to the 4.2 kernel, and I will watch this machine for some days. If there are no problems for a week, I will conclude that the problem was in the kernel. So I would like to ask you to hold off on closing this topic for now.
Regards,
Evgeny
Hi Evgeny,
Thank you for your information.
The timeouts may be related to the SFP+ modules. We recommend using only supported modules. Please refer to the link below for more information:
http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html X710 Series—Compatible SFP+ Modules, SFP Modules, and Cables for...
Hope this is helpful.
Sincerely,
Sandy
Hi Sandy,
This is not caused by the SFP+ modules. I understand that you advise using Intel SFP+ modules, but other vendors also make compatible SFP+ modules.
As for my problem, it was in the version 3 Linux kernel, in the part that handles fragmented packets. Kernel 4.2 includes some changes in that area, and the machine has now been stable for more than a week.
Thank you for your participation.
Sincerely,
Evgeny.