Hi All
Server: HP 360DL G6
Network: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
OS: CentOS 5.5
Throughput: from 50 Kpps up to 250 Kpps at peak times
Driver: ixgbe
With the new drivers 3.1.15 and 3.1.17, I lose the connection approximately one hour after booting.
The driver was compiled with CFLAGS_EXTRA="-DIXGBE_NO_LRO"
With the driver that ships with the OS (ixgbe version 2.0.44), it works without problems.
Does anyone know how to fix it?
Thanks in advance
--
From /var/log/messages:
Dec 18 05:58:36 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_down: NIC Link is Down
Dec 18 05:58:37 localhost kernel: Uhhuh. NMI received for unknown reason a0 on CPU 0.
Dec 18 05:58:37 localhost kernel: You probably have a hardware problem with your RAM chips
Dec 18 05:58:37 localhost kernel: Dazed and confused, but trying to continue
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_reset: Hardware Error: -15
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_up: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 18 06:00:30 localhost shutdown[4998]: shutting down for system reboot

[root@localhost ixgbe-3.1.17]# lspci|grep 82598EB
0b:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

[root@localhost ixgbe-3.1.17]# lspci -v -v -s 0b:00.0
0b:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
        Subsystem: Super Micro Computer Inc Unknown device af80
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 169
        Region 0: Memory at fcee0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fce80000 (32-bit, non-prefetchable) [size=256K]
        Region 2: I/O ports at 5000 [size=32]
        Region 3: Memory at fce70000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at c2000000 [disabled]
        ...
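A log like the one above is easier to compare across driver versions when the ixgbe messages are counted per interface and per reporting function. The following is only a rough sketch: the function name summarize_ixgbe and the regular expression are assumptions made for illustration, not part of the driver or of any Intel tool. It assumes the syslog format shown in this post ("kernel: ixgbe: <iface>: <function>: ...").

```python
import re
from collections import Counter

# Matches the prefix of ixgbe kernel messages, e.g.
#   "... kernel: ixgbe: eth6: ixgbe_reset: Hardware Error: -15"
# The pattern is an assumption based on the log lines in this thread.
IXGBE_RE = re.compile(r"kernel: ixgbe: (?P<iface>\w+): (?P<func>\w+):")

def summarize_ixgbe(lines):
    """Count ixgbe driver messages per (interface, reporting function)."""
    counts = Counter()
    for line in lines:
        # finditer also copes with several messages fused onto one line
        for match in IXGBE_RE.finditer(line):
            counts[(match.group("iface"), match.group("func"))] += 1
    return counts

# A few lines copied from the log above
sample = [
    "Dec 18 05:58:36 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_down: NIC Link is Down",
    "Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period",
    "Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period",
    "Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_reset: Hardware Error: -15",
]

for (iface, func), n in sorted(summarize_ixgbe(sample).items()):
    print(iface, func, n)
```

To run it against a real log, pass an open file instead of the sample list, for example summarize_ixgbe(open("/var/log/messages")).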
I have run into the same problem.
Does anyone know how to fix it?
@xiaomdoNG
The driver issues reported above are about two years old.
What driver versions are you using? What kernel? What distribution? What flags / options are you configuring with the driver? What are the details of the disconnects you are experiencing?
Make sure you are using the latest driver, version 3.10.17. You can download the driver at http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14687.
Mark H
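For anyone gathering the driver details asked for above: the driver name and version of a loaded interface are reported by "ethtool -i <iface>", and the kernel release by "uname -r". The sketch below parses the "key: value" output of ethtool -i into a dictionary. The helper name parse_ethtool_info is made up for this example, and the sample text is hypothetical output, not taken from any system in this thread.

```python
def parse_ethtool_info(text):
    """Parse the "key: value" lines printed by `ethtool -i <iface>`."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses (which contain colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

# Hypothetical sample output, for illustration only.
sample = """driver: ixgbe
version: 3.10.17
bus-info: 0000:0b:00.0
"""

info = parse_ethtool_info(sample)
print(info["driver"], info["version"], info["bus-info"])
```

On a live system the text could be captured with subprocess.run(["ethtool", "-i", "eth6"], capture_output=True, text=True).stdout, where eth6 is just the interface name used earlier in this thread.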
@Mark H
I have run into the same problem on kernel version 2.6.28 with driver ixgbe-3.10.15.
Could you give me some advice?
The messages are as follows:
Sep 26 17:45:30 cwcos user.warn kernel: WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x110/0x194()
Sep 26 17:45:30 cwcos user.info kernel: NETDEV WATCHDOG: eth4 (ixgbe): transmit timed out
Sep 26 17:45:30 cwcos user.warn kernel: Modules linked in:
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_tftp
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_ftp
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_ipv4
Sep 26 17:45:30 cwcos user.info kernel: nf_defrag_ipv4
Sep 26 17:45:30 cwcos user.info kernel: xt_state
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack
Sep 26 17:45:30 cwcos user.info kernel: nfnetlink
Sep 26 17:45:30 cwcos user.info kernel: iptable_filter
Sep 26 17:45:30 cwcos user.info kernel: ip_tables
Sep 26 17:45:30 cwcos user.info kernel: xt_tcpudp
Sep 26 17:45:30 cwcos user.info kernel: xt_limit xt_multiport x_tables ixgbe igb tg3 e1000e
Sep 26 17:45:30 cwcos user.info kernel: e1000 e100 sd_mod pata_jmicron ata_generic libata uhci_hcd
Sep 26 17:45:30 cwcos user.info kernel: ohci_hcd ehci_hcd
Sep 26 17:45:30 cwcos user.warn kernel: Pid: 0, comm: swapper Not tainted 2.6.28.3cwcos_kernel_v1.0.0.1c1 # 36
Sep 26 17:45:30 cwcos user.warn kernel: Call Trace:
Sep 26 17:45:30 cwcos user.warn kernel: warn_slowpath+0x61/0x78
Sep 26 17:45:30 cwcos user.warn kernel: apm+0x3c9/0x51b
Sep 26 17:45:30 cwcos user.warn kernel: reschedule_interrupt+0x28/0x30
Sep 26 17:45:30 cwcos user.warn kernel: smp_reschedule_interrupt+0x10/0x21
Sep 26 17:45:30 cwcos user.warn kernel: reschedule_interrupt+0x28/0x30
Sep 26 17:45:30 cwcos user.warn kernel: next_cpu+0x12/0x21
Sep 26 17:45:30 cwcos user.warn kernel: find_busiest_group+0x23e/0x671
Sep 26 17:45:30 cwcos user.warn kernel: dev_watchdog+0x110/0x194
Sep 26 17:45:30 cwcos user.warn kernel: rebalance_domains+0x124/0x33d
Sep 26 17:45:30 cwcos user.warn kernel: dev_watchdog+0x0/0x194
Sep 26 17:45:30 cwcos user.warn kernel: run_timer_softirq+0xf5/0x14a
Sep 26 17:45:30 cwcos user.warn kernel: do_softirq+0x76/0x113
Sep 26 17:45:30 cwcos user.warn kernel: __do_softirq+0x0/0x113
Sep 26 17:45:30 cwcos user.warn kernel: irq_exit+0x35/0x73
Sep 26 17:45:30 cwcos user.warn kernel: smp_apic_timer_interrupt+0x6e/0x78
Sep 26 17:45:30 cwcos user.warn kernel: apic_timer_interrupt+0x28/0x30
Sep 26 17:45:30 cwcos user.warn kernel: acpi_ex_prep_field_value+0x131/0x1aa
Sep 26 17:45:30 cwcos user.warn kernel: acpi_safe_halt+0x18/0x25
Sep 26 17:45:30 cwcos user.warn kernel: acpi_idle_enter_c1+0x9a/0xf0
Sep 26 17:45:30 cwcos user.warn kernel: cpuidle_idle_call+0x5c/0x94
Sep 26 17:45:30 cwcos user.warn kernel: cpu_idle+0x68/0x98
Sep 26 17:45:30 cwcos user.warn kernel: --- end trace 34ba8c1b33fd6912 ---
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Sep 26 17:45:37 cwcos user.err kernel: ixgbe: eth4: ixgbe_reset: Hardware Error: -15 !
Troubleshooting this issue is beyond what I know, so I contacted a Linux driver developer for his suggestions. He cannot tell what the issue is from this information. He thinks that something might be happening with the OS that then causes the problem shown in the log you posted. Possibly something shows up earlier in the log that leads up to the watchdog warning. Please supply the preceding log messages.
He would also like to see some other information that might help with troubleshooting:
Please provide the ethregs register dump of the system in the failure state. You can get the tool from SourceForge at http://sourceforge.net/projects/e1000/files/Ethregs%20-%20Register%20Dump%20Tool/.
What are the NIC's stats from "ethtool -S"?
What is the hardware you are running on?
What is the output of lspci -vvv?
With the additional information, he might be able to get closer to the cause and give suggestions.
Mark H
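When collecting the NIC statistics requested above, note that "ethtool -S <iface>" prints a long list of counters, most of them uninteresting. The following sketch keeps only the non-zero counters whose names hint at trouble. The function name nonzero_error_counters and the keyword list are assumptions chosen for illustration, and the sample values are hypothetical rather than measured on any system in this thread.

```python
def nonzero_error_counters(text, keywords=("error", "drop", "miss", "restart", "timeout")):
    """Return non-zero counters from `ethtool -S` output whose names
    contain one of the given keywords (an assumed, adjustable list)."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Skip the "NIC statistics:" header and anything non-numeric
        if not sep or not value.isdigit():
            continue
        if int(value) and any(k in name for k in keywords):
            counters[name] = int(value)
    return counters

# Hypothetical sample output, for illustration only.
sample = """NIC statistics:
     rx_packets: 1200345
     tx_packets: 998877
     rx_errors: 0
     rx_missed_errors: 42
     tx_restart_queue: 3
"""

for name, value in nonzero_error_counters(sample).items():
    print(name, value)
```

Taking one snapshot right after boot and another in the failure state, then comparing the two dictionaries, may show which counters moved before the watchdog warning.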
Given the mess with the Ethernet stack, it might be reasonable to reinstall the OS.
Make sure you are using the latest release of CentOS so that you have the current kernel, etc.
I use a different distribution, but the same applies to CentOS as to the others.