idata
Community Manager
1,771 Views

82598EB 10-Gigabit AT CX4 problem with new drivers

Hi All

Server: HP 360DL G6

Network: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

OS: CentOS 5.5

Throughput: from 50Kpps up to 250Kpps in peak time

Driver: ixgbe

With the new drivers 3.1.15 and 3.1.17, the connection is lost approximately 1 hour after booting.

Driver compiled with CFLAGS_EXTRA="-DIXGBE_NO_LRO"

With the driver that ships with the OS (ixgbe version 2.0.44), it works without problems.

Does anyone know how to fix it?

Thanks in advance

--

From /var/log/messages:

Dec 18 05:58:36 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_down: NIC Link is Down
Dec 18 05:58:37 localhost kernel: Uhhuh. NMI received for unknown reason a0 on CPU 0.
Dec 18 05:58:37 localhost kernel: You probably have a hardware problem with your RAM chips
Dec 18 05:58:37 localhost kernel: Dazed and confused, but trying to continue
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_reset: Hardware Error: -15
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_up: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 18 06:00:30 localhost shutdown[4998]: shutting down for system reboot
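For anyone who wants to condense a log like the one above before posting it, here is a minimal Python sketch (not from the original post) that counts ixgbe driver messages per interface and driver function. The function name summarize_ixgbe and the regular expression are illustrative assumptions based only on the "ixgbe: <interface>: <function>:" pattern visible in this log; other driver versions may format messages differently.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log above:
#   "ixgbe: <interface>: <driver function>: <text>"
IXGBE_RE = re.compile(r"ixgbe: (\w+): (\w+):")


def summarize_ixgbe(lines):
    """Count ixgbe messages per (interface, function).

    finditer is used so that lines where several messages were pasted
    together without line breaks are still counted correctly.
    """
    counts = Counter()
    for line in lines:
        for match in IXGBE_RE.finditer(line):
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few messages copied from the log above; the second string deliberately
# contains two messages run together.
sample = [
    "Dec 18 05:58:36 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_down: NIC Link is Down",
    "Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 "
    "not cleared within the polling periodDec 18 05:58:38 localhost kernel: ixgbe: eth6: "
    "ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period",
    "Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_reset: Hardware Error: -15",
]

for (iface, func), n in summarize_ixgbe(sample).most_common():
    print(iface, func, n)
```

To run it against a real log, pass an open file instead of the sample list, for example summarize_ixgbe(open("/var/log/messages")). Reading that file usually requires root.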

[root@localhost ixgbe-3.1.17]# lspci|grep 82598EB

0b:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

0b:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

[root@localhost ixgbe-3.1.17]# lspci -v -v -s 0b:00.0
0b:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
Subsystem: Super Micro Computer Inc Unknown device af80
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 169
Region 0: Memory at fcee0000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fce80000 (32-bit, non-prefetchable) [size=256K]
Region 2: I/O ports at 5000 [size=32]
Region 3: Memory at fce70000 (32-bit, non-prefetchable) [size=16K]
[virtual] Expansion ROM at c2000000 [disabled]
...
6 Replies
dxiao3
Beginner
74 Views

I have run into the same problem.

Does anyone know how to fix it?

Mark_H_Intel
Employee
74 Views

@xiaomdoNG

The driver issues reported above are about two years old.

What driver versions are you using? What kernel? What distribution? What flags / options are you configuring with the driver? What are the details of the disconnects you are experiencing?

Make sure you are using the latest driver, version 3.10.17. You can download the driver at http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14687.

Mark H

dxiao3
Beginner
74 Views

@Mark H

I have run into the same problem on kernel version 2.6.28 with driver ixgbe-3.10.15.

Could you give me some advice?

The messages are as follows:

Sep 26 17:45:30 cwcos user.warn kernel: WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x110/0x194()
Sep 26 17:45:30 cwcos user.info kernel: NETDEV WATCHDOG: eth4 (ixgbe): transmit timed out
Sep 26 17:45:30 cwcos user.warn kernel: Modules linked in:
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_tftp
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_ftp
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_ipv4
Sep 26 17:45:30 cwcos user.info kernel: nf_defrag_ipv4
Sep 26 17:45:30 cwcos user.info kernel: xt_state
Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack
Sep 26 17:45:30 cwcos user.info kernel: nfnetlink
Sep 26 17:45:30 cwcos user.info kernel: iptable_filter
Sep 26 17:45:30 cwcos user.info kernel: ip_tables
Sep 26 17:45:30 cwcos user.info kernel: xt_tcpudp
Sep 26 17:45:30 cwcos user.info kernel: xt_limit xt_multiport x_tables ixgbe igb tg3 e1000e
Sep 26 17:45:30 cwcos user.info kernel: e1000 e100 sd_mod pata_jmicron ata_generic libata uhci_hcd
Sep 26 17:45:30 cwcos user.info kernel: ohci_hcd ehci_hcd
Sep 26 17:45:30 cwcos user.warn kernel: Pid: 0, comm: swapper Not tainted 2.6.28.3cwcos_kernel_v1.0.0.1c1 # 36
Sep 26 17:45:30 cwcos user.warn kernel: Call Trace:
Sep 26 17:45:30 cwcos user.warn kernel: warn_slowpath+0x61/0x78
Sep 26 17:45:30 cwcos user.warn kernel: apm+0x3c9/0x51b
Sep 26 17:45:30 cwcos user.warn kernel: reschedule_interrupt+0x28/0x30
Sep 26 17:45:30 cwcos user.warn kernel: smp_reschedule_interrupt+0x10/0x21
Sep 26 17:45:30 cwcos user.warn kernel: reschedule_interrupt+0x28/0x30
Sep 26 17:45:30 cwcos user.warn kernel: next_cpu+0x12/0x21
Sep 26 17:45:30 cwcos user.warn kernel: find_busiest_group+0x23e/0x671
Sep 26 17:45:30 cwcos user.warn kernel: dev_watchdog+0x110/0x194
Sep 26 17:45:30 cwcos user.warn kernel: rebalance_domains+0x124/0x33d
Sep 26 17:45:30 cwcos user.warn kernel: dev_watchdog+0x0/0x194
Sep 26 17:45:30 cwcos user.warn kernel: run_timer_softirq+0xf5/0x14a
Sep 26 17:45:30 cwcos user.warn kernel: do_softirq+0x76/0x113
Sep 26 17:45:30 cwcos user.warn kernel: __do_softirq+0x0/0x113
Sep 26 17:45:30 cwcos user.warn kernel: irq_exit+0x35/0x73
Sep 26 17:45:30 cwcos user.warn kernel: smp_apic_timer_interrupt+0x6e/0x78
Sep 26 17:45:30 cwcos user.warn kernel: apic_timer_interrupt+0x28/0x30
Sep 26 17:45:30 cwcos user.warn kernel: acpi_ex_prep_field_value+0x131/0x1aa
Sep 26 17:45:30 cwcos user.warn kernel: acpi_safe_halt+0x18/0x25
Sep 26 17:45:30 cwcos user.warn kernel: acpi_idle_enter_c1+0x9a/0xf0
Sep 26 17:45:30 cwcos user.warn kernel: cpuidle_idle_call+0x5c/0x94
Sep 26 17:45:30 cwcos user.warn kernel: cpu_idle+0x68/0x98
Sep 26 17:45:30 cwcos user.warn kernel: --- end trace 34ba8c1b33fd6912 ---
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
Sep 26 17:45:37 cwcos user.err kernel: ixgbe: eth4: ixgbe_reset: Hardware Error: -15 !
Mark_H_Intel
Employee
74 Views

Troubleshooting this issue is beyond what I know, so I contacted a Linux driver developer for his suggestions. He cannot tell what the issue is from this information. He thinks that something might be happening with the OS that then causes the problem shown in the log you posted. Possibly something shows up earlier in the log that leads up to the watchdog warning. Please supply the preceding log messages.

He would also like to see some other information that might help with troubleshooting:

Please provide the ethregs register dump of the system in the failure state. You can get the tool from SourceForge at http://sourceforge.net/projects/e1000/files/Ethregs%20-%20Register%20Dump%20Tool/.

What are the NIC's stats from "ethtool -S"?

What is the hardware you are running on?

What is the output of lspci -vvv?

With the additional information, he might be able to get closer to the cause and give suggestions.
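As a rough aid for gathering most of that in one pass, here is a minimal Python sketch. It only wraps standard commands (ethtool -S for statistics, ethtool -i for driver and firmware version, lspci -vvv, and dmesg for the kernel log); the function names, the dictionary keys, and the eth4 interface name are illustrative assumptions, not an Intel tool. The ethregs dump is not covered because its invocation is not described in this thread.

```python
import subprocess


def diagnostic_commands(iface):
    """Return the commands to run for one interface (for example 'eth4')."""
    return {
        "ethtool_stats": ["ethtool", "-S", iface],   # NIC statistics (capital S)
        "ethtool_driver": ["ethtool", "-i", iface],  # driver and firmware version
        "lspci": ["lspci", "-vvv"],                  # PCI device details
        "dmesg": ["dmesg"],                          # kernel ring buffer
    }


def collect(iface, run=subprocess.run):
    """Run each command and return a dict of name -> captured output.

    A missing or non-executable tool is recorded as a message instead of
    aborting, so one absent utility does not lose the other results.
    The run parameter exists only so the function can be exercised
    without real hardware.
    """
    results = {}
    for name, cmd in diagnostic_commands(iface).items():
        try:
            proc = run(cmd, capture_output=True, text=True, check=False)
            results[name] = proc.stdout or proc.stderr
        except OSError as exc:
            results[name] = "could not run {}: {}".format(" ".join(cmd), exc)
    return results


# Example use, ideally as root while the interface is in the failed state:
#   for name, text in collect("eth4").items():
#       print("=====", name, "=====")
#       print(text)
```

Running it while the link is down matters more than the exact set of commands, since the statistics and log lines just before and after the failure are what was asked for.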

Mark H

Vegan
New Contributor I
74 Views

Given the mess with the Ethernet stack, it might be reasonable to reinstall the OS.

Make sure you are using the latest distribution of CentOS so that you have the current kernel etc.

I use a different distribution, but the same advice applies to CentOS as to the others.

idata
Community Manager
74 Views

Further help with this would be useful to me too.