PPari4
Beginner
2,730 Views

X710 10GbE SFP card, i40e and NIC Link is Down due to DCB init failed and tx_timeout

Hi,

We run CentOS 7.4 with kernel 4.9.x on HP hardware and noticed that a few servers had their network interfaces marked down by the kernel. In the logs we saw many reports of "DCB init failed -53, disabled", "TX driver issue detected, PF reset issued" and "eth0: tx_timeout: VSI_seid", followed by the link being marked down.

Here is the full log:

2017-10-04T15:50:29.908202+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 1, hung_queue 11

2017-10-04T15:50:30.061686+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:50:30.061693+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:50:36.085291+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout: VSI_seid: 388, Q 2, NTC: 0x20, HWB: 0x20, NTU: 0x100, TAIL: 0x100, INT: 0x0

2017-10-04T15:50:36.085295+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 2, hung_queue 2

2017-10-04T15:50:39.328928+02:00kernel: i40e 0000:04:00.0: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:50:39.328936+02:00kernel: i40e 0000:04:00.0: DCB init failed -53, disabled

2017-10-04T15:50:39.637232+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:50:39.637237+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:50:40.111808+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:50:40.788697+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:50:40.788702+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:50:46.839994+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout: VSI_seid: 388, Q 11, NTC: 0x54, HWB: 0x54, NTU: 0xed, TAIL: 0xed, INT: 0x1

2017-10-04T15:50:46.839998+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 3, hung_queue 11

2017-10-04T15:50:50.119447+02:00kernel: i40e 0000:04:00.0: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:50:50.119455+02:00kernel: i40e 0000:04:00.0: DCB init failed -53, disabled

2017-10-04T15:50:50.301798+02:00kernel: i40e 0000:04:00.0 eth1: NIC Link is Down

2017-10-04T15:50:50.423744+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:50:50.423752+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:50:50.600812+02:00kernel: i40e 0000:04:00.1 eth0: NIC Link is Down

2017-10-04T15:50:50.764799+02:00kernel: i40e 0000:04:00.1 eth0: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

2017-10-04T15:50:53.234804+02:00kernel: i40e 0000:04:00.0 eth1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

2017-10-04T15:51:17.201808+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:17.783439+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:17.783447+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:18.392805+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:18.814970+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:18.814978+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:19.436807+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:19.767258+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:19.767265+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:20.440800+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:20.793083+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:20.793091+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:21.471805+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:21.810807+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:21.810811+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:22.468707+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:22.772829+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:22.772833+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:23.411802+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:23.796867+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:23.796872+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:24.440800+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:24.758945+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:24.758950+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:25.411806+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:25.782778+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

2017-10-04T15:51:25.782781+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled

2017-10-04T15:51:26.417804+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued

2017-10-04T15:51:26.804559+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
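To see at a glance how often each event hits each port, the log lines above can be tallied per PCI address. The following is only a rough sketch: the regular expression and the event names (`LINE_RE`, `EVENTS`, `summarize`) are my own assumptions based on the lines quoted in this post, not anything provided by the i40e driver, and other kernel versions may word the messages differently.

```python
import re
from collections import Counter, defaultdict

# Assumed layout, taken from the lines quoted above:
# <timestamp>kernel: i40e <pci-address>[ <ifname>]: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+?)\s*kernel: i40e "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"(?: (?P<ifname>\S+))?: (?P<msg>.*)$"
)

# Hypothetical event labels mapped to substrings seen in the log.
EVENTS = {
    "tx_timeout_recovery": "tx_timeout recovery level",
    "dcb_init_failed": "DCB init failed",
    "pf_reset": "PF reset issued",
    "link_down": "NIC Link is Down",
    "link_up": "NIC Link is Up",
}


def summarize(lines):
    """Count known i40e events per PCI address; unmatched lines are skipped."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        for name, needle in EVENTS.items():
            if needle in match.group("msg"):
                counts[match.group("pci")][name] += 1
    return counts


# A few lines copied from the log above.
SAMPLE = [
    "2017-10-04T15:50:29.908202+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 1, hung_queue 11",
    "2017-10-04T15:50:30.061693+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled",
    "2017-10-04T15:50:39.328936+02:00kernel: i40e 0000:04:00.0: DCB init failed -53, disabled",
    "2017-10-04T15:50:40.111808+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued",
    "2017-10-04T15:50:50.301798+02:00kernel: i40e 0000:04:00.0 eth1: NIC Link is Down",
    "2017-10-04T15:50:50.600812+02:00kernel: i40e 0000:04:00.1 eth0: NIC Link is Down",
    "2017-10-04T15:50:50.764799+02:00kernel: i40e 0000:04:00.1 eth0: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None",
]

if __name__ == "__main__":
    for pci, events in sorted(summarize(SAMPLE).items()):
        print(pci, dict(events))
```

Feeding the full syslog file through `summarize` (for example with `open(path)` as the iterable) should make it easier to compare ports and hosts than reading the raw messages.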

PPari4
Beginner
697 Views

We have been running firmware version 1.1618.0 for 40 hours now and we haven't seen a single case where the link went down. Over the same period, another box with the same kernel but firmware version 1.1752.0 had its link go down twice.
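For anyone who wants to make the same comparison across several boxes, here is a small sketch that groups hosts by the `firmware-version` field printed by `ethtool -i <interface>`. The host names and the sample output strings are hypothetical placeholders, not values taken from our servers, and `parse_ethtool_info` and `group_by_firmware` are names I made up for this illustration.

```python
from collections import defaultdict


def parse_ethtool_info(text):
    """Turn the 'key: value' lines of `ethtool -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def group_by_firmware(outputs):
    """Map each firmware-version string to the sorted list of hosts reporting it."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        firmware = parse_ethtool_info(text).get("firmware-version", "unknown")
        groups[firmware].append(host)
    return {fw: sorted(hosts) for fw, hosts in groups.items()}


# Hypothetical captured output; real values will differ per host.
SAMPLE_OUTPUTS = {
    "box-a": "driver: i40e\nfirmware-version: FW-PLACEHOLDER-A\nbus-info: 0000:04:00.1\n",
    "box-b": "driver: i40e\nfirmware-version: FW-PLACEHOLDER-B\nbus-info: 0000:04:00.1\n",
    "box-c": "driver: i40e\nfirmware-version: FW-PLACEHOLDER-A\nbus-info: 0000:04:00.0\n",
}

if __name__ == "__main__":
    for firmware, hosts in sorted(group_by_firmware(SAMPLE_OUTPUTS).items()):
        print(firmware, hosts)
```

Combined with the link-down counts per host, this gives a quick way to check whether the failures really follow the firmware version.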

So, there is something going wrong with firmware version 1.1752.0.

PPari4
Beginner
697 Views

I see several kernel commits for the i40e module and I am wondering whether they fix our issue. Here are some of them that have already been pushed to the 4.14-rc kernels:

3c8f3e96af3a6799841761923d000566645f0942

09f79fd49d94cda5837e9bfd0cb222232b3b6d9f

0a2c7722be1705edca34458bd9de2f97188f9636

ba4460d45a6ec04e29e55e6c97edc0e842c18999

2bf01935ec5362aee6ff9ffc2476043af321aa42

PPari4
Beginner
697 Views

Some more info.

We have noticed that firmware version 1.1618.0 on kernel 4.9 causes stability problems with BGP peerings. We run the Bird daemon, and whenever we get the errors I mentioned above, Bird complains about timeouts on BGP keep-alives and tears down the BGP peering with the switch. As a result, the server stops receiving traffic and clients get TCP RSTs because the incoming TCP connections are reshuffled.

The only option we have now is to switch back to the 3.10 kernel from Red Hat, which we don't want to do because we need the functionality and performance we get from kernel 4.9.

PPari4
Beginner
697 Views

Has anyone looked at this?
