Hi,
We run CentOS 7.4 with a 4.9.x kernel on HP hardware and noticed that a few servers had their network interfaces marked down by the kernel. In the logs we saw many reports of "DCB init failed -53, disabled", "TX driver issue detected, PF reset issued" and "eth0: tx_timeout: VSI_seid", followed by the link being marked down.
Here is the full log:
2017-10-04T15:50:29.908202+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 1, hung_queue 11
2017-10-04T15:50:30.061686+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:30.061693+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:50:36.085291+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout: VSI_seid: 388, Q 2, NTC: 0x20, HWB: 0x20, NTU: 0x100, TAIL: 0x100, INT: 0x0
2017-10-04T15:50:36.085295+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 2, hung_queue 2
2017-10-04T15:50:39.328928+02:00kernel: i40e 0000:04:00.0: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:39.328936+02:00kernel: i40e 0000:04:00.0: DCB init failed -53, disabled
2017-10-04T15:50:39.637232+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:39.637237+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:50:40.111808+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:50:40.788697+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:40.788702+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:50:46.839994+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout: VSI_seid: 388, Q 11, NTC: 0x54, HWB: 0x54, NTU: 0xed, TAIL: 0xed, INT: 0x1
2017-10-04T15:50:46.839998+02:00kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 3, hung_queue 11
2017-10-04T15:50:50.119447+02:00kernel: i40e 0000:04:00.0: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:50.119455+02:00kernel: i40e 0000:04:00.0: DCB init failed -53, disabled
2017-10-04T15:50:50.301798+02:00kernel: i40e 0000:04:00.0 eth1: NIC Link is Down
2017-10-04T15:50:50.423744+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:50.423752+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:50:50.600812+02:00kernel: i40e 0000:04:00.1 eth0: NIC Link is Down
2017-10-04T15:50:50.764799+02:00kernel: i40e 0000:04:00.1 eth0: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
2017-10-04T15:50:53.234804+02:00kernel: i40e 0000:04:00.0 eth1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
2017-10-04T15:51:17.201808+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:17.783439+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:17.783447+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:18.392805+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:18.814970+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:18.814978+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:19.436807+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:19.767258+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:19.767265+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:20.440800+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:20.793083+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:20.793091+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:21.471805+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:21.810807+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:21.810811+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:22.468707+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:22.772829+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:22.772833+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:23.411802+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:23.796867+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:23.796872+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:24.440800+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:24.758945+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:24.758950+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:25.411806+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:25.782778+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:51:25.782781+02:00kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:51:26.417804+02:00kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-04T15:51:26.804559+02:00kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
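To see at a glance which PCI function keeps resetting and which TX queues hang, an excerpt like the one above can be summarized with a short script. This is a minimal sketch, assuming log lines in the format shown (`kernel: i40e <PCI address> [<iface>]: <message>`). The event labels and the `summarize` function are hypothetical names chosen for illustration, not part of any i40e or Intel tool.

```python
import re
from collections import Counter, defaultdict

# Matches i40e kernel lines in the format shown in the log above.
# The PCI address itself contains colons, so it is matched explicitly;
# the interface name (eth0, eth1, ...) is optional because some
# messages (e.g. "DCB init failed") are logged without it.
I40E_LINE = re.compile(
    r"kernel: i40e (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
    r"(?: (?P<iface>[^:\s]+))?: (?P<msg>.*)$"
)

# Hypothetical event labels, keyed by a substring of the message text.
EVENTS = {
    "TX driver issue detected, PF reset issued": "pf_reset",
    "DCB init failed": "dcb_init_failed",
    "tx_timeout recovery level": "tx_timeout_recovery",
    "NIC Link is Down": "link_down",
    "NIC Link is Up": "link_up",
}

# Extracts the hung queue number from the detailed tx_timeout line.
TX_TIMEOUT = re.compile(r"tx_timeout: VSI_seid: \d+, Q (?P<q>\d+),")


def summarize(lines):
    """Count i40e events per PCI function and collect hung TX queue numbers.

    Lines that are not i40e kernel messages are ignored.
    """
    counts = defaultdict(Counter)
    hung_queues = defaultdict(list)
    for line in lines:
        m = I40E_LINE.search(line)
        if not m:
            continue
        pci, msg = m.group("pci"), m.group("msg")
        for needle, label in EVENTS.items():
            if needle in msg:
                counts[pci][label] += 1
        t = TX_TIMEOUT.search(msg)
        if t:
            hung_queues[pci].append(int(t.group("q")))
    return counts, hung_queues
```

For example, `summarize(open("/var/log/messages"))` would return per-function event counts and the list of hung queues; the log path is an assumption and depends on the local syslog setup. Matching on message substrings rather than full patterns keeps the sketch tolerant of small wording differences between driver versions.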
We have been running firmware version 1.1618.0 for 40 hours now and we haven't seen a single case of the link going down. Over the same 40 hours, another box with the same kernel and firmware version 1.1752.0 had its link go down twice.
So, there is something going wrong with firmware version 1.1752.0.
I see several kernel commits for the i40e module and I am wondering whether they fix our issue. Here are some of them, which have already been pushed to the 4.14-rc kernels:
3c8f3e96af3a6799841761923d000566645f0942
09f79fd49d94cda5837e9bfd0cb222232b3b6d9f
0a2c7722be1705edca34458bd9de2f97188f9636
ba4460d45a6ec04e29e55e6c97edc0e842c18999
2bf01935ec5362aee6ff9ffc2476043af321aa42
Some more info.
We have noticed that firmware version 1.1618.0 on kernel 4.9 causes stability problems with BGP peerings. We run the Bird daemon, and at the time we get the errors mentioned above, bird complains about BGP keepalive timeouts and as a result tears down the BGP peering with the switch. The server then stops receiving traffic, and clients get TCP RSTs because incoming TCP connections are reshuffled.
The only option we have now is to switch back to the 3.10 kernel from Red Hat, which we don't want to do, as we need the functionality and performance we get from kernel 4.9.
Has anyone looked at this?