We are facing a peculiar problem when connecting particular NICs to particular switches. The NICs go offline for 4-5 seconds at irregular intervals and then return to service as if nothing had happened. The weird thing is that we see this only with particular combinations of NICs and switches:
- Intel 82599EB, 8086:151c, only works reliably on a Nexus 4900M switch. When used on a Nexus 3064, we get a lot of these:
Sep 2 14:06:16 host kernel: ixgbe 0000:04:00.0: eth0: NIC Link is Down
Sep 2 14:06:21 host kernel: ixgbe 0000:04:00.0: eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Sep 2 14:06:21 host kernel: ixgbe 0000:04:00.0: eth0: NIC Link is Down
Sep 2 14:06:23 host kernel: ixgbe 0000:04:00.0: eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
- Intel X540-AT2, 8086:1528, only works reliably on a Nexus 3064. When used on a Nexus 4900M, we observe the above.
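To put numbers on the flaps, the log lines above can be paired into Down/Up outages. The following is a minimal Python sketch, not a finished tool: the function name `link_outages` and the `year` parameter are made up for illustration, the regular expression only matches the ixgbe message wording quoted above, and month names are assumed to be in English. Syslog lines carry no year, so one is supplied only so that timestamps can be subtracted.

```python
import re
from datetime import datetime

# Matches the ixgbe link messages quoted above; other drivers may word them differently.
LINK_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"ixgbe\s+\S+\s+(?P<iface>\w+):\s+NIC Link is (?P<state>Up|Down)"
)


def link_outages(lines, year=2000):
    """Return (iface, down_time, up_time, seconds) for each Down -> Up pair."""
    down_since = {}  # interface name -> timestamp of the first unmatched "Down"
    outages = []
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue  # not a link state message
        ts = datetime.strptime(
            f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S"
        )
        iface = m["iface"]
        if m["state"] == "Down":
            down_since.setdefault(iface, ts)
        elif iface in down_since:
            start = down_since.pop(iface)
            outages.append((iface, start, ts, (ts - start).total_seconds()))
    return outages


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python link_outages.py < /var/log/messages
    for iface, start, end, secs in link_outages(sys.stdin):
        print(f"{iface}: down {start:%b %d %H:%M:%S} for {secs:.0f}s")
```

Run against a longer stretch of the log, this gives a list of outage start times and durations that can be compared with switch-side logs.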
All servers in question are running CentOS 6. Those that are Dell have all the latest firmware and BIOS updates installed. We even tested the (then) latest Linux drivers, downloaded directly from Intel, in place of the ones that come with the OS, but it made no difference. Other machines in the mix are HP and custom-built, and all have one or the other NIC.
Are there any debug tools for the NICs that could be useful here? Are there any particular options that should or should not be set?
We have had problems with Intel NICs and the Nexus 5596 until we forced speed 10000 on the switch side. I suggest giving that a try to see if it helps you with the 3064. Also, you say that you are having problems with the Nexus 4900M? To my knowledge, that is a Catalyst (not Nexus) device.
You are right, the 4900M is a Catalyst, not Nexus model.
I have figured out a different solution in the meantime: enable flow control (switch side)/disable pause frames (NIC side, ethtool) when connecting 82599EB controllers to the 3064. I would even argue the ixgbe driver must have a long-standing bug with pause frame autonegotiation, but of course, IANAE.
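The exact ethtool invocation is not given above. As a hedged sketch of the NIC-side half, `ethtool -a` shows the current pause parameters and `ethtool -A` changes them. The Python helper names `pause_commands` and `apply_pause_off` and the interface name `eth0` are made up for illustration, and turning off pause autonegotiation together with RX and TX pause is an assumption about what "disable pause frames" meant here.

```python
import subprocess


def pause_commands(iface):
    """Build the ethtool command lines to inspect and disable pause frames."""
    show = ["ethtool", "-a", iface]
    # Assumption: "disable pause frames" means autoneg, RX and TX pause all off.
    disable = ["ethtool", "-A", iface, "autoneg", "off", "rx", "off", "tx", "off"]
    return [show, disable, show]


def apply_pause_off(iface, dry_run=True):
    """Print the commands (dry run) or execute them; executing needs root."""
    for cmd in pause_commands(iface):
        if dry_run:
            print(" ".join(cmd))
        else:
            # No check=True: ethtool may report an error when nothing changes.
            subprocess.run(cmd)


if __name__ == "__main__":
    apply_pause_off("eth0")  # dry run: only prints the commands
```

Settings applied with ethtool are normally lost on reboot or driver reload, so they would have to be reapplied from the distribution's network configuration to make them stick.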
Thanks for posting.
Here are some solutions that may resolve your connection problem:
1. Update the driver - you can download the latest driver from the Intel® Download Center: https://downloadcenter.intel.com/
2. Advanced driver settings that you may configure are described in "Advanced Driver Settings for Intel® Ethernet 10 Gigabit Server Adapters": http://www.intel.com/support/network/adapter/pro100/sb/CS-029402.htm
I see you have enabled Flow Control. Try to disable other advanced features and observe your connection:
- Interrupt Moderation
- Interrupt Moderation Rate
- TCP Checksum Offload
Hope this is helpful.