Yesterday, one of our VMware ESXi 5.5 servers (a Supermicro box with an Intel X520 dual-port SFP+ NIC) crashed with a PSOD (Purple Screen of Death).
At the exact moment the PSOD occurred, we had a total network outage across the whole VMware cluster: both 10G switches that the two ports of that server's NIC are connected to ran into serious spanning-tree trouble (each considering itself the spanning-tree root bridge, flapping ports, broadcast storms, and so on).
This outage lasted *exactly* until the moment when we pushed the reset button on that server.
We already saw the same behavior on the same server in November 2015, with the same bad consequences, and likewise resolved it by resetting the server.
I know this sounds really weird, but the only explanation that seems plausible to us is that after ESXi suddenly crashed with the PSOD, the X520 NIC somehow fell into a kind of "bridge all traffic between the two ports" mode, creating a network bridging loop.
Has anyone ever heard of such weird behavior, or can anyone at least imagine that this could have happened?
I think it would be possible to achieve that behavior manually and intentionally by configuring the network card directly, but could it happen accidentally?
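For illustration only: on a plain Linux host (not ESXi, whose vmkernel exposes no such interface), deliberately bridging the two ports of a dual-port NIC takes just a few iproute2 commands. The interface names below are assumptions, and this is a sketch of the intentional case, not a claim about what the crashed NIC actually did:

```shell
# Hypothetical sketch (plain Linux, iproute2; interface names assumed).
# Enslaving both ports of a dual-port NIC to one bridge forwards every
# frame between the two attached switches and forms a layer-2 loop
# unless spanning tree blocks one side.
ip link add name br0 type bridge
ip link set dev enp4s0f0 master br0   # first 10G port
ip link set dev enp4s0f1 master br0   # second 10G port
ip link set dev br0 up
```

With each port uplinked to a different switch, broadcast frames would circulate between them through the bridge, which matches the root-bridge confusion and broadcast storms we observed.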
Please let me know your thoughts about it.
----- Some additional NIC information ------
~ # esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:02:00.00 igb Down 0Mbps Half 00:25:90:a4:28:56 1500 Intel Corporation 82576 Gigabit Network Connection
vmnic1 0000:02:00.01 igb Down 0Mbps Half 00:25:90:a4:28:57 1500 Intel Corporation 82576 Gigabit Network Connection
vmnic2 0000:04:00.00 ixgbe Up 10000Mbps Full 90:e2:ba:3a:04:2c 9000 Intel Corporation 82599 10 Gigabit Dual Port Network Connection
vmnic3 0000:04:00.01 ixgbe Up 10000Mbps Full 90:e2:ba:3a:04:2d 9000 Intel Corporation 82599 10 Gigabit Dual Port Network Connection
~ # ethtool vmnic2
Settings for vmnic2:
	Supported ports: [ FIBRE ]
	Supported link modes: 1000baseT/Full
	Supports auto-negotiation: Yes
	Advertised link modes: 1000baseT/Full
	Advertised auto-negotiation: Yes
	Speed: Unknown! (10000)
	Supports Wake-on: d
	Current message level: 0x00000007 (7)
	Link detected: yes
~ # ethtool -i vmnic2
~ # ethtool -k vmnic2
Offload parameters for vmnic2:
Cannot get device udp large send offload settings: Function not implemented
Cannot get device generic segmentation offload settings: Function not implemented
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
----- Reply from Intel support ------
Thank you for contacting Intel.
We do not have known issues related to looping. However, I noticed that your driver is quite old. Please try updating the driver for your network connection. You may download the driver here: https://sourceforge.net/projects/e1000/files/ixgbe%20stable/4.3.15/
Kindly check with Supermicro* if they have verified driver updates for your network connection.
Hope this helps resolve the looping issue.
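The driver and firmware versions Intel is asking about can be checked directly from the ESXi shell, and an updated driver ships as a VIB. The VIB filename below is purely illustrative, not an actual release:

```shell
# Show driver name, driver version and firmware for the 10G port
esxcli network nic get -n vmnic2
# Install an updated ixgbe driver VIB (path/filename illustrative),
# then reboot the host so the new driver module is loaded
esxcli software vib install -v /tmp/net-ixgbe-driver.vib
```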
----- Follow-up from another user ------
With our new Intel servers and our Cisco switch stack we have exactly the same problem, and all drivers and firmware are up to date.
How did you solve the issue?