Yesterday, one of our VMware ESXi 5.5 servers (a Supermicro box with an Intel X520 dual-port SFP+ NIC) crashed with a PSOD (Purple Screen of Death).
At the exact moment the PSOD occurred, we had a total network outage across the whole VMware cluster: both 10G switches that the two ports of that server's NIC are connected to ran into serious spanning-tree trouble (each considering itself the spanning-tree root bridge, flapping ports, broadcast storms, and so on).
This outage lasted *exactly* until the moment when we pushed the reset button on that server.
We already saw the same behavior on the same server in November 2015, with the same bad consequences, and likewise resolved it by resetting the server.
I know this sounds really weird, but the only explanation that seems plausible to us is that after ESXi suddenly crashed with the PSOD, the X520 NIC somehow fell into a kind of "bridge all traffic between the two ports" mode, creating a network bridging loop.
Has anyone ever heard of such weird behavior, or can anyone at least imagine that this could have happened?
I think it would be possible to achieve that behavior manually and intentionally by configuring the network card directly, but could it happen accidentally?
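For illustration only: on a plain Linux host (not ESXi, whose vmkernel exposes no such interface), deliberately bridging the two ports of a dual-port NIC takes just a few iproute2 commands. The interface names below are assumptions, and this is a sketch of the intentional case, not a claim about what the crashed NIC actually did:

```shell
# Hypothetical sketch (plain Linux, iproute2; interface names assumed).
# Enslaving both ports of a dual-port NIC to one bridge forwards every
# frame between the two attached switches and forms a layer-2 loop
# unless spanning tree blocks one side.
ip link add name br0 type bridge
ip link set dev enp4s0f0 master br0   # first 10G port
ip link set dev enp4s0f1 master br0   # second 10G port
ip link set dev br0 up
```

With each port uplinked to a different switch, broadcast frames would circulate between them through the bridge, which matches the root-bridge confusion and broadcast storms we observed.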
Please let me know your thoughts about it.
----- Some additional NIC information ------
~ # esxcfg-nics -l
Name PCI Driver Link Speed Duplex MAC Address MTU Description
vmnic0 0000:02:00.00 igb Down 0Mbps Half 00:25:90:a4:28:56 1500 Intel Corporation 82576 Gigabit Network Connection
vmnic1 0000:02:00.01 igb Down 0Mbps Half 00:25:90:a4:28:57 1500 Intel Corporation 82576 Gigabit Network Connection
vmnic2 0000:04:00.00 ixgbe Up 10000Mbps Full 90:e2:ba:3a:04:2c 9000 Intel Corporation 82599 10 Gigabit Dual Port Network Connection
vmnic3 0000:04:00.01 ixgbe Up 10000Mbps Full 90:e2:ba:3a:04:2d 9000 Intel Corporation 82599 10 Gigabit Dual Port Network Connection
~ # ethtool vmnic2
Settings for vmnic2:
	Supported ports: [ FIBRE ]
	Supported link modes: 1000baseT/Full
	Supports auto-negotiation: Yes
	Advertised link modes: 1000baseT/Full
	Advertised auto-negotiation: Yes
	Speed: Unknown! (10000)
	Supports Wake-on: d
	Current message level: 0x00000007 (7)
	Link detected: yes
~ # ethtool -i vmnic2
~ # ethtool -k vmnic2
Offload parameters for vmnic2:
Cannot get device udp large send offload settings: Function not implemented
Cannot get device generic segmentation offload settings: Function not implemented
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
----- Reply from Intel support ------
Thank you for contacting Intel.
We do not have known issues related to looping. However, I noticed that your driver is quite old. Please try updating the driver for your network connection. You may download the driver here: https://sourceforge.net/projects/e1000/files/ixgbe%20stable/4.3.15/
Kindly check with Supermicro* if they have verified driver updates for your network connection.
Hope this helps resolve the looping issue.
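The driver and firmware versions Intel is asking about can be checked directly from the ESXi shell, and an updated driver ships as a VIB. The VIB filename below is purely illustrative, not an actual release:

```shell
# Show driver name, driver version and firmware for the 10G port
esxcli network nic get -n vmnic2
# Install an updated ixgbe driver VIB (path/filename illustrative),
# then reboot the host so the new driver module is loaded
esxcli software vib install -v /tmp/net-ixgbe-driver.vib
```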
----- Follow-up from another user ------
With our new Intel servers and our Cisco switch stack we have exactly the same problem, and all drivers and firmware are up to date.
How did you solve the issue?