I have a pair of S5500WB motherboards that I am using for ESXi lab servers. Both have been running for over 2 years without issue. Each has a second dual-port 1 Gb NIC in a PCIe slot, so there are a total of four 1 Gb NICs per server.
As of Sunday, vmnic0 (first port on the motherboards) shows as "disconnected" on both servers. I recently updated the ESXi 5.5 software to the latest patch and not too long ago updated the BIOS with the latest EFI package from intel.com. On Sunday I powered down my NAS and the ESXi servers to blow out the dust and clean them up. When I powered them back on, I noticed that vmnic0 was no longer functioning. ESXi still sees the NIC, but reports it as disconnected. I swapped switch ports with the working vmnic1 and no luck. I swapped cables with vmnic1, no luck. I rebooted. I cleared CMOS. I re-installed ESXi. I re-flashed the BIOS. No luck with any of that. If I try to PXE boot, NIC0 fails with a "media disconnected" error, while NIC1 just times out waiting for DHCP. I have replaced/checked every media component between the server NIC0 and the switch. There are no link/activity lights on NIC0 or its switch port, not even during POST/boot up. The lights are working fine on NIC1.
NICs 1, 2, and 3 are up and running fine. If this were just one motherboard, I would suspect faulty hardware, but since it happened to both servers at the same time, I am thinking there is something sketchy in the BIOS update that I performed. Maybe related to the BMC?
I am at a loss with the BMC components on this motherboard. I have no idea what function they serve and if they are at all useful to me/ESXi. It's just a lab, and NIC0/NIC1 are just management interfaces in a redundant configuration. NIC2 and NIC3 do all the VM and iSCSI traffic. Since I still have NIC1 I am not inconvenienced other than the ESXi alarm about lost redundancy.
Does anyone have any ideas about what else I can check or change to recover NIC0 on these two boards? Would the hardware confidence test (HCT) be of any help? I loaded a Linux distro on the server and still did not have access to NIC0, so I do not think the issue is with ESXi (though it could have triggered it).
Does the BMC interface piggy-back NIC0? Could that be causing an issue?
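For anyone repeating the Linux check mentioned above, here is a minimal sketch of one way to see whether the kernel detects a link on each port. It assumes a Linux live environment with Python 3 and reads the standard sysfs files `operstate` and `carrier` under `/sys/class/net`; the function names and the `sysfs_net` parameter are just illustrative, not part of any tool.

```python
from pathlib import Path


def _read(path):
    # "carrier" cannot be read while an interface is administratively down,
    # and non-interface entries have no such files; report None in both cases.
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_link_states(sysfs_net="/sys/class/net"):
    """Return {interface: (operstate, carrier)} for every entry under sysfs_net."""
    states = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        states[iface.name] = (_read(iface / "operstate"), _read(iface / "carrier"))
    return states


if __name__ == "__main__":
    for name, (operstate, carrier) in read_link_states().items():
        link = {"1": "link detected", "0": "no link"}.get(carrier, "unknown")
        print(f"{name}: operstate={operstate}, {link}")
```

A port that reports "no link" here while the cable and switch port are known good points below the OS (cabling, the physical port, or firmware) rather than at the driver or ESXi.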
I agree that two boards exhibiting the same behavior on the same NIC0 does not point to a hardware malfunction; as you implied, it may initially point to a BIOS/firmware misconfiguration. To be 100% sure, and given all of the troubleshooting you have already performed, please let me know the following:
1. What is the PBA number (http://www.intel.com/support/network/sb/CS-031176.htm) of your Intel® Server Board S5500WB (http://ark.intel.com/products/36721/Intel-Server-Board-S5500WB?q=S5500WB)?
2. What BIOS version did you update the server to?
3. Did you clear the event log (in BIOS) and load the BIOS default settings right after you cleared the CMOS?
1. PBA E40367-305
2. This Update package includes the following system software updates:
System BIOS - 64
BMC Firmware - 00.61
ME Firmware - 1.12
FRUSDR - 16
3. No, I cleared the CMOS using J1B4. That didn't change anything, so I proceeded to do the BIOS update. I skipped the FRU/SDR piece since I really don't know what that does. I did not do anything to the BIOS config after reloading the image. The event log is still there as far as I know. I see an option to clear it, but I don't know how to read it.
Wow... how embarrassing... Do NOT plug your Ethernet cables into the RJ-45 serial port connector. It will fit, but it sure won't work. Also, do NOT mirror your first mistake onto your second server.
/tail tucked, head hung in shame...