I'm suddenly having problems with VMware ESXi 6 on an S2600CWTS motherboard. All was well for the first month or so after I had it running, but in the last two weeks I've been seeing errors in the VMware event log saying that ESXi lost access to the datastore, which is on a RAID 5 array connected to the onboard LSI controller. A couple of times the RAID controller had to rebuild one of the RAID drives, but each time it was a different drive. That leads me to believe the problem isn't a failing drive but rather an issue with the controller, or a software/driver issue with ESXi 6. On one occasion the controller also took two drives offline, which crashed ESXi. The drives are Samsung 850 Pro 1TB SSDs (MZ-7KE1T0BW). After each rebuild, or after bringing the drives back online through the RAID controller BIOS, everything worked fine and no data was lost, which further suggests the drives aren't the problem. I've also been able to copy VMs off the system without problems and without causing timeout errors, so it doesn't appear that specific areas of the drives are at fault.
I'm not quite sure how to proceed at this point. I've moved some of the more critical VMs off the server and back onto the old one until I can get this figured out. In the meantime, can anyone point me to the BIOS manual for the LSI controller? I haven't tried an integrity check yet, and I don't want to fool with that until I have the manual in front of me and know what I'm doing. Without the manual I was able to bring the drives back online, and once to force a rebuild on a drive the controller had marked "Failed", but it took me a while to figure out how. I haven't had any luck finding that BIOS manual on the Intel or LSI website.
Also, does anyone know of software I can add to ESXi 6 that would let me access the RAID controller's functions without bringing ESXi down and working from the BIOS screen? Dell has some VIBs you can install into ESXi that let you access their RAID controllers and do things like run an integrity check or a hot swap without taking down the server. I see settings in the RAID controller BIOS for things like limiting the resources consumed during an integrity check, which leads me to believe there must be a way to do it while the system is running.
Could you please confirm the exact model of the RAID controller you are using? For our Intel RAID controllers, at least, we offer the Intel® RAID Web Console, which lets you monitor your array without rebooting your system or bringing your hypervisor down.
ESXi White Paper: http://www.intel.com/support/motherboards/server/sb/CS-033313.htm
The RAID controller is the LSI* SAS3008 SAS 12G that comes on the S2600CWTS motherboard. I looked at your link and don't see that model, unless it has an Intel part number I don't recognize. Also, the software I found says it's for ESXi 5, so I'm not sure whether it works with ESXi 6. If it works with this controller and ESXi 6, though, it's exactly what I'm looking for.
I can't reboot it right now to look at the POST screen, but the only other names I know it by are RS3YC and "LSI MegaRAID SAS Fury Controller", which is how ESXi identifies it. In any case, it is the onboard hardware-based RAID SAS controller that comes with the S2600CWTS motherboard, not a card I added to the server.
Thank you for the update. This will help us expedite the resolution of your case. In fact, I would recommend contacting our Intel Customer Support team: http://www.intel.com/p/en_US/support/contactsupport
I figured out what the issue was. Although I had installed LSI's driver for the RAID controller, ESXi kept using its older default driver for the controller even though the newer driver was available. I didn't figure that out until I found a similar situation described elsewhere. Once I disabled the default driver, so that ESXi had no choice but to use LSI's version, the problem disappeared. I don't know whether this is a VMware issue or a problem with how LSI packaged the install script for their driver, but it doesn't appear to be a hardware problem.
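For anyone hitting the same thing, the key check is which driver ESXi has actually bound to the controller, since installing a newer driver does not by itself guarantee ESXi will use it. Below is a minimal Python 3 sketch, run on a workstation, that parses text copied from `esxcli storage core adapter list` and reports the driver claiming any adapter whose description matches "MegaRAID SAS Fury". Assumptions: columns are separated by two or more spaces and no cell is empty; the `SAMPLE` text, HBA names, UIDs and the `megaraid_sas` driver name are hypothetical placeholders, not output from the system in this thread (the thread never names the modules involved).

```python
import re

# Hypothetical, illustrative output; paste your own `esxcli storage core adapter list` text here.
SAMPLE = """\
HBA Name  Driver        Link State  UID            Description
--------  ------------  ----------  -------------  -----------
vmhba0    vmw_ahci      link-n/a    sata.example0  Example SATA AHCI Controller
vmhba1    megaraid_sas  link-n/a    sas.example1   LSI MegaRAID SAS Fury Controller
"""


def parse_esxcli_table(text):
    """Turn an esxcli text table into a list of {column name: value} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: columns are separated by 2+ spaces and no cell is empty.
    names = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= {"-", " "}:
            continue  # skip the row of dashes under the header
        # maxsplit keeps the last column (Description) intact.
        cells = re.split(r"\s{2,}", line.strip(), maxsplit=len(names) - 1)
        rows.append(dict(zip(names, cells)))
    return rows


def drivers_for(text, description_substring):
    """Return (HBA name, driver) for each adapter whose Description matches."""
    return [
        (row["HBA Name"], row["Driver"])
        for row in parse_esxcli_table(text)
        if description_substring in row.get("Description", "")
    ]


if __name__ == "__main__":
    for hba, driver in drivers_for(SAMPLE, "MegaRAID SAS Fury"):
        print(f"{hba} is claimed by driver {driver}")
```

If the reported driver is still the default one rather than the vendor driver you installed, disabling the default module (commonly done from the ESXi shell with `esxcli system module set --enabled=false --module=<name>`, followed by a reboot) forces ESXi to fall back to the other driver, which matches the fix described above. Confirm the module names on your own host before disabling anything.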