A customer of ours has an Intel R2208GZ4GC server running Windows Server 2012 R2 as the host OS. It has 2 x SSDs (configured as RAID 1 for the OS) and 5 x 600GB SATA 6G drives. There are two VMs, also running 2012 R2.
The BIOS is at 2.06.0005 which is the latest for the hardware and the RAID firmware is also up to date. The latest updates have been applied to both the host and guest operating systems.
The server has hung every 24-48 hours for the last 5 days. There were no previous problems and nothing changed before the hangs started (that I know of, at least!). The symptoms are:
Grateful for any suggestions on what we could try next to isolate the problem.
The issue you describe sounds software / Windows related. But if you want to rule out possible hardware issues, you might try retrieving the board system logs using the following tool: https://downloadcenter.intel.com/download/25440/System-Event-Log-SEL-Viewer-Utility?product=56262 . It will get the logs stored on the BMC, which show all the info related to sensors, voltages and such.
Please save the log into a file and let us know.
Yes, we initially thought it was Windows but since there were no obvious errors or problems there we started to look at the hardware. The only changes we have applied hardware-wise so far are upgrading the BIOS to the latest version, re-seating RAM and blowing out a lot of dust that had accumulated within the server.
I've attached the SEL log from the past couple of days (I zeroed it at the weekend). It doesn't seem to indicate that anything major is wrong, but you might see something we are missing. Note that you'll see a flurry of activity yesterday afternoon, when we updated the BIOS and tested the PSUs.
Thanks for attaching the logs. I looked at them and the only things that caught my attention were repeated warnings about PS redundant power being lost and a couple of critical records about the PS having insufficient resources to maintain normal operation. I assume these happened during your PS testing. Insufficient power could possibly cause a reboot or a total shutdown, but I don't think it would cause a system hang.
Let's allow the system to run, and let's gather the logs right after the next hang occurs so we can see whether any hardware-related errors are shown.
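If it helps when reviewing the next log, here is a minimal sketch of how the saved SEL text could be skimmed for power-related and critical records. It assumes the log was saved as plain text with one record per line; the function name `filter_sel_lines` and the keyword list are my own illustration, not part of the SEL Viewer utility, and the actual wording of the records in your export may differ.

```python
from typing import Iterable, List, Sequence

# Hypothetical keywords; adjust them to match the wording in your saved log.
KEYWORDS: Sequence[str] = ("power", "redundan", "critical")


def filter_sel_lines(lines: Iterable[str],
                     keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return the log lines that mention any keyword (case-insensitive)."""
    matches = []
    for line in lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            # Drop the trailing newline so the results print cleanly.
            matches.append(line.rstrip("\n"))
    return matches
```

An open text file can be passed directly as `lines`, so the records written just before a hang can be pulled out of a larger export without reading through the whole file.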
Do you have any updates on this? If not, can we mark this thread as resolved? You can always come back if the problem reappears.