In the board's technical product specification I can see that the problem can be connected with one of the following issues:
– Critical temperature threshold asserted
– CATERR asserted
– Critical voltage threshold asserted
– VRD hot asserted
– SMI Timeout asserted
The first thing that caught my attention was the metrics from the vSphere Client, which show that one of the fans is spinning at half the speed (1500 rpm) of the others. Is there anywhere I can check the threshold values, and could this be the problem?
The second issue that could be connected with the blinking LED is the CPU temperature. In the attached picture from vSphere the temperatures are marked as normal, but the values seem to be messed up.
The third thing I came across, in /var/log/messages, was a warning that one of the disks is above its temperature threshold. After checking the SMART parameters I could see that all metrics were OK except one, Driver Rated Max Temperature, whose worst value (41) is below its threshold (45). Could this metric trigger the blinking LED?
esxcli storage core device smart get -d t10.ATA_____WDC_WD5000AAKS2D60YGA1___
Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              200    51         200
Power-on Hours                61     0          61
Power Cycle Count             100    0          100
Reallocated Sector Count      200    140        200
Raw Read Error Rate           200    51         200
Drive Temperature             119    0          91
Driver Rated Max Temperature  69     45         41  -- !!!!!!
Write Sectors TOT Count       200    0          200
Read Sectors TOT Count        200    51         200
Initial Bad Block Count       100    97         100
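
To double-check the table without reading it by eye, here is a minimal Python sketch that flags any row whose current or worst value is at or below its threshold. It rests on assumptions I have not confirmed for esxcli: that the columns are normalized SMART values where higher is better, and that a threshold of 0 means the attribute is informational only. The function name flag_smart_rows and the SAMPLE text (a few rows copied from the output above) are just for illustration.

```python
def flag_smart_rows(text):
    """Return (name, value, threshold, worst) for rows at or below threshold.

    Assumption: values are normalized SMART values (higher is better).
    """
    flagged = []
    for line in text.splitlines():
        # Parameter names contain spaces, so split off the last three columns.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            continue
        name, value, threshold, worst = parts
        # Skip the header, the dashed separator, and rows reporting OK or N/A.
        if not (value.isdigit() and threshold.isdigit() and worst.isdigit()):
            continue
        value, threshold, worst = int(value), int(threshold), int(worst)
        # Assumption: threshold 0 means "no failure threshold", so ignore it.
        if threshold > 0 and min(value, worst) <= threshold:
            flagged.append((name.strip(), value, threshold, worst))
    return flagged


SAMPLE = """\
Parameter                     Value  Threshold  Worst
Health Status                 OK     N/A        N/A
Reallocated Sector Count      200    140        200
Drive Temperature             119    0          91
Driver Rated Max Temperature  69     45         41
Initial Bad Block Count       100    97         100
"""

for row in flag_smart_rows(SAMPLE):
    print(row)
```

On the sample rows only Driver Rated Max Temperature is flagged, because its worst value (41) is below the threshold (45) even though the current value (69) is above it.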
Tomorrow I will probably open the server case and look at the fan LED. Is there anything I should take into consideration before I do?