
S2600ST sensor "VRD hot" asserted - but nothing is actually running hot

UlrichP
Beginner
2,821 Views

Hi all,

 

Maybe someone has an idea regarding the "VRD hot" sensor being asserted. My system is an S2600STBR with a Xeon Silver 4210 and 4x Kingston KSM26RD8/16HDI. The system is running well; I do not see any issues except that the "System Status LED" is blinking amber (at a 1-second interval). It is about the VRD hot sensor: the SEL says "CPU2, DIMM Channel 1/2", but CPU2 and its DIMM slots are not even populated!

One can touch all heatsinks on the board; they are not even warm (and definitely not hot).

Anyhow, I installed 2 additional chassis fans in the server case, but it does not change anything.

I already updated the system from the initial firmware to the latest BIOS/BMC package (02.01.0014), but no change.

Well, I thought populating CPU2 and DIMM Channel 1/2 might change the game, but it doesn't. CPU2 and DIMM Channel 1/2 are recognized and run without issues.

I tried resetting CMOS (by jumper on the S2600STBR) and also the BMC "Reset to factory" option - no luck.

It's the same when resetting BMC ( via ipmitool mc reset cold ).
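For readers following along, here is a minimal sketch of how the SEL and sensor readings behind such an alert could be collected with ipmitool from Python. The subcommands (`sel elist`, `sdr type Temperature`, `sdr type Fan`, `mc info`) are standard ipmitool queries; the grouping, the function names, and the "VRD" filter keyword are assumptions made for illustration, not an Intel procedure, and the exact sensor name in the SEL output may differ.

```python
import shutil
import subprocess

# Read-only ipmitool queries; the dictionary keys are illustrative names.
DIAG_COMMANDS = {
    "event_log": ["ipmitool", "sel", "elist"],               # SEL with sensor names decoded
    "temperatures": ["ipmitool", "sdr", "type", "Temperature"],
    "fans": ["ipmitool", "sdr", "type", "Fan"],
    "bmc_info": ["ipmitool", "mc", "info"],                  # BMC firmware revision etc.
}


def filter_lines(text, keyword):
    """Return the lines of `text` that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


def run_diagnostics(keyword="VRD"):
    """Run each query on the local BMC and return {name: list of output lines}.

    Only the event log is filtered by `keyword` (an assumed substring of the
    sensor name); the other outputs are returned in full.
    """
    if shutil.which("ipmitool") is None:
        raise RuntimeError("ipmitool not found on PATH")
    results = {}
    for name, cmd in DIAG_COMMANDS.items():
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        if name == "event_log":
            results[name] = filter_lines(proc.stdout, keyword)
        else:
            results[name] = proc.stdout.splitlines()
    return results
```

Calling `run_diagnostics()` on the server itself (usually as root) would return the matching SEL entries next to the temperature and fan readings, which makes it easier to see whether an asserted event is backed by any actual reading.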

Any idea how to remedy that? If you need more information, just ask.

Just for completeness: the power supply unit can deliver 750 W; CPU1, CPU2 and MB power are connected, but the additional 4-pin "12V aux power" is not. I hope this is not the source of my trouble.

 

Thanks in advance, Ulrich 

32 Replies
Paul_R_Intel
Moderator
891 Views

Hello UlrichP, 

 

Thank you for your patience and time. We are investigating this with the engineering team. In the meantime, I'd like to confirm that you tried swapping the DIMMs on Channels 1/2 and got the same results. Is that correct?


Regards,


Paul R. 

Intel Customer Support Technician 

For firmware updates and troubleshooting tips, visit: 

https://intel.com/support/serverbios 


UlrichP
Beginner
885 Views

Hi Paul,

 

Well, actually not. As it is really difficult or even impossible these days to get memory modules that are listed in the HCL for the S2600STBR, there has been no chance to test it (yet).

So I am still waiting for the opportunity to grab modules from an existing (fully functional, error-free) system. But this might take a while.

What I did was swap the 4 modules between CPU1 and CPU2, but there was no change at all. As the CPU1 memory does not get a "bad press" from the VRD hot sensor, I thought the message would change to "CPU1 - DIMM channel 1/2", but nothing like this happened. It keeps saying the reason is "CPU2 - DIMM channel 1/2". Like a fixed message.

 

Regards, Uli

   

Paul_R_Intel
Moderator
883 Views

Hello UlrichP,


Thank you for all the information provided.


Please allow us to review the details you have shared with us. We will share an update soon.


Regards, 

 

Paul R.  





Paul_R_Intel
Moderator
873 Views

Hello UlrichP,


Thank you for your patience and time. We are still investigating. Can you please provide the SysInfo logs for our investigation? Use the following tool to retrieve them:


https://www.intel.com/content/www/us/en/download/19033/system-information-retrieval-utility-sysinfo-...


Regards, 

 

Paul R.  




UlrichP
Beginner
867 Views

Hi Paul,

Of course, please find the logs attached.

I used the sysinfo tool version 15.0.3 and ran it from the UEFI shell.

If you need something else or additional logs, just say the word.

 

Regards, Uli

 

Paul_R_Intel
Moderator
864 Views

Hello UlrichP,


Thank you for all the information provided.


Please allow us to review the details you have shared with us. We will share an update soon.


Regards, 

 

Paul R.  



Paul_R_Intel
Moderator
863 Views

Hello UlrichP,


Thank you for your patience and time. After going over the latest SysInfo logs, we cannot see any issues with the temperatures reported on the board. All temperature readings are within the design specifications.


As a last option, I would suggest checking the Fan1 cables/connections and making sure everything is working properly and there is no apparent heat.


Please let me know the outcome and, if possible, provide a fresh set of logs with the alert.


Regards, 

 

Paul R.  




UlrichP
Beginner
856 Views

Hi Paul,

 

I checked Fan 1 (and the other 4, i.e. FAN1 .. FAN5): all are fine (they spin) and are properly connected.

When updating the SDR file, all fans are properly detected. So I just swapped the cable connections of FAN1 and FAN2.

But it does not change the amber light... I also checked the CPU fans and replaced both - no change either.

 

One observation: the CPU fans have always run at full speed (noisy!) since I moved to the new Intel case.

Prior to that they were running, but at a much lower speed (rpm).

I also had the system re-detect all changes by loading BIOS defaults (F9) and clearing CMOS (jumper).

No change... The only thing: the SEL log and sysinfo are now dated 01.01.2020, but they are fresh and from today

(because the NTP setting was cleared by the CMOS reset).
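As a side note on the reset timestamps: ipmitool can read and set the SEL clock with `sel time get` and `sel time set`, where the time string takes the form `MM/DD/YYYY HH:MM:SS` in 24-hour time. The sketch below only builds that command; the helper names are hypothetical, and it is an assumption that setting the SEL clock directly is appropriate on this board rather than correcting the system time in BIOS setup or restoring the NTP configuration.

```python
from datetime import datetime


def sel_time_string(moment):
    """Format a datetime as MM/DD/YYYY HH:MM:SS (24-hour) for 'sel time set'."""
    return moment.strftime("%m/%d/%Y %H:%M:%S")


def sel_time_set_command(moment):
    """Build (but do not run) the ipmitool command that sets the SEL clock."""
    return ["ipmitool", "sel", "time", "set", sel_time_string(moment)]


def sel_time_get_command():
    """Build the ipmitool command that reads the current SEL clock."""
    return ["ipmitool", "sel", "time", "get"]
```

Passing `sel_time_set_command(datetime.now())` to `subprocess.run` on the server would stamp later SEL entries with the current local time; entries already logged keep their old timestamps.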

I added a short video showing the green and amber LEDs blinking. You probably remember that I reported the green LED near the BMC chip...

And once again a debug log for your engineers.

To be frank: I've got the impression that we are not even close to a solution, are we?

Please keep me posted about the outcome.  

 

Regards, Uli

Paul_R_Intel
Moderator
850 Views

Hello UlrichP,


Thank you for all the information provided.


Please allow us to review the details, I will keep you posted.


Regards, 

 

Paul R.  





Paul_R_Intel
Moderator
843 Views

Hello UlrichP, 


Thank you very much for your patience and time. Having reviewed the logs, the temperatures and fan RPMs look normal, so we conclude that there has to be a sensor failure, since the server is not showing any actual thermal issue.


Therefore, we would like to replace the board. We will create a case internally, and I will send you an email requesting all the information needed to proceed.


Regards,


Paul R. 



UlrichP
Beginner
838 Views

Hi Paul,

 

Thanks a lot. I just replied to your mail regarding Intel Customer Support - Case #: 05372027 and provided the information you were asking for.

 

Regards,

Uli

UlrichP
Beginner
821 Views

Hi Paul,

 

Just to let you know: yesterday I received my replacement board (not a new one, but functional).

What shall I say: everything is working, even in the non-Intel case. Case solved and dismissed.

Thanks for your patience and support.

 

One more happy customer... 

Regards, Ulrich
