
Random System Crashes M50CYP1UR204

BevanO
Beginner

Hey guys,

 

I am currently experiencing random system crashes on one of my servers (I have 6 in total, and only this one has the problem).

When a crash happens, there is no video output at all. I can still reach the remote management interface (BMC), but it reports "Host Power Status: Host is currently OFF". The overall system health light is solid green.

When I try to power the server on remotely via the BMC, it fails and the overall system health light changes to solid red/amber.

In the BMC event log, this entry is logged at the time of the crash:

801 Sun Jul 10 05:59:53 2022 Pwr Unit Status BMC Informational Power Unit Power Off / Power Down - Asserted

 

When I try to restart it remotely, these events are logged:

804 Sun Jul 10 16:33:10 2022 P2 Status BMC Critical Processor Thermal Trip - CPU boot FIVR fault - Asserted
803 Sun Jul 10 16:33:10 2022 P1 Status BMC Critical Processor Thermal Trip - CPU boot FIVR fault - Asserted
802 Sun Jul 10 16:33:09 2022 Pwr Unit Status BMC Critical Power Unit Soft Power Control Failure - Asserted
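
For anyone comparing logs across several servers, here is a rough Python sketch that splits event log lines like the ones in this post into fields and prints the critical ones. The field layout (ID, timestamp, sensor name, the literal generator "BMC", severity, event text) and the severity names other than "Informational" and "Critical" are assumptions inferred from these four entries, not a documented export format, and `parse_sel` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout: <id> <ctime-style timestamp> <sensor> BMC <severity> <event text>
SEL_LINE = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<sensor>.+?)\s+BMC\s+"
    r"(?P<severity>Informational|Warning|Critical)\s+"
    r"(?P<event>.+)$"
)

def parse_sel(text):
    """Parse event log lines into dicts, sorted by record ID; skip lines that do not match."""
    entries = []
    for line in text.splitlines():
        match = SEL_LINE.match(line.strip())
        if not match:
            continue
        entry = match.groupdict()
        entry["id"] = int(entry["id"])
        entry["ts"] = datetime.strptime(entry["ts"], "%a %b %d %H:%M:%S %Y")
        entries.append(entry)
    return sorted(entries, key=lambda e: e["id"])

# The four entries from this post (entry 802 joined onto one line).
SAMPLE = """\
801 Sun Jul 10 05:59:53 2022 Pwr Unit Status BMC Informational Power Unit Power Off / Power Down - Asserted
804 Sun Jul 10 16:33:10 2022 P2 Status BMC Critical Processor Thermal Trip - CPU boot FIVR fault - Asserted
803 Sun Jul 10 16:33:10 2022 P1 Status BMC Critical Processor Thermal Trip - CPU boot FIVR fault - Asserted
802 Sun Jul 10 16:33:09 2022 Pwr Unit Status BMC Critical Power Unit Soft Power Control Failure - Asserted
"""

for entry in parse_sel(SAMPLE):
    if entry["severity"] == "Critical":
        print(entry["id"], entry["ts"].isoformat(), entry["sensor"], "|", entry["event"])
```

Sorting by record ID puts the power-off event (801) ahead of the failed power-on attempt (802 to 804), which makes the order of events easier to follow.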

 

When looking at the Sensor Readings, I see this

Critical    Pwr Unit Status    Power Off / Power Down; Soft Power Control Failure    0x8021
Critical    P1 Status          Thermal Trip; Processor Presence detected             0x8082
Critical    P2 Status          Thermal Trip; Processor Presence detected             0x8082
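
A side note on the raw values: they look like IPMI discrete sensor state bitmasks, where each set bit is one asserted event offset. The short Python sketch below decodes them on that assumption. The offset tables are transcribed from the sensor-specific offsets in the IPMI v2.0 specification for the Power Unit (09h) and Processor (07h) sensor types and should be checked against the specification. Treating bit 15 as reserved and ignoring it is also an assumption, and `decode_states` is a hypothetical helper name.

```python
# Sensor-specific event offsets (IPMI v2.0 sensor type table); verify before relying on them.
POWER_UNIT_EVENTS = {
    0: "Power Off / Power Down",
    1: "Power Cycle",
    2: "240VA Power Down",
    3: "Interlock Power Down",
    4: "AC lost / Power input lost",
    5: "Soft Power Control Failure",
    6: "Power Unit Failure detected",
    7: "Predictive Failure",
}

PROCESSOR_EVENTS = {
    0: "IERR",
    1: "Thermal Trip",
    2: "FRB1/BIST failure",
    3: "FRB2/Hang in POST failure",
    4: "FRB3/Processor Startup/Initialization failure",
    5: "Configuration Error",
    6: "SM BIOS Uncorrectable CPU-complex Error",
    7: "Processor Presence detected",
    8: "Processor disabled",
    9: "Terminator Presence Detected",
    10: "Processor Automatically Throttled",
}

def decode_states(raw, table):
    """Return the names of asserted offsets 0..14; bit 15 is ignored (assumed reserved)."""
    return [table.get(bit, f"offset {bit}") for bit in range(15) if raw & (1 << bit)]

print(decode_states(0x8021, POWER_UNIT_EVENTS))
# ['Power Off / Power Down', 'Soft Power Control Failure']
print(decode_states(0x8082, PROCESSOR_EVENTS))
# ['Thermal Trip', 'Processor Presence detected']
```

Under those assumptions the decoded bits match the text shown next to each value, so the raw values add no information beyond the listed states.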

The only way I can bring the server back up is a full power reset (unplugging both power cables). After that, the server stays online for anywhere from 2 to 24 hours before it crashes again.

I have tried updating to the latest BIOS (01.01.0005), with no luck.

My system is currently running 2x Xeon 4309Y, 64GB RAM (4x16GB), and 2x P4510 in RAID 0 via VROC.

The OS is Ubuntu 22.04 LTS.

Any help would be much appreciated.

 

Thanks

JoseH_Intel
Moderator

Hello BevanO,


Thank you for joining the Intel community


Please check the following article and follow the suggested steps where possible. If the issue persists after that, we can consider a CPU replacement under warranty.

Error Message: IERR – Non-boot Core FIVR Fault – Asserted... (intel.com)


We look forward to your updates.


Regards


Jose A.

Intel Customer Support Technician

For firmware updates and troubleshooting tips, visit:

https://intel.com/support/serverbios


JoseH_Intel
Moderator

Hello BevanO,


I am following up to check whether you found the provided information useful. If you have further questions, please don't hesitate to ask. If you consider the issue resolved, please let us know so we can mark this thread as resolved. I will try to reach you one last time next Monday, the 25th. After that, the thread will be automatically archived.


Regards


Jose A.

Intel Customer Support Technician

JoseH_Intel
Moderator

Hello BevanO,


We will now mark this thread as resolved. If you have further issues or questions, please submit a new topic.


Regards


Jose A.

Intel Customer Support Technician
