Server System SR2625URLXR Pwr Unit Status error

idata
Employee

Hello, my Server System SR2625URLXR worked for about a year without any errors, but now it fails every week with the error: "Pwr Unit Status reports the power unit has suffered a failure." After the failure, the server either restarts automatically or shuts down with BIOS beeps. The server is connected to two different UPSs, and its power is stable.

In the BMC log I see the following messages:

Drv 5 Pres reports the device has been inserted or is present. (14:09:17)

PS1 Status reports the power supply's presence has been lost.

PS1 Status reports the power supply has recovered from the failure.

PS1 Status reports the power supply's input (AC/DC) has been regained.

Drv 4 Pres reports the device has been inserted or is present.

Drv 3 Pres reports the device has been inserted or is present.

Drv 2 Pres reports the device has been inserted or is present.

Drv 1 Pres reports the device has been inserted or is present.

BIOS Evt Sensor reports Timestamp Clock Sync. Event is first of two expected events from BIOS on every power on.

Pwr Unit Redund reports full redundancy has been lost.

Pwr Unit Redund reports Redundancy Lost: Entered any non-redundant state, including Nonredundant: Insufficient Resources .

Pwr Unit Redund reports Non-redundant: Sufficient Resources from Redundant, redundancy has been lost, but the unit is still functioning with the minimum amount of resources needed for normal operation.

Drv 0 Pres reports the device has been inserted or is present.

System Event reports a PEF action has occurred - alert.

PS1 Status reports the power supply has suffered a failure.

PS1 Status reports the power supply's input (AC/DC) has been lost.

BIOS Evt Sensor reports Timestamp Clock Sync. Event is second of two expected events from BIOS on every power on.

Pwr Unit Status reports that a soft power control failure has been regained.

Fan 1 Present reports the device has been inserted or is present.

Fan 6 Present reports the device has been inserted or is present.

Fan 2 Present reports the device has been inserted or is present.

Fan 3 Present reports the device has been inserted or is present.

Fan Redundancy reports Fully Redundant: Indicates that full redundancy has been regained .

Fan 5 Present reports the device has been inserted or is present.

Pwr Unit Status reports the power unit has been regained from a failure.

Pwr Unit Status reports the power unit is powered on.

Pwr Unit Status reports that a soft power control failure has been regained.

Pwr Unit Status reports the power unit has suffered a failure.

Pwr Unit Status reports the power unit is powered off or being powered down.

Button reports the power button has been pressed. (14:08:12) [<-- I pressed reset button]

Pwr Unit Status reports there has been a soft power control failure. (14:01:18)

Pwr Unit Status reports the power unit has been regained from a failure.

Pwr Unit Status reports the power unit is powered on.

Pwr Unit Status reports the power unit is powered off or being powered down.

Pwr Unit Status reports the power unit has suffered a failure. (14:01:05)

Yesterday I got a new error:

2012-12-10 10:38:10 Operating system bootup. Informational Open OEM Event

2012-12-10 10:38:10 OS Boot sensor 0 reports the boot from drive C has been completed. Informational Open OS Boot

2012-12-10 10:37:52 BIOS Evt Sensor reports a system boot event has occurred. Informational Open System Event

2012-12-10 10:36:22 BIOS Evt Sensor reports Timestamp Clock Sync. Event is first of two expected events from BIOS on every power on. Informational Open System Event

2012-12-10 10:33:02 Operating system bluescreen error. Informational Open OEM Event

2012-12-10 10:32:57 PCIe Fat Sensor reports a fatal PCI Express Completion Timeout error. CRITICAL Open* Critical Interrupt

2012-12-10 10:32:57 PCIe Cor Sensor reports a correctable PCI Express Link Bandwidth Changed error. Informational Open Critical Interrupt

My question: is this a motherboard or power distribution board error?

CToed
Beginner

Hi,

Please disconnect all UPSs and watch the system.

Such issues are often caused by a UPS.

The second issue points to a problem with the mainboard and/or the PCIe cards.

Which PCIe Cards are inserted?

David_A_Intel
Moderator

The issue could be related to the power supply or the power distribution board. I noticed that, at least in the text you posted, the errors are related to PS1. I did not notice any errors for PS2.
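
To check the PS1 versus PS2 observation on a longer log, the failure events can be tallied per sensor. The following is only a rough Python sketch: it assumes the BMC log has been saved as plain text with one event per line in the form "<sensor name> reports <message>", as in the post above. The function name `count_failures_by_sensor` and the phrase list are illustrative and not part of any Intel tool.

```python
import re
from collections import Counter

# Failure-related phrases taken from the log lines quoted in this thread.
# This list is an assumption; extend it to match your own log wording.
FAILURE_PHRASES = ("suffered a failure", "has been lost")


def count_failures_by_sensor(log_text):
    """Count failure-type events per sensor name.

    Assumes each event line looks like '<sensor name> reports <message>'.
    Lines that do not match that shape are skipped.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = re.match(r"\s*(.+?) reports (.+)", line)
        if not match:
            continue
        sensor, message = match.groups()
        if any(phrase in message for phrase in FAILURE_PHRASES):
            counts[sensor] += 1
    return counts


sample = """\
PS1 Status reports the power supply's presence has been lost.
PS1 Status reports the power supply has recovered from the failure.
Pwr Unit Redund reports full redundancy has been lost.
PS1 Status reports the power supply has suffered a failure.
PS1 Status reports the power supply's input (AC/DC) has been lost.
Drv 0 Pres reports the device has been inserted or is present.
"""

for sensor, n in count_failures_by_sensor(sample).most_common():
    print(f"{sensor}: {n}")
```

If a "PS2 Status" entry never shows up in the tally over several failure cycles, that supports focusing on PS1 and its slot first.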

If possible, try to swap around the power supplies to see if the issue follows the power supply or the slot it goes into. You may want to disable PCI AER Support in BIOS > Server Management to clear possible PCIe errors.
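
The logic of the swap test can be written down as a small decision helper. This is a hypothetical sketch, not an Intel procedure: `interpret_swap` is an illustrative name, and it assumes the PS1/PS2 sensor names are tied to the slot rather than to the physical module, which is not confirmed in this thread.

```python
def interpret_swap(errors_before, errors_after):
    """Interpret a power supply swap test.

    errors_before / errors_after: name of the sensor that logs failures
    ('PS1' or 'PS2') before and after the two modules are physically
    swapped between slots. Use None for errors_after if no failure has
    been logged since the swap.

    Assumption: the sensor name identifies the slot, not the module.
    """
    if errors_after is None:
        return "no failure since the swap: keep monitoring"
    if errors_after == errors_before:
        # Same slot still fails with a different module in it.
        return "fault stays with the slot: suspect the power distribution board"
    # The failure moved to the other slot along with the module.
    return "fault follows the module: suspect the power supply itself"


print(interpret_swap("PS1", "PS1"))
print(interpret_swap("PS1", "PS2"))
```

Since the failures here occur about once a week, the system would need to run for at least that long after the swap before the result means much.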

As usual, make sure your system is running the latest BIOS release, 64: https://downloadcenter.intel.com/download/24808/Intel-Server-Board-S5520UR-Firmware-Update-Package-for-EFI
