
Several H2224XXLR2 systems are throttling because one of the PDUs is reported as down, even though there is power on them

JanKuipers
Beginner

H2224XXLR2 with S2600TPR board, running the most current firmware package, R01.01.0028.

The behaviour is identical to that described in TA-1131:

"Intel has received numerous reports from customers of unexpected and severe system CPU throttling on the identified products. The event is also coupled with a Power Supply Amber LED warning (1 Hz blink pattern) status generated by the event. A check of the System Event Log (SEL) shows a PSU Predictive Failure, an Over Temperature condition, or a SmART-CLST event."
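For anyone who wants to check their own SEL for the three event types named in that quote, here is a minimal sketch. It assumes the SEL has already been dumped to text (for example with `ipmitool sel elist`); the sample lines and the function name `find_throttle_events` are made up for illustration, and real SEL wording may differ.

```python
import re

# Keywords taken from the TA-1131 wording quoted above; matching is case-insensitive.
PATTERNS = [
    re.compile(r"predictive\s+failure", re.I),
    re.compile(r"over\s*-?\s*temperature", re.I),
    re.compile(r"sma?rt.{0,3}clst", re.I),
]


def find_throttle_events(sel_text):
    """Return the SEL lines that mention any of the TA-1131 event types."""
    return [
        line.strip()
        for line in sel_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]


# Hypothetical sample lines, invented only to exercise the filter.
SAMPLE = """\
1 | 01/01/2020 | 10:15:02 | Power Supply PS1 Status | Predictive failure | Asserted
2 | 01/01/2020 | 10:15:03 | Fan System Fan 1 | Lower Critical going low | Asserted
3 | 01/01/2020 | 10:15:04 | Processor SmaRT-CLST | State Asserted
"""

if __name__ == "__main__":
    for line in find_throttle_events(SAMPLE):
        print(line)
```

This is only a text filter; it does not decode event data, so a match is a hint to look at the full SEL entry, not a diagnosis.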

 

Is there a more current PSU firmware version? Is this known behaviour (as it apparently was before)?

 

-Jan

 

 

63 Replies
JanKuipers
Beginner

To recap my problem:

 

An H2224XXLR2 system with redundant PSU has the issue described in TA-1131.

Observation on the system:

The H2224XXLR2 system does -not- recognise the PSUs, which, I suspect, would cause it -NOT- to update any firmware on the PSUs, as described in TA-1131. Is that correct? Will the firmware update indeed not be applied?

 

 

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We understand that the system is not recognizing the power supply. That is why we need to know the model and manufacturer of your power supply. Additionally, we need to know whether the UPS provides a pure sine-wave power source when running on backup power.

 

Regards

 Sergio S.

Intel Customer Support

 

JanKuipers
Beginner

Here "recognizing" relates only to the PSU's identification; it does provide power to the chassis and is operational.

 

Emeth_O_Intel
Moderator

Hello JanKuipers,

 

Thank you for the information.

We are going to verify some details and will get back to you as soon as possible to proceed with the next step.

 

Best regards,

 

Emeth O.

Intel Server Specialist.

JanKuipers
Beginner

The model the datacenter uses is Piller UBR625.

SergioS_Intel
Moderator

Hello JanKuipers,

 

Thank you for the information. We are going to check with our upper-level support and will get back to you.

 

Best regards,

Sergio S.

Intel Customer Support Technician

For firmware updates and troubleshooting tips, visit: https://intel.com/support/serverbios.

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

Engineering went through the debug logs and thinks the SMART/CLST assertion/deassertion could be caused by insufficient input voltage, an overcurrent condition, and/or an over-temperature condition, any of which ends in system throttling.

They also recommend reinstalling the full system update package (BIOS/BMC/ME/FRUSDR) on the system with the PSU inserted in order to detect and validate the PSU and chassis information.

Please collect the system debug logs after redoing the full update with the R01.01.0028 package.

 

Finally, there are 3 chassis with the same issue, but we only have one sysinfo log and one system debug log. Could you please collect system debug logs and sysinfo logs from 3 S2600TP systems across 3 different chassis?

 

We will be looking forward to your response.

 

Best regards,

Sergio S.

SergioS_Intel
Moderator

Hello JanKuipers,

 

We are following up on your issue and would like to know whether you were able to carry out the recommendations in the previous post.

Just to recap, our recommendations were to reinstall the full system update package (BIOS/BMC/ME/FRUSDR) on the system with the PSU inserted, in order to detect and validate the PSU and chassis information, and to collect the system debug logs after redoing the full update with the R01.01.0028 package.

Finally, there are 3 chassis with the same issue, but we only have one sysinfo log and one system debug log. Could you please collect system debug logs and sysinfo logs from 3 S2600TP systems across 3 different chassis?

 

 

Best regards,

Sergio S.

Intel Customer Support Technician

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We are following up on your issue and would like to know whether you were able to carry out the recommendations in the previous post.

  

Best regards,

Sergio S.

Intel Customer Support Technician

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We hope this message finds you safe. We are following up on your issue and would like to know whether you need further assistance.

  

Best regards,

Sergio S.

Intel Customer Support Technician

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We are following up on your issue and would like to know whether you need further assistance or whether we can close this thread.

  

Best regards,

Sergio S.

Intel Customer Support Technician

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

This message is to let you know that we are going to close this thread.

 

Just to recap, our recommendations were to reinstall the full system update package (BIOS/BMC/ME/FRUSDR) on the system with the PSU inserted, in order to detect and validate the PSU and chassis information, and to collect the system debug logs after redoing the full update with the R01.01.0028 package.

Finally, there are 3 chassis with the same issue, but we only have one sysinfo log and one system debug log. Could you please collect system debug logs and sysinfo logs from 3 S2600TP systems across 3 different chassis?

If you need further assistance, please contact us again.

  

Best regards,

Sergio S.

Intel Customer Support Technician

 

JanKuipers
Beginner

On one system we completely shut down all nodes, unplugged the PSUs, and reinstalled the full update package, but without any change in behaviour.

The PSUs are still not recognized.

 

System debug logs and sysinfo logs will be uploaded shortly.

SergioS_Intel
Moderator

Hello JanKuipers,

 

We appreciate the update on this case. We will be waiting for the logs from you.

 

Best regards,

Sergio S.

Intel Customer Support Technician

 

 

JanKuipers
Beginner

Attached are the sysinfo and BMC debug logs for the 3 different chassis.

Chassis1 has been completely shut down (power pulled) and upgraded with the R01.01.0028 update package (BIOS/BMC/ME/FRUSDR) with both PSUs inserted.

 

Looking forward to a solution,

Jan

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

Thank you for the information. Please allow us some time to check the logs.

 

Best regards,

Sergio S.

Intel Customer Support Technician

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We would like to get some additional information from you:

 

Based on the April 3 case notes, the customer has 3 H2224XXLR2 chassis, and all exhibit this behavior. You have now mentioned 3 nodes with the issue. Please clarify:

1. Are the 3 nodes in 3 different H2224XXLR2 chassis?

2. If yes, are there any other nodes in the 3 chassis? (Up to 4 nodes can be installed in one chassis.)

3. If there are other nodes installed in the 3 chassis that are not affected by this issue, can you please provide the logs of those nodes for comparison?

 

 

Best regards,

Sergio S.

Intel Customer Support Technician

 

JanKuipers
Beginner

There are 3 H2224XXLR2 chassis, all fully occupied, so there are 12 nodes in total.

 

All chassis (and hence nodes) are affected in the same way by this issue.

 

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We appreciate the additional information. Please allow us to check, and we will get back to you.

 

Regards

 Sergio S.

Intel Customer Support

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

Please help us by providing some additional information:

1- Can you please let us know whether you updated both the FRU and the SDR during the update?

2- What is the current status of the LED on the back of each PSU?

3- Could you please provide the system logs for all four nodes? To identify which node is the master node, could you run the following IPMI command on each node and share the results:

ipmitool -H (BMC IP) -U (user) -P (password of user) raw 0x3E 0x50

4- Could you please share a screenshot of the BMC web console under System Information > FRU Information, showing PWR supply 1 FRU and PWR supply 2 FRU, for the system Chassis1. Virtual009?
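With four nodes per chassis, the command in item 3 has to be repeated once per BMC. A minimal sketch of how that loop might be scripted follows; the raw bytes are the ones quoted in the post, while the host addresses, credentials, and the function names `build_command` and `query_nodes` are hypothetical. The thread does not say how to interpret the returned bytes, so the sketch only collects the text.

```python
import subprocess

# Raw bytes exactly as quoted in the post above.
MASTER_NODE_QUERY = ["raw", "0x3E", "0x50"]


def build_command(host, user, password, args):
    """Build the ipmitool argument list for one remote BMC."""
    return ["ipmitool", "-H", host, "-U", user, "-P", password, *args]


def query_nodes(hosts, user, password, run=subprocess.run):
    """Run the query against each BMC and return {host: output text}."""
    results = {}
    for host in hosts:
        cmd = build_command(host, user, password, MASTER_NODE_QUERY)
        done = run(cmd, capture_output=True, text=True, check=False)
        # Keep stderr when stdout is empty so connection failures stay visible.
        results[host] = done.stdout.strip() or done.stderr.strip()
    return results


if __name__ == "__main__":
    # Hypothetical BMC addresses and credentials; replace with your own.
    bmc_hosts = ["192.0.2.11", "192.0.2.12", "192.0.2.13", "192.0.2.14"]
    for host, output in query_nodes(bmc_hosts, "admin", "changeme").items():
        print(host, "->", output)
```

The `run` parameter exists only so the loop can be exercised without real hardware; by default it calls `subprocess.run`, which requires ipmitool to be installed and on the PATH.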

 

Additionally, to verify whether the power supply firmware is the latest version delivered by the SUP package, run the commands below against the power supplies of the 3 chassis and send us the output, so that we can check whether the latest firmware was installed on the power supplies.

Note: ipmitool is an open-source tool. You need to download ipmitool from the Internet and then run the commands below:

For PSU1: ipmitool raw 0x06 0x52 0x0f 0xb0 0x04 0xd9

For PSU2: ipmitool raw 0x06 0x52 0x0f 0xb2 0x04 0xd9

As the PSU sensors are not present in the log files, please also run the command "ipmitool fru" to see whether any information is displayed for the PSUs, and send us the command output.
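For readers wondering what the two PSU commands do: as I read the IPMI specification, NetFn 0x06 with command 0x52 is Master Write-Read, so the remaining bytes would be a bus ID (0x0f), the PSU's slave address (0xb0 or 0xb2), a read count of 4 bytes, and one byte (0xd9) written to the PSU. That breakdown is my reading, not something stated in the thread. The sketch below builds both commands and turns the hex output into integers; the helper names are hypothetical, and how the four bytes map to a firmware version is not given here.

```python
# Slave addresses taken from the two commands quoted above.
PSU_ADDRESSES = {"PSU1": "0xb0", "PSU2": "0xb2"}


def psu_revision_command(psu):
    """Return the ipmitool argument list quoted in the post for one PSU."""
    return [
        "ipmitool", "raw",
        "0x06", "0x52",        # NetFn App, Master Write-Read (assumed reading)
        "0x0f",                # bus ID byte, as quoted
        PSU_ADDRESSES[psu],    # PSU slave address
        "0x04",                # number of bytes to read back
        "0xd9",                # byte written to the PSU
    ]


def parse_raw_bytes(output):
    """Turn ipmitool raw output such as ' 01 0a ff 10' into a list of ints."""
    return [int(token, 16) for token in output.split()]


if __name__ == "__main__":
    for name in PSU_ADDRESSES:
        print(name, " ".join(psu_revision_command(name)))
    # Made-up output string, only to show the parsing step.
    print(parse_raw_bytes(" 01 0a ff 10"))
```

Comparing the parsed byte lists across the three chassis is a quick way to see whether all PSUs report the same value before sending the output to support.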

 

 

Best regards,

Sergio S.

Intel Customer Support Technician

 

SergioS_Intel
Moderator

Hello JanKuipers,

 

We are following up on your issue and would like to know whether you were able to carry out the recommendations we provided in the previous post.

 

Best regards,

Sergio S.

Intel Customer Support Technician