S2600CP BMC error help

DPoar
Beginner

I picked up an S2600CP that I am trying to get running. I have updated the BIOS to the latest version (02.06.0006) after stepping through many upgrades. I have three 4-pin fans plugged into system fan headers 1, 2, and 3. I ran FRUSDR, configured those fans, set all other fans to off, and answered no for chassis intrusion. When the configuration update completes, the fans immediately spin down, but after the BMC reinitializes they ramp back up to full speed. I ran sysinfo, and after digging through the log I saw this:

Management Subsystem Health, BMC FW Health (# 0x10) Warning event: BMC FW Health reports the sensor has failed and may not be providing a valid reading.

I dug deeper and found that it started popping up somewhere during the BIOS updates (the version I started with was so old that I stepped through maybe 12 of them to be safe). After seeing this, I reflashed the BMC and reflashed the FRUSDR, with no change.

I have found very little about this error after scouring the net. Some people say it means the board is failing; others say it means something is being read wrong. Either way, both explanations seem a bit vague.

I am fairly certain this is the cause of the full-speed fans, since the system status LED is flashing green and I believe a fault like this will trigger the fans to run at maximum speed.

Other than this, the system seems to be running fine. Does anyone have any information that could shed some light on this? Thanks.

idata
Employee

Hello drock,

Could you please tell me which chassis manufacturer and model you are using with this board? Is it an Intel chassis or a third-party chassis?

The fan issue could be related to an unsupported chassis.

This is a list of supported and tested chassis models and manufacturers that you can use as a reference.

P4308CP4MHEN: S2600CP4 board in an Intel P4000 pedestal chassis, rackable
P4308CP4MHGC: S2600CP4 board in an Intel P4000 pedestal chassis, rackable
P4208CP4MHGC: S2600CP4 board in an Intel P4000 pedestal chassis, rackable

Reference chassis list

The reference chassis list includes third-party chassis tested for the Intel® Server Board S2600CP Family. Chassis are tested to see if they provide adequate airflow to meet individual manufacturer temperature specifications.

Vendor | Model | Chassis Type | Power Supply | Unpackage Shock Test | Thermal Test Level | Driver Support
Chenbro* | SR112 | Pedestal | Single | Pass with 25G | 1 | A and B
Chenbro | SR105 | Pedestal | Single | Pass with 25G | 2 | A and B
Chenbro | RM137 | Rack/1U | Single | Pass with 25G | 2 | A and B
Chenbro | RM13604H01*13114 | Rack/1U | Single | N/A | 3 | A and B
Chenbro | RM417 | Rack/4U | H/S | N/A | 1 | A and B
Chenbro | RM13704-500CP | Rack/1U | Single | Pass with 25G | 2 | A and B
In-win* | PV689 | Pedestal | Single | Pass with 25G | 1 | A and B
In-win | PP689 | Pedestal | Single | Pass with 25G | 2 | A and B
TST* | ESR316 | Rack/3U | H/S | Pass with 25G | 1 | A and B
ASS* | ST104A-HB-L-I2600 | Rack/1U | Single | Pass with 25G | 1 | A and B
CI-Design* | NSR224 | Rack/2U | H/S | Pass with 25G | 1 | A and B

Please reply with your chassis model so we can continue with the troubleshooting process.

Hope this helps.

Jose A.
DPoar
Beginner

Thanks for the reply. The chassis is a Chenbro RM23212 2U rackmount. It has three 80 mm fans.

idata
Employee

Hello drock,

Thanks for the info.

Since the chassis is not on the tested list, could you please attach a sysinfo log so we can try to determine whether the issue is related to a chassis sensor or a board sensor? You can download the sysinfo utility from the following URL: https://downloadcenter.intel.com/download/26988/System-Event-Log-SEL-Viewer-Utility?product=61088

Regards

Jose A.
DPoar
Beginner

The log is attached. I had some issues after the first BIOS update and cleared the CMOS, which is why the date jumps back to 2005 for a while.

System fans 1, 2, and 3 are connected.

idata
Employee

Hello drock,

Thanks for attaching the logs. I looked through them and found some errors dated 2016. Is that the date set on the server, or are those errors from a year ago?

One of the errors is "PCI Express Receiver error", which might be caused by a PCI riser card that is not installed correctly. Another error says "Mmry ECC Sensor reports uncorrectable error. There has been an uncorrectable ECC or other uncorrectable memory error for the memory module RANK_0, CPU_2, Channel = A, DIMM_1."

A third one says "SPS FW Health reports SPS Health event type FW status. Internal error. Operational image shall be updated or hardware board repair is needed(if error is persistent)", which seems to point to a BIOS or board issue.

Let me know if you have already addressed any of these errors, for example by replacing memory or reseating a PCI riser card.

Regards

Jose A.
idata
Employee

Hello drock,

Do you have any updates, questions, or comments regarding this issue?

Please do not hesitate to contact us.

If you consider the issue resolved, please let us know so we can mark this thread as resolved.

Regards

Jose A.
DPoar
Beginner

Hi Jose,

Sorry for the delay; I have been busy, but I wanted to go through the log entries you pointed out. I did not have the board until 2017. It does not have any PCIe riser cards in it, and it has not thrown that error since I have had it. The same goes for the memory error. I did take out a stick that was causing a fuss in the H1/2 bank, because someone had put non-matching sticks in it.

As for the last error you mentioned, "SPS FW Health reports SPS Health event type FW status....", it stopped appearing after one of the BIOS updates. The dates jump to 2005 after I cleared the CMOS on 11/25/17.

idata
Employee

Hello drock,

A couple more questions: are the log entries with the 2005 date actually the newer ones? I also saw some errors related to the flash device.

Are you currently still getting the original "BMC FW Health reports the sensor has failed and may not be providing a valid reading." error message?

Regards

Jose A.
DPoar
Beginner

Correct, the 2005 entries embedded among the later 2017 entries are newer; the entries are logged in the order they occurred. The BMC error is still happening. I believe the flash error came from a flash drive I plugged in, which could be failing.

idata
Employee

Hello drock,

I think the original error message "BMC FW Health reports the sensor has failed and may not be providing a valid reading" is related to the fact that you integrated this server into a non-validated/untested third-party chassis. The original chassis and the tested ones have sensors that interact with the BMC, and the BMC will throw an error if a specific sensor it expects is not found.

I would recommend doing the FW update one more time, and this time carefully completing the chassis information once the flash has finished.

Let me know how it goes.

Jose A.
idata
Employee

Hello drock,

Do you have any updates, questions, or comments regarding this issue?

Please do not hesitate to contact us.

If you consider the issue resolved, please let us know so we can mark this thread as resolved.

Regards

Jose A.
idata
Employee

Hello drock,

We will now mark this thread as resolved. If you have further issues or questions, just create a new topic.

Jose A.