
Very noisy fan 3 on S2600CW

idata
Employee

When the server is idle, all the temperatures on my server are below 45 °C. Four fans are throttled down to 800 RPM, but fan 3 is stuck at 8500 RPM and is very loud. I've already updated BIOS/ME/FRU/SDR, and the update script correctly detects the chassis (redundant, non-HSBP), but the middle fan remains stuck. The PWM offset is set to 0, the fan profile is Acoustic, and the CPU Power and Performance Policy is Balanced Power.

Is there anything I can do to improve this?

10 Replies
David_A_Intel
Moderator

You have covered most of the troubleshooting tips we can recommend. You may want to enable the Clear Event Logs option in BIOS > Server Management to refresh the BMC sensors.

Additionally, make sure you perform a full power cycle (by disconnecting power cords for about 30 seconds) to see if this helps.

If possible, please include the exact model of the chassis you are using. I would also recommend swapping SysFan3 into a different location to see whether the issue follows the fan or the header on the board.

idata
Employee

Thanks! I had already power cycled the machine many times. I have now tried Clear Event Logs as well, and it didn't help.

The issue follows the header (I swapped fans 3 and 4, and fan 3 remains at 8000 RPM). Based on the "Intel® Server Board S2600CW Family: Chassis Compatibility" page (http://www.intel.com/support/motherboards/server/s2600cw/sb/CS-034913.htm), the chassis should be a P4304XXMUXX. The server is a demo system from Intel, so it does not show the chassis model name in the DMI data.

DSilv11
Valued Contributor III

The server board fans are controlled by the SDRs (Sensor Data Records).

They are part of the flash update package found here: https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=24375

When using an Intel chassis, the flash utility probes the hardware, reading things like the front panel, PSUs & HSBP type, to determine which specific chassis you have so it can load the correct values. The utility does assume the fans are connected to the motherboard correctly.

Since different customers use different hardware configurations, the FRUSDR package needs to be loaded when you assemble a new server.

The first thing I would recommend is to download the firmware package and flash the system. That fixes about 90% of all fan issues, assuming the fans are connected correctly and working.

Second, read the SEL rather than clearing it. The SELview tool will display the SEL in the OS or in EFI. Get it here: https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=24719

Or you can read the SEL in the BMC's embedded web server. This requires the BMC to be enabled in BIOS setup with a user ID and password, plus a network connection and a second system with a web browser to log into the CW system.

Third, the CW system uses 5 cooling domains, which happen to match the system fans.

The fan domain structure allows the BMC to only ramp specific fans needed to cool a specific area.

FAN 3 / domain 3 is the central domain along with fan 2 & fan 4.

All 3 of these fans should behave the same.
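
As a rough illustration of that expectation, here is a minimal Python sketch that flags a fan running far faster than the others in its domain. It is not an Intel tool: the function name, the 2x threshold, and the way the readings are supplied are all assumptions made for the example. The RPM values are the ones reported earlier in this thread, and the grouping of fans 2, 3 and 4 follows the description above.

```python
from statistics import median

def find_outlier_fans(rpm_by_fan, domain_fans, max_ratio=2.0):
    """Return the fans in one domain spinning much faster than their peers.

    rpm_by_fan  : dict mapping fan number -> measured RPM (hypothetical input)
    domain_fans : fan numbers expected to ramp together
    max_ratio   : assumed tolerance; a fan above median * max_ratio is flagged
    """
    typical = median(rpm_by_fan[fan] for fan in domain_fans)
    if typical <= 0:
        return []  # no usable baseline (all fans stopped or unread)
    return [fan for fan in domain_fans if rpm_by_fan[fan] > typical * max_ratio]

# Idle readings as described in the original post.
readings = {1: 800, 2: 800, 3: 8500, 4: 800, 5: 800}
print(find_outlier_fans(readings, [2, 3, 4]))  # [3]
```

Using the median rather than the mean keeps a single runaway fan from dragging the baseline up with it.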

Hmmmmm,

ah ha!!

I just opened the FW update files and see something in S2600CW.SDR that does not look right.

Almost at the end of the file is an entry for P4000 Redundant Fan SKU Chassis, Domain 2, Domain3, Domain4, All Profiles

which is incorrectly formatted, and since Domain 3 is where you are having the issue, it is pretty suspect.

We just need to get this fixed!

idata
Employee

1st step done already. 2nd step does not show anything incorrect.

3rd step... Great, if there's a new .SDR file that I can test on the machine, I can do that! Note that I'm having a problem with domain 2 (fan 3).

I took a look and noticed that the .SDR file version 1.05 has LF line endings instead of CR+LF in that section and in another one. I changed it to CR+LF and re-updated the FRU/SDR, but it didn't fix the problem.
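
For anyone who wants to check a file for the same thing, the following is a minimal Python sketch of counting line-ending styles and converting bare LF to CR+LF. It is not part of the Intel update package; the function names are made up for illustration, and the sample bytes stand in for the contents of a real file read in binary mode.

```python
import re

def count_line_endings(data: bytes) -> dict:
    """Count CR+LF pairs, bare LF and bare CR in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,  # CR not followed by LF
    }

def to_crlf(data: bytes) -> bytes:
    """Convert bare LF to CR+LF, leaving existing CR+LF pairs untouched."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

# Stand-in for file contents with mixed line endings.
sample = b"first record\r\nsecond record\nthird record\n"
print(count_line_endings(sample))           # {'crlf': 1, 'lf': 2, 'cr': 0}
print(count_line_endings(to_crlf(sample)))  # {'crlf': 3, 'lf': 0, 'cr': 0}
```

Working on bytes rather than text avoids Python's newline translation, which would otherwise hide the difference being checked.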

I found a few problems in the .SDR file comments. For example, non-redundant SKUs have sensor C8h (Agg Thrm Mgn 1), while redundant SKUs have sensor C9h (Agg Thrm Mgn 2). However, the comments always mention Aggregate Thermal Margin 1, even when the record (correctly) refers to sensor C9h. Everything I found was only in the comments.

DSilv11
Valued Contributor III

That should have fixed it. In fact, it should not be an issue except to us humans who like CR to keep things neat.

There is a new System update package you can try: https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=24732

If that does not fix it, run the system info tool (https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=24718) and post the results so we can see what is getting loaded for SDRs.

DSilv11
Valued Contributor III

Just checking: you said the FRUSDR utility did detect the chassis, so it did not ask you which fans are connected, correct?

idata
Employee

Right, FRUSDR detects the chassis correctly. I'll try the new firmware later this week.

idata
Employee
DSilv11
Valued Contributor III

The file looks great, except for fan 3 running at max speed.

Fans 3, 4 & 5 have a common thermal driver, so all these fans should ramp together.

It pretty much comes down to hardware (the fan driver) or maybe something really odd in the BMC.

You could try a BMC restore defaults, either with the button in the BMC web interface or with the syscfg tool: syscfg -rbfd (I think; you may need to check syscfg -?).

The odds are high that the problem is on the motherboard.

idata
Employee

Yeah, at this point it looks very much like a stuck PWM controller, if such a thing can exist at all. I've already done a BMC restore using a jumper on the motherboard.

I'll engage customer support to have the motherboard replaced, thanks!
