Continuing from my earlier post at
It has been a while, but I now have an additional unit, this time with Intel certified memory, that is experiencing the same symptoms. Namely, at random intervals, ranging from 15 minutes to a few hours, the fans will spin up to full speed for a few seconds before returning to normal speed. I narrowed the cause down to the P1 Therm Margin sensor, which will occasionally be unreadable for a moment. See the attached video file to see this in action.
These servers are all running CentOS Linux.
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
64 structures occupying 3771 bytes.
Table at 0x80630000.
Handle 0x0001, DMI type 221, 12 bytes
Header and Data:
DD 0C 01 00 01 01 00 04 01 00 08 00
Reference Code - ACPI
Handle 0x0002, DMI type 133, 12 bytes
Header and Data:
85 0C 02 00 00 50 F9 81 00 40 00 00
Handle 0x0006, DMI type 0, 24 bytes
Vendor: Intel Corporation
Release Date: 09/27/2017
Runtime Size: 64 kB
ROM Size: 16384 kB
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 0.0
Firmware Revision: 0.0
Handle 0x0007, DMI type 1, 27 bytes
Manufacturer: Intel Corporation
Product Name: S1200SP
Serial Number: QSCD80400069
Wake-up Type: Power Switch
SKU Number: SKU Number
Handle 0x0008, DMI type 2, 17 bytes
Base Board Information
Manufacturer: Intel Corporation
Product Name: S1200SP
Serial Number: QSSA80100236
Asset Tag: Base Board Asset Tag
Board is a hosting board
Board is replaceable
Location In Chassis: Part Component
Chassis Handle: 0x0000
Contained Object Handles: 0
Handle 0x0009, DMI type 3, 24 bytes
Type: Rack Mount Chassis
Lock: Not Present
Serial Number: ..................
Asset Tag: ....................
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Number Of Power Cords: Unspecified
Contained Elements: 0
SKU Number: Not Specified
Handle 0x000F, DMI type 11, 5 bytes
String 1: To Be Filled By O.E.M.
Handle 0x0011, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 1
Currently Installed Language: en|US|iso8859-1
Handle 0x0012, DMI type 27, 15 bytes
Temperature Probe Handle: 0x000B
Cooling Unit Group: 1
OEM-specific Information: 0x00000000
Nominal Speed: Unknown Or Non-rotating
Description: Not Specified
Handle 0x0013, DMI type 28, 22 bytes
Location: System Management Module
Maximum Value: Unknown
Minimum Value: Unknown
OEM-specific Information: 0x00000000
Nominal Value: Unknown
Handle 0x0014, DMI type 32, 11 bytes
System Boot Information
Status: No errors detected
Handle 0x0015, DMI type 34, 11 bytes
Address Type: Unknown
Handle 0x0016, DMI type 35, 11 bytes
Management Device Component
Description: To Be Filled By O.E.M.
Management Device Handle: 0x000D
Component Handle: 0x000A
Threshold Handle: 0x000F
Handle 0x0017, DMI type 36, 16 bytes
Management Device Threshold Data
Handle 0x0018, DMI type 39, 22 bytes
System Power Supply
Power Unit Group: 1
Location: To Be Filled By O.E.M.
Name: To Be Filled By O.E.M.
Manufacturer: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Model Part Number: To Be Filled By O.E.M.
Revision: To Be Filled By O.E.M.
Max Power Cap...
I was reading your case and the previous case, and I would like to ask for some details about your server system in order to find the root cause of the issue you have been facing. Please provide the following information:
A) When did the issue start? How many server systems are affected by the same issue?
B) Did you recently change any hardware or apply any update that could affect the system?
C) Are you using ECC or unbuffered memory?
D) Did you change the location of the fan header?
E) Have you tried resetting the BIOS to its default settings?
The server board provides five SSI-compliant 4-pin fan headers for CPU and I/O cooling fans. 3-pin fans are supported on all fan headers. The pin configuration for each of the 4-pin fan headers is identical and defined in the following information.
I will wait for your response in order to proceed with the next step.
Thank you for the reply and apologies for my slow reply. Things have been busy here.
A&B) The issue started as soon as the system was completed. This is a new build, and even with the latest firmware, it has had this problem out of the gate. The three other units mentioned in the previous thread also had this problem from the beginning. As such, no changes to the hardware have been made. I would also like to note that these four units in question are also the only four units we have, so I have no units that do not exhibit this problem.
C) The memory installed in this unit is two Kingston KVR24E17D8/16I modules (DDR4, unbuffered ECC, Intel certified).
D) The fan headers have not been changed. As you can see in the video, system fans 1 through 3 all show proper function. They only spin up when P1 Therm Margin becomes unresponsive.
E) The BIOS was reset to defaults after flashing to the latest version and then set to the required UEFI settings for CentOS.
These are all 1U systems, so they have no dedicated CPU fan. Here are the current temperature readouts for the processor:
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)
temp2: +29.8°C (crit = +119.0°C)
Adapter: ISA adapter
Physical id 0: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +36.0°C (high = +80.0°C, crit = +100.0°C)
Adapter: ACPI interface
power1: 27.00 W (interval = 1.00 s)
Since the fans only spin up to max for about five seconds, I doubt that the processor temperature is spiking to dangerous levels and then falling back to normal that quickly. System load on these units is very steady in any event.
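To put a number on how far the cores are from their limits, here is a minimal sketch that parses the coretemp lines from the readout above and reports the headroom to the "high" threshold. The function name `headroom` and the regular expression are my own and assume the lm-sensors line format shown above; other drivers may format their lines differently.

```python
import re

# Coretemp lines copied from the readout above.
SENSORS_OUTPUT = """\
Physical id 0: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +36.0°C (high = +80.0°C, crit = +100.0°C)
"""

# Assumed line shape: "<label>: +<temp>°C (high = +<high>°C, ..."
LINE_RE = re.compile(
    r"^(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C\s*"
    r"\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C"
)


def headroom(text):
    """Return {label: high - current} for every line that has a high threshold."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("label")] = (
                float(match.group("high")) - float(match.group("temp"))
            )
    return result


if __name__ == "__main__":
    for label, margin in headroom(SENSORS_OUTPUT).items():
        print(f"{label}: {margin:.1f} °C below the high threshold")
```

With the values above, every core sits more than 40 °C below its high threshold, which is consistent with the fan bursts not being driven by actual CPU temperature.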
As the video shows, the fans coast at a normal ~4800 RPM until the P1 Therm Margin sensor drops, at which point they spin up to ~22000 RPM. Then once contact with P1 Therm Margin is re-established, the fans begin to return to their normal coasting speed. The question is what would cause P1 Therm Margin to be unresponsive for a second or two at random intervals, every 15 minutes to a few hours. Since CentOS has no direct control over the fan speeds, the BMC must be what is spinning the fans up. Therefore, the BMC is losing contact with that sensor, causing it to spin the fans up, so this is likely a problem outside of the OS.
Thanks again for looking into this.
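For anyone who wants to log the dropouts instead of watching for them, here is a rough sketch of a poller. It assumes `ipmitool` is installed and can reach the BMC, and that `ipmitool sensor` prints pipe-separated rows with the reading in the second column and `na` when a sensor cannot be read. The helper names (`parse_sensor_table`, `is_unreadable`, `watch`) are hypothetical, and I have not verified the exact output on this board.

```python
import subprocess
import time

SENSOR_NAME = "P1 Therm Margin"


def parse_sensor_table(text):
    """Parse pipe-separated `ipmitool sensor` rows into {name: (reading, status)}."""
    rows = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 4:
            rows[fields[0]] = (fields[1], fields[3])
    return rows


def is_unreadable(rows, name=SENSOR_NAME):
    """True when the sensor is missing or its reading is not a number."""
    reading, _status = rows.get(name, ("na", "na"))
    try:
        float(reading)
    except ValueError:
        return True
    return False


def watch(interval_s=1.0):
    """Poll the BMC and print a timestamp whenever the sensor cannot be read."""
    while True:
        out = subprocess.run(
            ["ipmitool", "sensor"], capture_output=True, text=True
        ).stdout
        if is_unreadable(parse_sensor_table(out)):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), SENSOR_NAME, "unreadable")
        time.sleep(interval_s)
```

Calling `watch()` as root and leaving it running should produce one timestamp per detected dropout. Since a dropout only lasts a second or two and a full sensor listing can itself take a while, a poller like this may miss some events.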
No worries, thank you so much for replying back.
I reviewed the information from the previous community case and from this new one, and a few details caught my attention.
I would like to continue with some probing questions to determine the root cause, since you mentioned that this issue affects not only this server system but three more as well. In most cases this issue is fixed by updating the BIOS/firmware version. However, you have already done that and the issue persists. So, let me provide some troubleshooting steps to verify the information and rule out unrelated causes.
A) Could you please replace the redundant power supplies?
(In some cases fans can stay high if redundant power is lost.)
B) Have you checked that your server system has BMC version 1.12.11072?
C) Did you perform the BIOS update using the package "Intel® Server Board S1200SP [Intel® Xeon® Processor E3-1200 v6 only] BIOS and Firmware Update Package for EFI", version 03.01.1029?
D) Have you performed a forced BMC update using the BMC Force Update jumper (J4B1)?
*To see where the jumper (J4B1) is located, please check the attached picture.*
The BMC Force Update jumper is used to put the BMC in Boot Recovery mode for a low-level update. It is used when the BMC has become corrupted and is non-functional, requiring a new BMC image to be loaded on to the server board.
1. Turn off the system and remove power cords.
2. Move the BMC FRC UPDT Jumper from the default (pins 1 and 2) operating position to the Force Update position (pins 2 and 3).
3. Re-attach system power cords.
4. Power on the system.
Note: System Fans will boost and the BIOS Error Manager should report an 84F3 error code (Baseboard Management Controller in update mode).
5. Boot to the EFI shell and update the BMC firmware using BMC# .NSH (where # is the version number of the BMC). (https://downloadcenter.intel.com/download/27520/?product=97952)
6. When update has successfully completed, power off system.
7. Remove AC power cords.
8. Move BMC FRC UPDT jumper back to the default position.
9. Install AC power cords.
10. Power on system.
11. Boot to the EFI shell and update the FRU and SDR data using FRUSDR# .nsh (where # is the version number of the FRUSDR package).
12. Reboot the system.
13. Configure desired BMC configuration settings.
I will wait for the outcome in order to proceed with the next step if necessary.
I would like to know whether you need more assistance with this case so that we can find the best solution for you. Please do not hesitate to reply, and I will be more than happy to assist you!
I have confirmed that the BMC software is only at version 1.10.10925. I will attempt an update, but it may take some time as I am heading out on vacation for the next week, and I do not have ready access to the server at this time. I will update you when I have performed the update and checked to see if the situation has been resolved.
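For checking several units at once, the comparison between the installed and recommended BMC versions can be sketched as below. This assumes the version strings are purely dotted integers, as the two quoted in this thread are; `parse_version` is a hypothetical helper, not part of any Intel tool.

```python
def parse_version(text):
    """Turn a dotted version such as '1.10.10925' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


installed = parse_version("1.10.10925")    # version found on this unit
recommended = parse_version("1.12.11072")  # version named by support

# Tuples compare element by element, so 1.10 sorts before 1.12
# (a plain string comparison would get cases like 1.9 vs 1.10 wrong).
needs_update = installed < recommended

if __name__ == "__main__":
    print("BMC update needed:", needs_update)
```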
No problem, take your time. If you have any other questions, please do not hesitate to let me know, and I will be more than happy to help. I hope the update fixes the issue.
Have a wonderful day,
I would like to confirm some details. If you are able to provide the sysinfo log of your server system, it will help us a lot in gathering more information about this issue.
Also, please tell us the state of the motherboard status LED when the fans are running at full speed. Does it change to another state? If so, when the fans ramp down, does the status LED go back to solid green?
I would like to know whether you need more assistance with this case. If so, please do not hesitate to let me know, and I will be more than happy to assist you.
Just to update you, I have not had the chance to update the firmware yet. The server was taken away from me, and now I have to coordinate the update with a few people. I'll let you know once I have the update in place.
Thank you for keeping tabs on this.