Continuing from my earlier post at
It has been a while, but I now have an additional unit, this time with Intel certified memory, that is experiencing the same symptoms. Namely, at random intervals, ranging from 15 minutes to a few hours, the fans will spin up to full speed for a few seconds before returning to normal speed. I narrowed the cause down to the P1 Therm Margin sensor, which will occasionally be unreadable for a moment. See the attached video file to see this in action.
These servers are all running CentOS Linux.
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
64 structures occupying 3771 bytes.
Table at 0x80630000.
Handle 0x0001, DMI type 221, 12 bytes
OEM-specific Type
Header and Data:
DD 0C 01 00 01 01 00 04 01 00 08 00
Strings:
Reference Code - ACPI
Handle 0x0002, DMI type 133, 12 bytes
OEM-specific Type
Header and Data:
85 0C 02 00 00 50 F9 81 00 40 00 00
Handle 0x0006, DMI type 0, 24 bytes
BIOS Information
Vendor: Intel Corporation
Version: S1200SP.86B.03.01.0026.092720170729
Release Date: 09/27/2017
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16384 kB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 0.0
Firmware Revision: 0.0
Handle 0x0007, DMI type 1, 27 bytes
System Information
Manufacturer: Intel Corporation
Product Name: S1200SP
Version: R1304SPOSHBNR
Serial Number: QSCD80400069
UUID: C81BB18B-33EF-E711-AB21-A4BF012C036E
Wake-up Type: Power Switch
SKU Number: SKU Number
Family: Family
Handle 0x0008, DMI type 2, 17 bytes
Base Board Information
Manufacturer: Intel Corporation
Product Name: S1200SP
Version: H57534-270
Serial Number: QSSA80100236
Asset Tag: Base Board Asset Tag
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: Part Component
Chassis Handle: 0x0000
Type: Motherboard
Contained Object Handles: 0
Handle 0x0009, DMI type 3, 24 bytes
Chassis Information
Manufacturer: ...............................
Type: Rack Mount Chassis
Lock: Not Present
Version: ..................
Serial Number: ..................
Asset Tag: ....................
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Height: Unspecified
Number Of Power Cords: Unspecified
Contained Elements: 0
SKU Number: Not Specified
Handle 0x000F, DMI type 11, 5 bytes
OEM Strings
String 1: To Be Filled By O.E.M.
Handle 0x0011, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 1
en|US|iso8859-1
Currently Installed Language: en|US|iso8859-1
Handle 0x0012, DMI type 27, 15 bytes
Cooling Device
Temperature Probe Handle: 0x000B
Type: Fan
Status: OK
Cooling Unit Group: 1
OEM-specific Information: 0x00000000
Nominal Speed: Unknown Or Non-rotating
Description: Not Specified
Handle 0x0013, DMI type 28, 22 bytes
Temperature Probe
Description: LM78A
Location: System Management Module
Status:
Maximum Value: Unknown
Minimum Value: Unknown
Resolution: Unknown
Tolerance: Unknown
Accuracy: Unknown
OEM-specific Information: 0x00000000
Nominal Value: Unknown
Handle 0x0014, DMI type 32, 11 bytes
System Boot Information
Status: No errors detected
Handle 0x0015, DMI type 34, 11 bytes
Management Device
Description: UNKNOWN
Type: Unknown
Address: 0x00000000
Address Type: Unknown
Handle 0x0016, DMI type 35, 11 bytes
Management Device Component
Description: To Be Filled By O.E.M.
Management Device Handle: 0x000D
Component Handle: 0x000A
Threshold Handle: 0x000F
Handle 0x0017, DMI type 36, 16 bytes
Management Device Threshold Data
Handle 0x0018, DMI type 39, 22 bytes
System Power Supply
Power Unit Group: 1
Location: To Be Filled By O.E.M.
Name: To Be Filled By O.E.M.
Manufacturer: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Model Part Number: To Be Filled By O.E.M.
Revision: To Be Filled By O.E.M.
Max Power Cap...
Hello JohnMcC,
I have read through this case and the previous one, and I would like to ask for some details about your server system in order to find the root cause of the issue. Please provide the following information:
A) When did the issue start? How many server systems are affected by the same issue?
B) Did you recently change any hardware or apply any update that could affect the system?
C) Are you using ECC memory or unbuffered memory?
D) Did you change the location of the fan header?
E) Have you tried setting the BIOS to its default settings?
The server board provides five SSI-compliant 4-pin fan headers for CPU and I/O cooling fans. 3-pin fans are supported on all fan headers. The pin configuration is identical for each of the 4-pin fan headers, which are designated as follows:
- One 4-pin fan header is designated as processor cooling fan: - CPU fan (J7K1)
- Three 4-pin fan headers are designated as system fans: - System fan 1 (J3K2) - System fan 2 (J8K2) - System fan 3 (J8K3)
- One 4-pin fan header is designated as a rear system fan: - System fan 4 (J8B1)
I will await your reply in order to proceed with the next step.
Best regards,
Emeth X
Thank you for the reply and apologies for my slow reply. Things have been busy here.
A&B) The issue started as soon as the system was completed. This is a new build, and even with the latest firmware, it has had this problem out of the gate. The three other units mentioned in the previous thread also had this problem from the beginning. As such, no changes to the hardware have been made. I would also like to note that these four units in question are also the only four units we have, so I have no units that do not exhibit this problem.
C) The memory installed in this unit is two KVR24E17D8/16I Kingston DDR4 Unbuffered ECC Intel Certified memory.
D) The fan headers have not been changed. As you can see in the video, system fans 1 through 3 all show proper function. They only spin up when P1 Therm Margin becomes unresponsive.
E) The BIOS was reset to defaults after flashing to the latest version and then set to the required UEFI settings for CentOS.
These are all 1U systems, so they have no dedicated CPU fan. Here are the current temperature readouts for the processor:
]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)
temp2: +29.8°C (crit = +119.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +36.0°C (high = +80.0°C, crit = +100.0°C)
power_meter-acpi-0
Adapter: ACPI interface
power1: 27.00 W (interval = 1.00 s)
Since the fans only spin up to max for about five seconds, I doubt that the processor temperature is spiking to dangerous levels and then falling back to normal that quickly. System load on these units is very steady in any event.
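To put a number on the headroom, here is a minimal sketch that parses the coretemp lines from the `sensors` output above and subtracts each reading from its "high" threshold. The function name `coretemp_headroom` and the regular expression are my own illustration, and the sketch assumes the line format shown in this post.

```python
import re

# Matches lines such as "Core 0: +37.0°C (high = +80.0°C, crit = +100.0°C)".
# Assumption: the line format is the one shown in the sensors output above.
CORE_LINE = re.compile(
    r"^(?P<label>Core \d+|Physical id \d+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C"
    r"\s*\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C"
)


def coretemp_headroom(sensors_text):
    """Return {label: high_threshold - current_temp} for each coretemp line."""
    headroom = {}
    for line in sensors_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if match:
            headroom[match["label"]] = float(match["high"]) - float(match["temp"])
    return headroom


# The coretemp readings copied from the output in this post.
SAMPLE = """\
Physical id 0: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +37.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +36.0°C (high = +80.0°C, crit = +100.0°C)
"""

for label, degrees in coretemp_headroom(SAMPLE).items():
    print(f"{label}: {degrees:.1f} °C below the high threshold")
```

With the readings above, every core sits at least 42 °C below its high threshold, which is consistent with the fans not reacting to an actual thermal event.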
As the video shows, the fans coast at a normal ~4800 RPM until the P1 Therm Margin sensor drops, at which point they spin up to ~22000 RPM. Then, once contact with P1 Therm Margin is re-established, the fans begin to return to their normal coasting speed. The question is what would cause P1 Therm Margin to be unresponsive for a second or two at random intervals of 15 minutes to a few hours. Since CentOS has no direct control over the fan speeds, the BMC must be what is spinning the fans up. Therefore, the BMC is losing contact with that sensor, causing it to spin the fans up, so this is likely a problem outside of the OS.
Thanks again for looking into this.
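To capture these dropouts without filming the sensor list, a logger along the following lines could be used. This is only a sketch under assumptions: it assumes `ipmitool` is installed and can talk to the local BMC, that `ipmitool sensor reading` prints `name | value` lines, and that any non-numeric or missing value means the sensor was unreadable. The helper names (`parse_reading`, `read_margin`, `find_dropouts`, `poll`) are hypothetical.

```python
import subprocess
import time

SENSOR = "P1 Therm Margin"  # sensor name as it appears in this thread


def parse_reading(output):
    """Return the numeric reading from 'name | value' text, or None if absent."""
    for line in output.splitlines():
        name, sep, value = line.partition("|")
        if sep and name.strip() == SENSOR:
            try:
                return float(value.strip())
            except ValueError:
                return None  # e.g. "na" or an error string instead of a number
    return None  # the sensor line is missing entirely


def read_margin():
    """Query the BMC once; any failure counts as an unreadable sensor."""
    try:
        result = subprocess.run(
            ["ipmitool", "sensor", "reading", SENSOR],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_reading(result.stdout)


def find_dropouts(samples):
    """Group consecutive unreadable samples into (first_ts, last_ts) intervals."""
    dropouts, start, last = [], None, None
    for ts, reading in samples:
        if reading is None:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            dropouts.append((start, last))
            start = None
    if start is not None:
        dropouts.append((start, last))
    return dropouts


def poll(duration_s, interval_s=1.0):
    """Sample the sensor for duration_s seconds and return the dropout intervals."""
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        samples.append((time.time(), read_margin()))
        time.sleep(interval_s)
    return find_dropouts(samples)
```

Calling `poll(4 * 3600)` would watch the sensor for four hours and return the start and end timestamps of each gap, which could then be compared against the times the fans ramp up. Since the dropouts last only a second or two, a one-second polling interval may still miss some of them.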
Hello JohnMcC,
No worries, thank you so much for replying back.
I reviewed the information from the previous community case and from this new one, and something caught my attention.
I would like to ask some further probing questions to determine the root cause, since you mentioned that this issue affects not only this server system but three more as well. In most cases this issue is fixed by updating the BIOS/firmware. However, you have already done that and the issue persists. So, let me provide some troubleshooting steps to confirm the information and rule out unrelated issues.
A) Could you please replace the redundant power supplies?
(In some cases fans can stay high if redundant power is lost.)
B) Have you checked that your server system has BMC version 1.12.11072?
C) Did you perform the BIOS update using the following package: "Intel® Server Board S1200SP [Intel® Xeon® Processor E3-1200 v6 only] BIOS and Firmware Update Package for EFI", version 03.01.1029?
(https://downloadcenter.intel.com/download/27520/?product=97952)
D) Have you performed a BMC force update using the BMC Force Update jumper (J4B1)?
*To see where the jumper (J4B1) is located, please check the attached picture.*
The BMC Force Update jumper is used to put the BMC in Boot Recovery mode for a low-level update. It is used when the BMC has become corrupted and is non-functional, requiring a new BMC image to be loaded on to the server board.
1. Turn off the system and remove power cords.
2. Move the BMC FRC UPDT Jumper from the default (pins 1 and 2) operating position to the Force Update position (pins 2 and 3).
3. Re-attach system power cords.
4. Power on the system.
Note: System Fans will boost and the BIOS Error Manager should report an 84F3 error code (Baseboard Management Controller in update mode).
5. Boot to the EFI shell and update the BMC firmware using BMC# .NSH (where # is the version number of the BMC). (https://downloadcenter.intel.com/download/27520/?product=97952)
6. When update has successfully completed, power off system.
7. Remove AC power cords.
8. Move BMC FRC UPDT jumper back to the default position.
9. Install AC power cords.
10. Power on system.
11. Boot to the EFI shell and update the FRU and SDR data using FRUSDR# .nsh (where # is the version number of the FRUSDR package).
12. Reboot the system.
13. Configure desired BMC configuration settings.
I will await the outcome of this in order to proceed with the next step if necessary.
Best Regards,
Emeth X
Hello JohnMcC,
I would like to know whether you need more assistance with this case so that we can find the best solution for you. I will await your reply; please do not hesitate to respond, and I will be more than happy to assist you!
Best Regards,
Emeth X
Hi Emeth,
I have confirmed that the BMC software is only at version 1.10.10925. I will attempt an update, but it may take some time as I am heading out on vacation for the next week, and I do not have ready access to the server at this time. I will update you when I have performed the update and checked to see if the situation has been resolved.
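For checking several units against the recommended BMC version, a comparison along these lines may help. It is a minimal sketch that assumes BMC versions are dotted integer strings like the two quoted in this thread; `parse_version` and `needs_update` are hypothetical helper names.

```python
def parse_version(text):
    """Split a dotted version such as '1.10.10925' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(installed, target):
    """Return True when the installed version is older than the target."""
    return parse_version(installed) < parse_version(target)


# Versions quoted in this thread: installed 1.10.10925, recommended 1.12.11072.
print(needs_update("1.10.10925", "1.12.11072"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.9" as newer than "1.10".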
Thanks!
Hello JohnMcC,
No problem, take your time. If you have any other questions, please do not hesitate to let me know and I will be more than happy to help you. I hope the update fixes the issue.
Have a wonderful day,
Best Regards,
Emeth X
Hello JohnMcC,
I would like to confirm some details. If you are able to provide the sysinfo log of your server system, it will help us a lot in gathering more information about this issue.
Also, please tell us the state of the motherboard status LED when the fans are running at full speed. Does it change to another state? If so, when the fans ramp down, does the status LED go back to solid green?
Best Regards,
Emeth X
Hello JohnMcC,
I would like to know whether you need more assistance with this case. If so, please do not hesitate to let me know and I will be more than happy to assist you.
Best Regards,
Emeth X
Hi Emeth,
Just to update you, I have not had the chance to update the firmware yet. The server was taken away from me, and now I have to coordinate the update with a few people. I'll let you know once I have the update in place.
Thank you for keeping tabs on this.
John
Hello,
No problem. If you have any other questions, just let us know and we will be more than happy to assist you.
Best Regards,
Emeth X.
Hello,
Is there any update on this case?
I would like to know whether the issue still persists or whether everything is fine now.
Best Regards,
Emeth X
