
Hi Team, I have an S2600 server and I get the following message (attached). Is the problem in the motherboard or the CPU? Thanks, Oded Davidson

ODavi2
Beginner
Emeth_O_Intel
Moderator

Hi ODavi2,

 

Thank you for contacting us about this.

 

I would like to gather more information about your system.

 

A) Could you please let us know the specific model of your Intel Server Board and the model of your processor?

 

B) Please also let us know which specific Event ID you are referring to.

In addition, please provide the BMC Event Log file so that we have better visibility into the critical or warning events on your system.

 

C) When did you first notice the message? Did you change a specific component or apply a specific update that could have caused it?

 

Please let me know those details in order to proceed with the next step.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

 

 

ODavi2
Beginner

Hi Emeth,

 

Thanks for your help.

 

I collected some information:

  1. I saw the error in my monitoring system (Intel Data Center Manager).

When I checked what was wrong with the machine, this is what the screen showed:

 

Attached screenshots: Log.jpg, M.b.png, Cpu.png

  • Attached srv-04-098.zip with event logs.

I don't have a specific Event ID to refer to; they are all bad and keep recurring.

Thanks, Oded

 

Emeth_O_Intel
Moderator

Hi ODavi2,

 

I reviewed the BMC log you provided and noticed the following events:

 

EventID:0143 | Time Stamp: 07/17/2019 10:21:41

SensorName: BMC FW Health

Sensor Type: Management Subsystem Health

Description: 'P1 Therm Ctrl %' sensor has failed and may not be providing a valid reading - Asserted

 

These events normally occur due to failures of the thermal solution:

 

  1. Verify that the heatsink is properly attached and has thermal grease.
  2. If the system has a heatsink fan, ensure the fan is spinning.
  3. Check that all system fans are operating properly.
  4. Check that the air used to cool the system is within limits (typically below 35 °C).

 

EventID:0141 | Time Stamp: 07/17/2019 10:20:26

SensorName: BMC FW Health

Sensor Type: Management Subsystem Health

Description: 'DIMM Thrm Mrgn 1' sensor has failed and may not be providing a valid reading - Asserted

 

  1. Check for clear and unobstructed airflow into and out of the chassis.
  2. Ensure the SDR is programmed and correct chassis has been selected.
  3. Ensure there are no fan failures.
  4. Ensure the air used to cool the system is within the thermal specifications for the system (typically below 35 °C).
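
For anyone who needs to sort through a larger BMC log, here is a minimal Python sketch that pulls the Event ID, time stamp, and failed sensor name out of entries laid out like the two quoted above. The layout, the `parse_events` name, and the regular expressions are assumptions based only on these two entries; a real BMC log export may be formatted differently.

```python
import re
from datetime import datetime

# Sample text modelled on the two entries quoted in this post.
# Assumption: events are separated by a blank line in the export.
SAMPLE = """\
EventID:0143 | Time Stamp: 07/17/2019 10:21:41
SensorName: BMC FW Health
Sensor Type: Management Subsystem Health
Description: 'P1 Therm Ctrl %' sensor has failed and may not be providing a valid reading - Asserted

EventID:0141 | Time Stamp: 07/17/2019 10:20:26
SensorName: BMC FW Health
Sensor Type: Management Subsystem Health
Description: 'DIMM Thrm Mrgn 1' sensor has failed and may not be providing a valid reading - Asserted
"""

HEADER = re.compile(r"EventID:\s*(\w+)\s*\|\s*Time Stamp:\s*(.+)")
FAILED_SENSOR = re.compile(r"'([^']+)' sensor has failed")


def parse_events(text):
    """Return one dict per event, sorted oldest first."""
    events = []
    for block in re.split(r"\n\s*\n", text.strip()):
        header = HEADER.search(block)
        if not header:
            continue  # skip anything that does not look like an event
        sensor = FAILED_SENSOR.search(block)
        events.append({
            "event_id": header.group(1),
            # Time stamps in the quoted entries are MM/DD/YYYY HH:MM:SS.
            "time": datetime.strptime(header.group(2).strip(), "%m/%d/%Y %H:%M:%S"),
            "failed_sensor": sensor.group(1) if sensor else None,
        })
    return sorted(events, key=lambda e: e["time"])


events = parse_events(SAMPLE)
for e in events:
    print(e["time"].isoformat(), e["event_id"], e["failed_sensor"])
```

Sorting by time makes it easier to see which sensor failure was logged first.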

 

Have you noticed any random shutdowns on this system, or any recent performance degradation?

Also, which BIOS version is running on this system? Could you please verify that it is the latest version?

Please go through the steps above and let me know the outcome, and I will be more than happy to proceed with the next step.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

Emeth_O_Intel
Moderator

Hi ODavi2,

 

I was checking the SEL text and I see several processor internal error (IERR) assertions, so there has to be something going on with the processor(s) here.
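
As a rough illustration of counting IERR assertions per processor sensor, here is a minimal Python sketch. The SEL text from this system is not shown in the thread, so the pipe-separated line format, the sensor names, and the sample lines below are all hypothetical; the field splitting would need adapting to the real export.

```python
from collections import Counter

# Hypothetical SEL text lines (made up for illustration only).
# Assumed fields: time stamp | sensor | event | state
SAMPLE_SEL = """\
07/17/2019 10:19:02 | Processor P1 Status | IERR | Asserted
07/17/2019 10:19:40 | Fan System Fan 1 | Lower Critical going low | Asserted
07/17/2019 10:20:11 | Processor P1 Status | IERR | Asserted
07/17/2019 10:20:26 | Processor P2 Status | IERR | Deasserted
"""


def count_ierr_assertions(sel_text):
    """Count asserted IERR events per sensor name."""
    counts = Counter()
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            continue  # not in the assumed four-field layout
        _, sensor, event, state = fields
        if event == "IERR" and state == "Asserted":
            counts[sensor] += 1
    return counts


ierr_counts = count_ierr_assertions(SAMPLE_SEL)
for sensor, n in ierr_counts.most_common():
    print(f"{sensor}: {n} IERR assertion(s)")
```

If the assertions all come from one processor sensor, that processor is the first one worth testing.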

 

It would help if you could provide a SysInfo log, as it will give us more information about this.

In the meantime, I recommend reseating the processors, then swapping them or testing one at a time in socket 1 to see how the system reacts.

Please provide the SysInfo log so that we can confirm this.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

 

 

ODavi2
Beginner

After swapping the CPUs, the problem still exists.

Syslog attached (.zip).

Emeth_O_Intel
Moderator

Hi,

 

Thank you for the information provided.

 

Nevertheless, please provide us the System Information Retrieval Utility (SysInfo) log, collected with the following tool:

User Guide for Intel System Information Retrieval Utility (Sysinfo)

 

https://downloadcenter.intel.com/download/28712/System-Information-Retrieval-Utility-SysInfo-?product=88289

 

We will wait for this information so that we can perform a deeper analysis.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

 

Emeth_O_Intel
Moderator

Hi,

 

I would like to check whether you were able to gather the SysInfo log from your system, or whether you still need help with this.

If so, please do not hesitate to let us know, and we will be more than happy to assist you.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

 

ODavi2
Beginner

Hi Emeth,

I have uploaded the logs.

Thanks, Oded Davidson

Emeth_O_Intel
Moderator

Hi,

 

Thank you for replying with the requested information.

I will double-check the logs and get back to you with the outcome as soon as possible.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

ODavi2
Beginner

Hi Emeth,

Is there any update on this case?

Thanks, Oded

 

Emeth_O_Intel
Moderator

Hi,

 

I was reviewing the logs provided and there appears to be an issue with CPU1 in the system.

 

An IERR entry reports an internal processor error. Since the processor contains the memory controller (MCH, Memory Controller Hub), a fault in CPU1 could also explain the event "BMC FW Health reports DIMM Thrm Mrgn 2 has failed and may not be providing a valid reading."

If you would like to proceed with the replacement of the processor, let me know.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation

 

ODavi2
Beginner

Hi Emeth,

Thanks so much for the in-depth analysis.

I would like to proceed with replacing the processor (CPU1).

Thanks, Oded Davidson

Emeth_O_Intel
Moderator

Hi ODavi2,

 

Sure, no problem. Let me assist you with this.

I will proceed with the next step, and you will receive a private email with details about the replacement process.

 

Regards,

 

Emeth O.

Intel Customer Support Technician

Under Contract to Intel Corporation
