Beginner
1,758 Views

memory thrm related warnings/errors on vmware.

Good afternoon!

I have a problem similar to one that was already discussed here. I did not find an answer to it, so I am asking again.

Remote Management Module key : Installed
Device (BMC) Available : Yes
BMC FW Build Time : 2018-06-07 11:48:53
BIOS ID : SE5C610.86B.01.01.0027.071020182329
BMC FW Rev : 1.53.11210
Boot FW Rev : 1.07
SDR Package Version : SDR Package 1.17
Mgmt Engine (ME) FW Rev : 03.01.03.050
Baseboard Serial Number : BQTP61800321

Sensors in the BMC web console:

VRD Hot All deasserted OK 0x0000
DIMM Thrm Mrgn 1 Normal OK -48 degrees C
DIMM Thrm Mrgn 2 Normal Unknown Not Available
DIMM Thrm Mrgn 3 Normal OK -51 degrees C
DIMM Thrm Mrgn 4 Normal Unknown Not Available
Agg Therm Mgn 1 Normal OK -7 degrees C

Roughly every 5 minutes, VMware vSphere shows a message about the DIMM temperature. Please help me get rid of it.

27 Replies
Community Manager
84 Views

Hello rs-link,

I hope you are doing well.

To help us assist you better, could you confirm the motherboard model?

Regards,
Charlie_Intel

Beginner
84 Views

Processed Tags
FRU & SDR Update Package for Intel(R) Server Board S2600TPx (S2600TP_117)
Copyright (c) 2017 Intel Corporation
Intel(R) Server Board S2600TPR detected
Auto-detecting chassis model and attached hardware.
This may take up to 1 minute to complete.
NODE Presence is present
Detected Hardware
********************************************************************************
Intel(R) Server Chassis H2000G Product Family
Intel(R) Server Board S2600TPR
Intel(R) Xeon(R) Processor E5-2600 in socket 1
Intel(R) Xeon(R) Processor E5-2600 in socket 2
Baseboard FRU Device
1600 Watt Power Supply Module 1
Power Supply Module 1 FRU Device
1600 Watt Power Supply Module 2
Power Supply Module 2 FRU Device
Redundant Power Supply Configuration
HSBP is a 2U 12 slot 3.5 inch HDD Backplane
IO Module FRU Storage Device
IO Module Temperature Device

Community Manager
84 Views

Hello rs-link,

Are you able to restart the server?

If so, can you clear the CMOS and the system event log, just for testing purposes?

Regards,
Charlie_Intel

Beginner
84 Views

Good afternoon!

Yes, of course I will. I will be in the server room tomorrow and will write back afterwards.

Thank you.

Community Manager
84 Views

Hello rs-link,

Thank you, let us know how it goes.

Regards,
Charlie_Intel
Beginner
84 Views

A day has passed since I followed your instructions, and everything has been running normally. vSphere no longer shows warnings about the memory temperature. Tomorrow I will put the compute modules back into production, and we will see.

Many thanks for the advice.

Best wishes.
Beginner
84 Views

Good afternoon!

Unfortunately, the messages have started appearing again. What confuses me is this:

DIMM Thrm Mrgn 2 and DIMM Thrm Mrgn 4

Beginner
84 Views

To complete the picture, here is a message from vSphere:

Community Manager
84 Views

Hello rs-link,

Is there a way you can share the sysinfo logs from the server with us, so that we can investigate this in depth?

From the last image, it looks like a failing sensor, but we would like to be sure about it.

Regards,
Charlie_Intel

Beginner
84 Views

Good evening!

I am attaching the report generated for developers, as an archive file with logs. I hope it helps.

Regards
Community Manager
84 Views

Hello rs-link,

Thanks for the file. Unfortunately, it is encrypted and we were not able to open it.

Also, the log we need is the sysinfo log. You can collect it by following this link: https://www.intel.com/content/www/us/en/support/articles/000023940/server-products/server-boards.htm...

Please let us know if you need further assistance collecting the logs.

Regards,
Charlie_Intel
Beginner
84 Views

The irony is that it is not me who encrypts this file, but your hardware. Anyway, I have attached the logs collected with sysinfo. I really hope they help. I look forward to your reply.

I apologize in advance for my English.

Regards
Community Manager
84 Views

Hello rs-link,

Do not worry.

Thank you for the file. Are you able to swap memory modules 1 and 2, clear the CMOS again, and then send us the new logs?

Regards,
Charlie_Intel
Beginner
84 Views

Good afternoon!

The memory modules have been swapped as follows:

DIMM A1 <---> F1
DIMM B1 <---> E1

That is, crosswise.

The CMOS has been reset.

The logs are attached.

Thank you in advance
Community Manager
84 Views

Hello rs-link,

Thank you for the information. Let us look into it, and we will get back to you.

Regards,
Charlie V.
Beginner
84 Views

Good afternoon!

How is it going with my problem? You have not forgotten about me, have you?
Community Manager
84 Views

 

Hello rs-link,

Please do not worry. We are waiting for a second-level response and will get back to you as soon as possible. Thank you.

Regards,
Charlie_Intel

Community Manager
84 Views

Hello rs-link,

Thank you for your patience. These values are not memory temperatures; they are margins relative to thresholds.
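To make the point about margins concrete, here is a minimal Python sketch that reads lines in the style of the sensor listing from the first post. It assumes the common convention that a negative margin means the part is that many degrees below its limit; that sign convention is an assumption, not something confirmed in this thread. The names `MarginReading`, `parse_margin_line`, and `describe` are hypothetical helpers, not part of any Intel or VMware tool.

```python
import re
from typing import NamedTuple, Optional


class MarginReading(NamedTuple):
    name: str                # sensor name, e.g. "DIMM Thrm Mrgn 1"
    status: str              # "OK" or "Unknown", as shown in the listing
    margin_c: Optional[int]  # degrees C relative to the limit; None if unavailable


# Matches listing lines such as "DIMM Thrm Mrgn 1 Normal OK -48 degrees C".
_LINE = re.compile(r"^(?P<name>.+?)\s+Normal\s+(?P<status>OK|Unknown)\s+(?P<value>.+)$")
_VALUE = re.compile(r"(-?\d+) degrees C")


def parse_margin_line(line: str) -> Optional[MarginReading]:
    """Parse one listing line; return None if it is not in the expected format."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    v = _VALUE.fullmatch(m["value"])
    return MarginReading(m["name"], m["status"], int(v.group(1)) if v else None)


def describe(r: MarginReading) -> str:
    """Explain a reading as a distance from the limit (assumed sign convention)."""
    if r.margin_c is None:
        return f"{r.name}: no reading ({r.status})"
    if r.margin_c < 0:
        return f"{r.name}: {-r.margin_c} C below its limit"
    return f"{r.name}: at or above its limit by {r.margin_c} C"
```

Under that assumption, a reading of -48 degrees C is far from a thermal problem, while the sensors that report "Not Available" give no margin at all.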

 

What I did notice in the log is a processor IERR, which is an internal processor error. Two minutes after it, I see warning events for DIMM Thrm Mrgn failures.

This could be related to the processor, since the memory controller is part of the processor.
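For anyone who wants to check their own event log for the same pattern, the sketch below pairs each trigger event with follow-up events that occur shortly after it. The function name `follow_ups`, the 5-minute window, and the sample events are all made up for illustration; they are not taken from the logs attached to this thread, and a real log would first have to be parsed into (timestamp, text) pairs.

```python
from datetime import datetime, timedelta


def follow_ups(events, trigger, follow, window=timedelta(minutes=5)):
    """Return (trigger_time, follow_time) pairs where an event containing
    `follow` occurs within `window` after an event containing `trigger`."""
    pairs = []
    trigger_times = [t for t, text in events if trigger in text]
    for t0 in trigger_times:
        for t, text in events:
            if follow in text and timedelta(0) <= t - t0 <= window:
                pairs.append((t0, t))
    return pairs


# Made-up sample events for illustration only.
events = [
    (datetime(2000, 1, 1, 10, 0), "Processor IERR"),
    (datetime(2000, 1, 1, 10, 2), "DIMM Thrm Mrgn 2 warning"),
    (datetime(2000, 1, 1, 18, 30), "DIMM Thrm Mrgn 4 warning"),
]

for t0, t1 in follow_ups(events, "IERR", "DIMM Thrm Mrgn"):
    print(f"IERR at {t0} followed by a margin warning after {t1 - t0}")
```

A match like this only shows that the two events were close in time; it does not prove that one caused the other.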

 

 

Now, the entries are from 09/02, and I do not see them happening anymore.

Could you confirm whether the issue is still showing?

I would also like you to check the newer logs. Is that possible?

Regards,
Charlie_Intel
Community Manager
84 Views

Hello rs-link,

Do you need any further assistance?

Regards,
Charlie_Intel