
PECI over DMI Interface Error

idata
Employee
12,716 Views

Hi All,

Does anybody have any information about this error?

We have three brand-new Intel S2600WTTR servers, and all three produce this error every few weeks:

SPS FW Health reports SPS Health event type FW status. PECI over DMI interface error. Recovery via CPU Host reset or platform reset. DMI timeout of PECI request.

This is causing a hard reset on each of the systems.

If anyone has more information on what could be causing this, it would be greatly appreciated.

1 Solution
idata
Employee
9,170 Views

Hello Demandred,

It seems that the error comes from the Management Engine.

I was able to find more information about this in our System Event Log Troubleshooting Guide: http://www.intel.com/content/dam/support/us/en/documents/motherboards/server/sb/s2600v3_systemeventlog_troubleshootingguide_r1_0.pdf

Due to the nature of the issue and the number of systems impacted, I would recommend contacting our support group directly, as they may request an entire log file to review this issue.

You may visit the following link for our support options:

http://www.intel.com/content/www/us/en/support/contact-support.html

Best regards,

Dave A.


7 Replies
CtHar
Novice
9,170 Views

Did you ever resolve this issue? We're getting the exact same symptoms on a Dell PowerEdge R530 with 2 Xeon E5-2603 v4 CPUs.

idata
Employee
9,170 Views

Hello ColinTHart,

As Dave A. stated, due to the nature of the issue and the number of systems impacted, I would recommend contacting our support group directly, as they may request an entire log file to review this issue.

You can visit the following link for our support options:

http://www.intel.com/content/www/us/en/support/contact-support.html

Best regards,

Caesar B.
CtHar
Novice
9,170 Views

I submitted a request to Intel but got the brush off saying it's a Dell system, contact them. To reiterate, we are getting the exact same symptoms on a Dell machine, so I'd really like to know of any remedial action taken. You can even send it to me in a private message if you don't want to share this information publicly.

Reading the Intel CPU errata, there are known issues with the CPUs we are using that Intel has marked as "No fix", so I'm primarily wondering whether a CPU exchange is our best (only?) recourse.

Thanks,

Colin

Alibek
Beginner
9,170 Views

I am seeing the same error:

527 | 07/05/2019 21:01:47 (UTC) | SPS FW Health | OEM Reserved | PECI over DMI interface error. This is a notification that PECI over DMI interface failure was detected and it is not functional any more. - DMI timeout of PECI request - Asserted

After this event, the system resets.

MB:
    Vendor: Intel Corporation
    Version: SE5C610.86B.01.01.0018.072020161249
    Release Date: 07/20/2016
    Product Name: S2600WTTR
    Version: G92187-366

CPUs: 2 x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (CPU family: 6, Model: 79)

Last normal reboot was: 07/04/2019 11:33:03

Between 07/04/2019 11:33:03 and 07/05/2019 21:01:47 there was no load on the system.

 

All messages from 07/04/2019 11:33:03:

536 07/05/2019 21:05:07 BIOS Evt Sensor System Event reports OEM System Boot Event - Asserted
535 07/05/2019 21:03:42 Physical Scrty Physical Security (Chassis Intrusion) reports LAN Leash has been lost - Deasserted
534 07/05/2019 21:03:34 Physical Scrty Physical Security (Chassis Intrusion) reports LAN Leash has been lost - Asserted
533 07/05/2019 21:03:17 IERR Processor reports it has been asserted - Deasserted
532 07/05/2019 21:03:17 Pwr Unit Status Power Unit reports the power unit is powered off or being powered down - Deasserted
531 07/05/2019 21:03:12 Pwr Unit Status Power Unit reports the power unit is powered off or being powered down - Asserted
530 07/05/2019 21:02:08 IERR Processor reports it has been asserted - Asserted
529 07/05/2019 21:02:06 BMC FW Health Management Subsystem Health 'DIMM Thrm Mrgn 2' sensor has failed and may not be providing a valid reading - Asserted
528 07/05/2019 21:02:06 BMC FW Health Management Subsystem Health 'DIMM Thrm Mrgn 1' sensor has failed and may not be providing a valid reading - Asserted
527 07/05/2019 21:01:47 SPS FW Health OEM Reserved PECI over DMI interface error. This is a notification that PECI over DMI interface failure was detected and it is not functional any more. - DMI timeout of PECI request - Asserted
526 07/04/2019 11:34:01 Physical Scrty Physical Security (Chassis Intrusion) reports LAN Leash has been lost - Deasserted
525 07/04/2019 11:33:48 Physical Scrty Physical Security (Chassis Intrusion) reports LAN Leash has been lost - Asserted
524 07/04/2019 11:33:03 BIOS Evt Sensor System Event reports OEM System Boot Event - Asserted
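For anyone comparing logs across several affected machines, here is a rough Python sketch that splits an event log pasted in the format above and lists what was logged shortly after each PECI over DMI error. It assumes every entry starts with a numeric ID followed by an MM/DD/YYYY HH:MM:SS timestamp, as in this post; the names `parse_sel` and `follow_ups` and the five-minute window are my own choices for illustration, not part of any Intel tool, and other export formats (for example the pipe-separated form with "(UTC)") would need a different pattern.

```python
import re
from datetime import datetime, timedelta

# Assumed entry layout (taken from the log pasted in this thread):
#   <id> <MM/DD/YYYY HH:MM:SS> <free text>
# The lookahead ends an entry at the next "<id> <timestamp>" pair, so the
# pattern also splits a dump whose entries were pasted onto one line.
STAMP = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"
ENTRY = re.compile(
    r"(\d+)\s+(" + STAMP + r")\s+(.*?)(?=\s+\d+\s+" + STAMP + r"\s|\s*\Z)",
    re.DOTALL,
)


def parse_sel(text):
    """Return a list of (id, timestamp, message) tuples sorted by id."""
    entries = []
    for match in ENTRY.finditer(text):
        stamp = datetime.strptime(match.group(2), "%m/%d/%Y %H:%M:%S")
        entries.append((int(match.group(1)), stamp, match.group(3).strip()))
    return sorted(entries)


def follow_ups(entries, needle="PECI over DMI", window=timedelta(minutes=5)):
    """Map each entry containing `needle` to the later entries within `window`."""
    result = {}
    for entry_id, stamp, message in entries:
        if needle in message:
            result[entry_id] = [
                e for e in entries
                if e[0] > entry_id and e[1] <= stamp + window
            ]
    return result


# A few entries copied from the log above.
SAMPLE = """\
530 07/05/2019 21:02:08 IERR Processor reports it has been asserted - Asserted
529 07/05/2019 21:02:06 BMC FW Health Management Subsystem Health 'DIMM Thrm Mrgn 2' sensor has failed and may not be providing a valid reading - Asserted
528 07/05/2019 21:02:06 BMC FW Health Management Subsystem Health 'DIMM Thrm Mrgn 1' sensor has failed and may not be providing a valid reading - Asserted
527 07/05/2019 21:01:47 SPS FW Health OEM Reserved PECI over DMI interface error. This is a notification that PECI over DMI interface failure was detected and it is not functional any more. - DMI timeout of PECI request - Asserted
526 07/04/2019 11:34:01 Physical Scrty Physical Security (Chassis Intrusion) reports LAN Leash has been lost - Deasserted
"""

if __name__ == "__main__":
    for error_id, later in follow_ups(parse_sel(SAMPLE)).items():
        print("PECI over DMI error at entry", error_id)
        for entry_id, stamp, message in later:
            print("  ", entry_id, stamp.isoformat(sep=" "), message)
```

Running the same script against the logs of each affected server makes it easier to see whether the sequence (PECI error, then sensor failures, then IERR) is the same on every machine before sending the logs to support.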

 

CtHar
Novice
9,170 Views

We exchanged our Broadwell 2603v4 CPUs for slightly older Haswell 2620v3 versions and since then haven't had any more spontaneous reboots.

ChrisDrake
Beginner
3,456 Views

The problem is that your memory slots are populated incorrectly.

I've confirmed this is definitely the cause after spending most of the day re-arranging my DIMMs.

I've tried combinations of 2x, 4x, 5x, and 6x DIMMs in a whole pile of different DIMM sockets. Most of those combinations cause the error, but if you use specific slots, the error never happens.

If you're using only 6 DIMMs, populate A1, A2, B1 and G1, G2, H1.

Unfortunately, the manual is not very helpful: besides saying "furthest away from processor", it doesn't actually give any of the ordering rules, which obviously are important!
