Hello,
We have 36 or so machines with the S2600TPR server board.
One of them keeps rebooting frequently, typically once a week.
There are many messages in the BMC log, but I suspect this one is the reason for the reboot:
PECI over DMI interface error. This is a notification that PECI over DMI interface failure was detected and it is not functional any more. - DMI timeout of PECI request - Asserted
I have updated to the latest available BIOS and disabled C-states in the firmware (a web search pointed to a Dell issue that was fixed this way), but neither change has made a difference.
The system is running fairly heavy virtual machines on ESXi 6.7.
Is there anything we can do to diagnose this further?
Debug logs are attached.
This appears to have been an ongoing issue for years on this particular server but it is now increasing in frequency.
Thank you
Hello timmjm,
Thank you for joining the Intel community.
The suggestion for this error is to update the BIOS to the latest available version, which you have already done. PECI (Platform Environment Control Interface) is used for thermal management, so this error might point to a possible overheating issue on one of the CPUs. I think this would be a good starting point. Unfortunately, I cannot check the debug logs because they are password protected, so if you could extract and attach the SEL logs, it would be a lot easier for me.
I look forward to your updates.
Let me know if this helps.
Regards
Jose A.
Intel Customer Support Technician
For firmware updates and troubleshooting tips, visit:
https://intel.com/support/serverbios
Hi Jose,
Thank you for the response.
We will investigate the overheating scenario and check the heatsinks on each CPU as you suggest.
I have attached the SEL files in case they help with further diagnosis.
Hello timmjm,
I found no PECI errors in the SEL log. However, I did find some temperature-related errors, such as these two:
EventID:0136 Time Stamp:06/07/2021 08:06:00 SensorName:BMC FW Health Sensor Type:Management Subsystem Health Description:'P1 Therm Ctrl %' sensor has failed and may not be providing a valid reading -Asserted
EventID:0160 Time Stamp:06/13/2021 10:47:57 SensorName:P1 Therm Ctrl % Sensor Type:Temperature Description:reports the sensor is high, critical, and going higher state -Asserted
Besides that, I found some IERR errors, which are usually related to memory:
EventID:0148 Time Stamp:06/13/2021 10:44:41 SensorName:IERR Sensor Type:Processor Description:reports it has been asserted -Asserted
As suggested earlier, I think checking the temperatures would be a good start. You can also check the memory, for example by swapping in known-good modules, to rule it out.
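To see which SEL entries cluster around an IERR, the quoted events can be parsed and compared by timestamp. The following is only a minimal Python sketch. It assumes the SEL text export has one event per line in exactly the format quoted above and that timestamps are MM/DD/YYYY. The names `SelEvent`, `parse_sel_line`, and `events_near` are hypothetical helpers, not part of any Intel tool.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class SelEvent(NamedTuple):
    event_id: str
    time: datetime
    sensor: str
    sensor_type: str
    description: str
    state: str


# Assumed line layout, taken from the events quoted in this thread.
SEL_LINE = re.compile(
    r"EventID:(?P<event_id>\S+)\s+"
    r"Time Stamp:(?P<time>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"SensorName:(?P<sensor>.*?)\s+"
    r"Sensor Type:(?P<sensor_type>.*?)\s+"
    r"Description:(?P<description>.*?)\s*"
    r"-(?P<state>Asserted|Deasserted)$"
)


def parse_sel_line(line: str) -> Optional[SelEvent]:
    """Parse one exported SEL line; return None if it does not match."""
    m = SEL_LINE.match(line.strip())
    if m is None:
        return None
    return SelEvent(
        event_id=m["event_id"],
        # Assumption: the export uses month/day/year ordering.
        time=datetime.strptime(m["time"], "%m/%d/%Y %H:%M:%S"),
        sensor=m["sensor"],
        sensor_type=m["sensor_type"],
        description=m["description"],
        state=m["state"],
    )


def events_near(events, anchor, window):
    """Return the other events logged within +/- window of the anchor event."""
    return [
        e for e in events
        if e is not anchor and abs(e.time - anchor.time) <= window
    ]


SAMPLE = [
    "EventID:0136 Time Stamp:06/07/2021 08:06:00 SensorName:BMC FW Health "
    "Sensor Type:Management Subsystem Health Description:'P1 Therm Ctrl %' "
    "sensor has failed and may not be providing a valid reading -Asserted",
    "EventID:0160 Time Stamp:06/13/2021 10:47:57 SensorName:P1 Therm Ctrl % "
    "Sensor Type:Temperature Description:reports the sensor is high, "
    "critical, and going higher state -Asserted",
    "EventID:0148 Time Stamp:06/13/2021 10:44:41 SensorName:IERR "
    "Sensor Type:Processor Description:reports it has been asserted -Asserted",
]

events = [e for e in map(parse_sel_line, SAMPLE) if e is not None]

# List everything logged within five minutes of each IERR.
for ierr in (e for e in events if e.sensor == "IERR"):
    for e in events_near(events, ierr, timedelta(minutes=5)):
        print(ierr.event_id, "->", e.event_id, e.sensor, e.time)
```

Running the same comparison over the full export, rather than just these three lines, would show whether thermal events consistently appear close to each IERR.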
I look forward to your updates.
Jose A.
Intel Customer Support Technician
Hello timmjm,
I am following up to check whether you found the provided information useful. If you have further questions, please don't hesitate to ask. If you consider the issue resolved, please let us know so we can mark this ticket as resolved. I will try to reach you next Tuesday, the 22nd. After that, the thread will be archived automatically.
Regards
Jose A.
Intel Customer Support Technician
Hello timmjm,
We will now mark this thread as resolved. If you have further issues or questions, please submit a new topic.
Regards
Jose A.
Intel Customer Support Technician