
Catastrophic Error - Modular Server

CJarb
Beginner

We have MFS2600KI Compute Modules all running ESXi 5.5. There have been intermittent issues where a catastrophic error occurs and the blades reboot. It's completely random and can happen on any of the 6 blades. Here is an example of the error:

ID: 2101
Type: IPMI
Detailed Description: A catastrophic error has occurred. The system has halted.
Cause: An uncorrectable memory error is often the cause.
Action: Check for other events that occurred near the same time which may help identify the cause or potential hardware failure.
Extra Data: s:68:"Raw IPMI (hex): Gen:3000 Num:80 Type:07 EDir:83 ED1:a1 ED2:01 ED3:01";

The error indicates a possible memory issue, but Intel support has been unable to identify the exact cause. We've replaced one module completely, but the others are still throwing these errors. Has anyone seen this before, and does anyone know of a possible resolution?
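For anyone comparing these events across blades, here is a minimal Python sketch that splits the "Raw IPMI (hex)" string into its fields. The field meanings are an assumption based on the generic IPMI event record layout (Gen = generator ID, Num = sensor number, Type = sensor type, EDir = event direction plus event type, ED1 to ED3 = event data bytes); the Modular Server firmware may label or pack them differently. The function names `parse_raw_ipmi` and `decode_event` are hypothetical helpers, not part of any Intel tool.

```python
import re


def parse_raw_ipmi(text):
    """Return the 'Name:hex' pairs that follow 'Raw IPMI (hex):' as integers."""
    _, _, tail = text.partition("Raw IPMI (hex):")
    pairs = re.findall(r"(\w+):([0-9a-fA-F]+)", tail)
    return {name: int(value, 16) for name, value in pairs}


def decode_event(fields):
    """Split the packed bytes, assuming the generic IPMI event layout."""
    edir = fields["EDir"]
    ed1 = fields["ED1"]
    return {
        "sensor_number": fields["Num"],
        # Assumption: Type is the IPMI sensor type code (07h is the
        # Processor class in the generic IPMI sensor type table).
        "sensor_type": fields["Type"],
        # Assumption: bit 7 is the direction flag (1 = deassertion),
        # bits 6:0 are the event/reading type code.
        "deassertion": bool(edir & 0x80),
        "event_type": edir & 0x7F,
        # Assumption: the low nibble of ED1 is the event offset.
        "offset": ed1 & 0x0F,
    }


raw = "Raw IPMI (hex): Gen:3000 Num:80 Type:07 EDir:83 ED1:a1 ED2:01 ED3:01"
fields = parse_raw_ipmi(raw)
event = decode_event(fields)
print(fields)
print(event)
```

Running the same parse over the events from each blade makes it easier to see whether every reboot reports the same sensor number and event data, which is useful detail to include in the support case.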

1 Reply
Salem_W_Intel1
Employee

Hi!

A CATERR can be triggered by almost anything: hardware, software, or firmware. Because of this, I would highly recommend providing the complete system diagnostics (http://www.intel.com/support/motherboards/server/sb/CS-032779.htm) to our Support Team under your existing case number, so they can check the logs and see what may be triggering this random symptom.

As an additional suggestion, please ensure the memory installed on your compute modules is on the officially tested memory list (http://www.intel.com/support/motherboards/server/MFS5520VI/sb/CS-030301.htm).
