New Contributor I

mce: [Hardware Error]: Machine check events logged


Hello,

I have a custom board (RC10) that uses an E3845 and is similar to the MinnowBoard MAX. I customized the MinnowBoard MAX firmware from Intel Firmware Engine for the RC10 by enabling i2c-0, PCIe-2, etc. When the Linux system boots, it shows "mce: [Hardware Error]: Machine check events logged" 300 seconds after boot.

1. Since the original configuration came from the MinnowBoard MAX, which uses an E3825, the mce error might come from that. If so, how can I change the processor to E3845?

2. Other than #1, I have no idea where the mce error comes from. Is there any way to track it down by disabling HW components (e.g. PCIE-0)?

 


Accepted Solutions
New Contributor I

Hello Brian,

Here is the output of mcelog --client:

mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
mcelog: Family 6 Model 37 CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
ADDR fef80000
TIME 978536917 Wed Jan  3 10:48:37 2001
MCG status:
MCi status:
Uncorrected error
MCi_ADDR register valid
Processor context corrupt
MCA: Internal unclassified error: 410
Running trigger `unknown-error-trigger'
STATUS a600000007600410 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 55
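
For anyone who wants to check the "MCi status" lines against the raw STATUS value, here is a minimal sketch in Python. It assumes the architectural IA32_MCi_STATUS bit layout described in the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16). The names `FLAGS` and `decode_mci_status` are made up for this example and are not part of mcelog.

```python
# Bit positions of the architectural flags in IA32_MCi_STATUS
# (assumed from the Intel SDM machine-check architecture description).
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flag bits and error codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF               # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded


if __name__ == "__main__":
    # STATUS value taken from the mcelog output in this post.
    result = decode_mci_status(0xA600000007600410)
    for name, value in result.items():
        shown = value if isinstance(value, bool) else hex(value)
        print(f"{name}: {shown}")
```

For the STATUS value in this post, the set flags are VAL, UC, ADDRV, and PCC, which lines up with the "Uncorrected error", "MCi_ADDR register valid", and "Processor context corrupt" lines, and the low 16 bits give the 410 error code that mcelog prints.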

15 Replies
Employee

We'd like to get the log of the machine check exception to figure out what's going on.

On Linux systems, you should be able to get this using mcelog - http://mcelog.org/

As an example you can install this on Ubuntu/Debian using apt-get:

sudo apt-get install mcelog

The events will be logged to /var/log/mcelog. You can also run:

sudo mcelog --client

to query the mcelog daemon for errors.
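
If you want to keep the numeric fields of a record for later comparison, the text output can be picked apart with a few regular expressions. This is only a rough sketch in Python: it assumes the "KEY value" layout mcelog uses in its text records (for example `CPU 0 BANK 0`), and it assumes that ADDR, STATUS, MCGSTATUS, and MCGCAP are printed in hexadecimal. `parse_mcelog_record` is a hypothetical helper, not an mcelog feature.

```python
import re

# Assumption: these fields are printed as bare hexadecimal numbers.
HEX_FIELDS = ("ADDR", "STATUS", "MCGSTATUS", "MCGCAP")
# Assumption: these fields are printed as decimal numbers.
DEC_FIELDS = ("CPU", "BANK")


def parse_mcelog_record(text: str) -> dict:
    """Extract selected numeric fields from one mcelog text record."""
    fields = {}
    for name in HEX_FIELDS:
        # \b keeps "STATUS" from matching inside "MCGSTATUS".
        match = re.search(rf"\b{name}\s+([0-9a-fA-F]+)\b", text)
        if match:
            fields[name] = int(match.group(1), 16)
    for name in DEC_FIELDS:
        match = re.search(rf"\b{name}\s+(\d+)\b", text)
        if match:
            fields[name] = int(match.group(1))
    return fields


if __name__ == "__main__":
    sample = (
        "CPU 0 BANK 0\n"
        "ADDR fef80000\n"
        "STATUS a600000007600410 MCGSTATUS 0\n"
        "MCGCAP 806 APICID 0 SOCKETID 0\n"
    )
    for name, value in parse_mcelog_record(sample).items():
        print(name, hex(value))
```

Fields that are missing from a record are simply left out of the returned dictionary, so the same helper works on partial output.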

Employee

Thanks Jong. We'll investigate this and let you know what we find.

Employee

Jong: did you try to enable ECC memory on your board?

New Contributor I

Hello Brian,

Unfortunately, we don't have ECC (E3845 - DRAM1_DQ[56..x] aka DRAM0_ECC_DQ[0..x]) in the RC10 board design. We didn't think it was necessary.

Are you recommending that we add ECC to the RC10 board design? Do you think the MCE message comes from memory?

 

Employee

No, I don't recommend using ECC with this project. It wasn't a feature enabled on the MinnowBoard Max. I'm just trying to rule it out as a problem. Thanks for the information.

New Contributor I

Hello Brian,

For your information, I tried the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max on both the RC10 and the MinnowBoard MAX. Neither showed the mce error. I think the firmware built with Intel Firmware Engine has a problem. What do you think?

Employee

We're investigating the 0.84 codebase differences already. There may be some delay on our end due to the Christmas holiday, but I'll keep you posted. Thanks.

New Contributor I

Hello,

Is there any update? Thank you.

Employee

What kind of Linux did you try? Yocto?

New Contributor I

It's Debian 8 Jessie. As I mentioned previously, the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max didn't show the mce error with Debian 8.

Employee

I can reproduce it on Ubuntu and Yocto. After debugging, I found that this machine check error actually occurs during BIOS POST. It is a minor, non-critical error that happens only once and will not affect the OS once it is running, so you can ignore it for now. The root cause has been found, and we are going to fix this bug in a later release.


Hi,

Is it possible to send you a release candidate to see if you see the issue again?

 

New Contributor I

Hello Laurie,

Yes, I can try.


Could you send an email to Firmware_Engine@intel.com so I can give you instructions for downloading a pre-release for testing?
