Software Archive
Read-only legacy content

mce: [Hardware Error]: Machine check events logged

JONG_L_
New Contributor I

Hello,

I have a custom board (RC10) that uses an E3845 and is similar to the MinnowBoard MAX. I customized the MinnowBoard MAX firmware for the RC10 with Intel Firmware Engine by enabling i2c-0, PCIe-2, etc. When the Linux system boots, it shows "mce: [Hardware Error]: Machine check events logged" 300 seconds after boot.

1. Since the original configuration came from the MinnowBoard MAX, which uses an E3825, the MCE error might come from that difference. If so, how can I change the processor to the E3845?

2. Other than #1, I have no idea where the MCE error comes from. Is there a way to track it down by disabling hardware components (e.g. PCIE-0)?

 


15 Replies
BrianRichardson
Employee

We'd like to get the log of the machine check exception to figure out what's going on.

On Linux systems, you should be able to get this using mcelog - http://mcelog.org/

For example, you can install it on Ubuntu/Debian using apt-get:

sudo apt-get install mcelog

The events will be logged to /var/log/mcelog. You can also run:

sudo mcelog --client

to query the mcelog daemon for errors.
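Once you have the text output, pulling out the register values makes it easier to compare runs. The following is a minimal sketch, assuming the output format shown in this thread (upper-case field names followed by a value); `parse_mcelog_record` is a hypothetical helper name, not part of mcelog.

```python
import re

# Field names as they appear in the mcelog text output quoted in this thread.
# Values are kept as strings: CPU and BANK are decimal, the rest are hex.
_FIELD_RE = re.compile(
    r"\b(CPU|BANK|ADDR|STATUS|MCGSTATUS|MCGCAP|APICID|SOCKETID)\s+([0-9a-fA-F]+)"
)

def parse_mcelog_record(text):
    """Return a dict of the KEY VALUE pairs found in one mcelog record."""
    return {key: value for key, value in _FIELD_RE.findall(text)}

sample = """\
CPU 0 BANK 0
ADDR fef80000
MCi_ADDR register valid
STATUS a600000007600410 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 55
"""
fields = parse_mcelog_record(sample)
```

In practice you would pass in the text captured from `sudo mcelog --client` or from /var/log/mcelog instead of the inline sample.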

JONG_L_
New Contributor I

Hello Brian,

Here is the output of mcelog --client:

mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
mcelog: Family 6 Model 37 CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
ADDR fef80000
TIME 978536917 Wed Jan  3 10:48:37 2001
MCG status:
MCi status:
Uncorrected error
MCi_ADDR register valid
Processor context corrupt
MCA: Internal unclassified error: 410
Running trigger `unknown-error-trigger'
STATUS a600000007600410 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 55
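
For readers who want to check the decoded lines against the raw STATUS value, here is a minimal sketch that splits the 64-bit word into the architectural IA32_MCi_STATUS flag bits and the two 16-bit code fields. The bit positions follow the architectural layout in the Intel SDM; `decode_mci_status` is a hypothetical helper name, and the sketch does not interpret the model-specific bits.

```python
# Architectural flag bits of IA32_MCi_STATUS (bit positions per the Intel SDM).
STATUS_FLAGS = {
    "VAL": 63,    # register contents valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting enabled
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}

def decode_mci_status(status):
    """Split a raw MCi_STATUS value into flags and error-code fields."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    decoded["mca_error_code"] = status & 0xFFFF               # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return decoded

result = decode_mci_status(0xA600000007600410)
for name, value in result.items():
    print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For the value in this log, UC, ADDRV and PCC are set, which matches the "Uncorrected error", "MCi_ADDR register valid" and "Processor context corrupt" lines, and the low 16 bits are 0x0410, the code mcelog reports as "Internal unclassified error: 410".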

BrianRichardson
Employee

Thanks Jong. We'll investigate this and let you know what we find.

BrianRichardson
Employee

Jong: did you try to enable ECC memory on your board?

JONG_L_
New Contributor I

Hello Brian,

Unfortunately, we don't have ECC (E3845 - DRAM1_DQ[56..x] aka DRAM0_ECC_DQ[0..x]) in the RC10 board design. We didn't think it was necessary.

Are you recommending that we add ECC to the RC10 board design? Do you think the MCE message comes from memory?

 

BrianRichardson
Employee

No, I don't recommend using ECC with this project. It wasn't a feature enabled on the MinnowBoard MAX. I'm just trying to rule it out as a problem. Thanks for the information.

JONG_L_
New Contributor I

Hello Brian,

For your information, I tried the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max on both the RC10 and the MinnowBoard MAX. Neither showed the MCE error. I think the firmware built with the Intel Firmware Engine has a problem. What do you think?

BrianRichardson
Employee

We're investigating the 0.84 codebase differences already. There may be some delay on our end due to the Christmas holiday, but I'll keep you posted. Thanks.

JONG_L_
New Contributor I

Hello,

Is there any update? Thank you.

Yi_Q_Intel
Employee

What kind of Linux did you try? Yocto?

JONG_L_
New Contributor I

It's Debian 8 Jessie. As I mentioned previously, the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max did not show the MCE error with Debian 8.

Yi_Q_Intel
Employee

I can reproduce it on Ubuntu and Yocto. After debugging, I found that this machine check error actually occurs during BIOS POST. It is a minor, non-critical error that happens only once and does not affect the OS afterwards, so you can ignore it for now. We have found the root cause and will fix the bug in a later release.

Jarlstrom_Intel
Employee

Hi,

Is it possible to send you a release candidate to see if you see the issue again?

 

JONG_L_
New Contributor I

Hello Laurie,

Yes, I can try.

Jarlstrom_Intel
Employee

Can you send an email to Firmware_Engine@intel.com so I can give you instructions for downloading a pre-release for testing?
