Hello,
I have a custom board (RC10) with an E3845, similar to the MinnowBoard MAX. I customized the MinnowBoard MAX firmware in Intel Firmware Engine for the RC10, enabling i2c-0, PCIe-2, etc. When Linux boots, it reports "mce: [Hardware Error]: Machine check events logged" about 300 seconds after boot.
1. The original configuration came from the MinnowBoard MAX, which uses an E3825, so the MCE error might come from that. If so, how can I change the processor to E3845?
2. Other than #1, I have no idea where the MCE error comes from. Is there a way to track it down by disabling hardware components (e.g., PCIe-0)?
We'd like to get the log of the machine check exception to figure out what's going on.
On Linux systems, you should be able to get this using mcelog - http://mcelog.org/
For example, you can install it on Ubuntu/Debian with apt-get:
sudo apt-get install mcelog
The events will be logged to /var/log/mcelog. You can also run:
sudo mcelog --client
to query the mcelog daemon for errors.
Hello Brian,
Here is the output of mcelog --client:
mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
mcelog: Family 6 Model 37 CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
ADDR fef80000
TIME 978536917 Wed Jan 3 10:48:37 2001
MCG status:
MCi status:
Uncorrected error
MCi_ADDR register valid
Processor context corrupt
MCA: Internal unclassified error: 410
Running trigger `unknown-error-trigger'
STATUS a600000007600410 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 55
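The raw STATUS and MCGCAP values above can be decoded by hand to confirm what mcelog printed. The sketch below assumes the architectural bit layouts of IA32_MCi_STATUS and IA32_MCG_CAP from the Intel SDM (Vol. 3, Machine-Check Architecture) and leaves the model-specific bits undecoded, just as mcelog does for this CPU. The function names are hypothetical. (mcelog's "Model 37" appears to be hexadecimal; 0x37 is 55, the model on the CPUID line.)

```python
# Sketch: decode the architectural fields of the values mcelog reported.
# Assumes the IA32_MCi_STATUS / IA32_MCG_CAP layouts from the Intel SDM;
# model-specific bits (31:16 of STATUS) are extracted but not interpreted.

# Single-bit flags in IA32_MCi_STATUS (bit number -> name).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural parts."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,                # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,   # bits 31:16
    }


def is_internal_unclassified(mca_code: int) -> bool:
    """Internal unclassified errors match the pattern 0000 01xx xxxx xxxx."""
    return (mca_code & 0xFC00) == 0x0400


def decode_mcg_cap(cap: int) -> dict:
    """Decode the capability bits of IA32_MCG_CAP."""
    return {
        "bank_count": cap & 0xFF,               # bits 7:0
        "MCG_CTL_P": bool((cap >> 8) & 1),
        "MCG_EXT_P": bool((cap >> 9) & 1),
        "MCG_CMCI_P": bool((cap >> 10) & 1),
        "MCG_TES_P": bool((cap >> 11) & 1),
        "MCG_SER_P": bool((cap >> 24) & 1),
    }


if __name__ == "__main__":
    s = decode_status(0xA600000007600410)
    print(s["flags"])                                   # ['VAL', 'UC', 'ADDRV', 'PCC']
    print(hex(s["mca_error_code"]))                     # 0x410
    print(is_internal_unclassified(s["mca_error_code"]))  # True
    print(decode_mcg_cap(0x806)["bank_count"])          # 6
```

The decoded flags (VAL, UC, ADDRV, PCC) line up with mcelog's "Uncorrected error", "MCi_ADDR register valid" and "Processor context corrupt" lines, and 0x410 falls in the internal-unclassified range, so anything further depends on the undocumented model-specific bits.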
Thanks Jong. We'll investigate this and let you know what we find.
Jong: did you try to enable ECC memory on your board?
Hello Brian,
Unfortunately, we don't have ECC (E3845 - DRAM1_DQ[56..x] aka DRAM0_ECC_DQ[0..x]) in the RC10 board design. We didn't think it was necessary.
Are you recommending ECC in the RC10 board design? Do you think the MCE message comes from memory?
No, I don't recommend using ECC with this project. It wasn't a feature enabled on the MinnowBoard Max. I'm just trying to rule it out as a problem. Thanks for the information.
Hello Brian,
For your information, I tried the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max on both the RC10 and the MinnowBoard MAX. Neither showed the MCE error. I think the firmware built with Intel Firmware Engine has a problem. What do you think?
We're investigating the 0.84 codebase differences already. There may be some delay on our end due to the Christmas holiday, but I'll keep you posted. Thanks.
What Linux distribution did you try? Yocto?
It's Debian 8 Jessie. As I mentioned previously, the 0.84 firmware from https://firmware.intel.com/projects/minnowboard-max did not produce the MCE error with Debian 8.
I can reproduce it on Ubuntu and Yocto. After debugging, I found that this machine check error actually happens during BIOS POST. It is a minor, non-critical error that happens only once and will not affect the OS afterwards, so you can ignore it for now. The root cause has been found, and we are going to fix this bug in a later release.
Hi,
Would it be possible to send you a release candidate to check whether the issue still occurs?
Can you send an email to Firmware_Engine@intel.com so I can give you instructions for downloading a pre-release for testing?
