
E3845 Cache Errors

DarrenF
Beginner

We have an embedded board with an E3845 processor that randomly kernel panics and reboots. It's running RHEL 7.6 with microcode 0x90d. Each time it panics and reboots, it logs one or more Machine Check Exceptions, which I've decoded using the mcelog facility.

For example,

mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 0: 9000000020000003
mce: [Hardware Error]: TSC 2b2522a1b4
mce: [Hardware Error]: PROCESSOR 0:30679 TIME 1388556142 SOCKET 0 APIC 4 microcode 90d

mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 2: b20000040002010a
mce: [Hardware Error]: TSC 2b2522a141
mce: [Hardware Error]: PROCESSOR 0:30679 TIME 1388556142 SOCKET 0 APIC 0 microcode 90d

mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 5: f40000400090000f
mce: [Hardware Error]: TSC 2b2522a1b4 ADDR 70f01c80
mce: [Hardware Error]: PROCESSOR 0:30679 TIME 1388556142 SOCKET 0 APIC 0 microcode 90d

mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b600000013080810
mce: [Hardware Error]: TSC 2b2522a141 ADDR 70f01c80
mce: [Hardware Error]: PROCESSOR 0:30679 TIME 1388556142 SOCKET 0 APIC 0 microcode 90d

...which decodes to:

Hardware event. This is not a software error.
CPU 2 BANK 0 TSC 2b2522a1b4
TIME 1388556142 Wed Jan 1 00:02:22 2014
MCG status:MCIP
MCi status:
Corrected error
Error enabled
MCA: External error
STATUS 9000000020000003 MCGSTATUS 4
CPUID Vendor Intel Family 6 Model 55 Step 9
SOCKET 0 APIC 4 microcode 90d

--

Hardware event. This is not a software error.
CPU 0 BANK 2 TSC 2b2522a141
TIME 1388556142 Wed Jan 1 00:02:22 2014
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Generic CACHE Level-2 Generic Error
STATUS b20000040002010a MCGSTATUS 4
CPUID Vendor Intel Family 6 Model 55 Step 9
SOCKET 0 APIC 0 microcode 90d

--

Hardware event. This is not a software error.
CPU 0 BANK 5 TSC 2b2522a1b4
ADDR 70f01c80
TIME 1388556142 Wed Jan 1 00:02:22 2014
MCG status:MCIP
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_ADDR register valid
MCA: Level-3 Generic cache hierarchy error
STATUS f40000400090000f MCGSTATUS 4
CPUID Vendor Intel Family 6 Model 55 Step 9
SOCKET 0 APIC 0 microcode 90d

--

Hardware event. This is not a software error.
CPU 0 BANK 0 TSC 2b2522a141
ADDR 70f01c80
TIME 1388556142 Wed Jan 1 00:02:22 2014
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_ADDR register valid
Processor context corrupt
MCA: BUS error: -1 0 Level-0 Local-CPU-originated-request Read Memory-access Request-did-not-timeout
STATUS b600000013080810 MCGSTATUS 4
CPUID Vendor Intel Family 6 Model 55 Step 9
SOCKET 0 APIC 0 microcode 90d
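
To cross-check the mcelog text above against the raw STATUS values, here is a minimal Python sketch that splits a 64-bit status word into its flag bits and error-code fields. The bit positions follow the architectural IA32_MCi_STATUS layout described in the Intel SDM; the function name `decode_status` and the dictionary keys are illustrative choices, not anything mcelog provides.

```python
# Flag bits of IA32_MCi_STATUS (architectural layout per the Intel SDM).
# The short names are the SDM mnemonics; the dict itself is illustrative.
FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_code": status & 0xFFFF,                     # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # The four STATUS values from the log above.
    for raw in ("9000000020000003", "b20000040002010a",
                "f40000400090000f", "b600000013080810"):
        fields = decode_status(int(raw, 16))
        print(raw, fields["flags"], hex(fields["mca_code"]))
```

Only the bank 2 and second bank 0 records carry the PCC flag, which matches the "Processor context corrupt" lines in the mcelog output.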

This is a headless system that is PXE booted, so capturing this state is very difficult, especially since we don't know how to trigger the failure. The card vendor has provided the latest version of the BIOS for this board but has been unable to identify a root cause.

Has anyone had an issue like this with this part? Is there a later microcode version that would address this issue? Thank you!
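
For anyone comparing notes on microcode, a small sketch for confirming which revision the kernel actually loaded on each logical CPU may help. It assumes the usual Linux `/proc/cpuinfo` layout with a `microcode : 0x...` line per CPU; the helper name `microcode_revisions` is hypothetical.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    pattern = r"^microcode\s*:\s*(0x[0-9a-fA-F]+)"
    return {int(rev, 16)
            for rev in re.findall(pattern, cpuinfo_text, re.MULTILINE)}


# Typical use on the target board (Linux only):
#     with open("/proc/cpuinfo") as f:
#         print(sorted(hex(r) for r in microcode_revisions(f.read())))
```

More than one value in the returned set would mean the cores are not all running the same revision.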

Jaime_Lizarme
Moderator

Hi @DarrenF,

Thank you for contacting the Intel Embedded Community.

We are sorry to inform you that the CPU you are using (E3845) has reached End of Interactive Support; please contact your board manufacturer for support.

https://www.intel.com/content/www/us/en/products/sku/78475/intel-atom-processor-e3845-2m-cache-1-91-ghz/specifications.html

Best regards,

Jaime L.

DarrenF
Beginner

Thanks Jaime, but that response isn't very helpful. I'm aware the processor has reached end of support; that's why I'm reaching out to the community rather than submitting a ticket to Intel support. I have been in contact with the board manufacturer, and so far they have been unable to solve this problem for us.

Two questions I'm hoping I can get an answer to:

1) Is microcode 0x90d the latest microcode for this processor?

2) Do the L2/L3 cache errors decoded from the MCE provide any insight into what might be failing in the system?
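
On the second question, one way to see what the low 16 bits of each STATUS value encode is to classify them against the compound error-code patterns in the Intel SDM. The sketch below is a minimal version covering only the four codes that appear in this thread; the function name and the returned strings are illustrative. Note that, as I read the SDM's level encoding, LL = 3 means "generic" rather than a third cache level, so the "Level-3" label mcelog printed for code 0x000f may not point at a physical L3 cache.

```python
# Level field (LL, bits 1:0) as listed in the Intel SDM encoding table.
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def classify_mca_code(code):
    """Classify a 16-bit MCA error code (bits 15:0 of MCi_STATUS).

    Covers only the patterns seen in this thread; anything else is "other".
    """
    level = LEVELS[code & 0x3]
    if code == 0x0003:
        return "external error"
    if (code & 0xFFFC) == 0x000C:   # 0000 0000 0000 11LL
        return "generic cache hierarchy, level " + level
    if (code & 0xFF00) == 0x0100:   # 0000 0001 RRRR TTLL
        return "cache hierarchy, level " + level
    if (code & 0xF800) == 0x0800:   # 0000 1PPT RRRR IILL
        return "bus/interconnect, level " + level
    return "other"


if __name__ == "__main__":
    # MCA codes taken from the four STATUS values posted earlier.
    for code in (0x0003, 0x010A, 0x000F, 0x0810):
        print(hex(code), "->", classify_mca_code(code))
```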

 
