Hi,
On a dual-socket Intel SKL platform with 24 x 64GB DDR4 2666 MHz DIMMs (1.5TB in total), we were running a memory-related workload and saw a lot of DIMM ECC errors.
OS: RHEL 7.5
Apparently, after a lot of ECC errors, the DIMM encounters an uncorrectable ECC error (UECC).
In the decoded mcelog output, the DIMM location does not show up correctly.
Please see the output from mcelog:
CPU 26 BANK 8
MISC 200000c020001086 ADDR 1754d88ef40
MCG status:
MCi status:
Error overflow
Corrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
M2M: MscodDataRdErr
STATUS dc0000c001010090 MCGSTATUS 0
MCGCAP f000c14 APICID 40 SOCKETID 1
PPIN 1fc0448d77e07d88
CPUID Vendor Intel Family 6 Model 85
Fallback Socket memory error count 4 exceeded threshold: 26 in 24h
Location SOCKET:1 CHANNEL:? DIMM:? []
My question is: how, or from where, does the mcelog daemon get the channel and DIMM location? Is it ACPI?
I decoded the MC_Status and figured out the IMC and channel info, but I am unable to decode the DIMM rank (2 x 4R in the 2DPC case).
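For reference, here is a minimal sketch of the kind of status decode described above, applied to the STATUS value from the log. It assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63:57, corrected error count in bits 52:38, MSCOD in 31:16, MCACOD in 15:0) and the memory-controller MCACOD encoding 0000_0000_1MMM_CCCC. The function and field names are my own and hypothetical, not anything from mcelog. It only recovers the transaction type and channel; it does not resolve the DIMM or rank, which is not encoded in these architectural fields.

```python
# Sketch only: decode architectural fields of an IA32_MCi_STATUS value.
# Names are illustrative; bit positions assume the Intel SDM layout.

MEM_TRANSACTION = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}

def bits(value, hi, lo):
    """Return the inclusive bit range value[hi:lo] as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_mci_status(status):
    fields = {
        "valid": bool(bits(status, 63, 63)),
        "overflow": bool(bits(status, 62, 62)),
        "uncorrected": bool(bits(status, 61, 61)),
        "enabled": bool(bits(status, 60, 60)),
        "misc_valid": bool(bits(status, 59, 59)),
        "addr_valid": bool(bits(status, 58, 58)),
        "pcc": bool(bits(status, 57, 57)),
        # Meaningful only when corrected-error counting is supported.
        "corrected_count": bits(status, 52, 38),
        "mscod": bits(status, 31, 16),   # model-specific error code
        "mcacod": bits(status, 15, 0),   # architectural error code
    }
    mcacod = fields["mcacod"]
    # Memory controller errors have the form 0000_0000_1MMM_CCCC.
    if (mcacod & 0xFF80) == 0x0080:
        fields["mem_transaction"] = MEM_TRANSACTION.get(
            bits(mcacod, 6, 4), "reserved")
        channel = bits(mcacod, 3, 0)
        # 0b1111 means the channel is not specified.
        fields["channel"] = None if channel == 0xF else channel
    return fields

if __name__ == "__main__":
    # STATUS value taken from the mcelog output above.
    for name, value in decode_mci_status(0xDC0000C001010090).items():
        print(f"{name}: {value}")
```

For the STATUS in the log, this gives an overflowed, corrected (not uncorrected) memory read error on channel 0, which agrees with the "RD_CHANNEL0_ERR" and "Memory read error" lines that mcelog printed.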