mcelog in Linux doesn't show the exact DIMM location for ECC/UECC errors

Hi,

On an Intel Skylake (SKL) platform (dual socket, 24 x 64 GB DDR4-2666 DIMMs, 1.5 TB in total), we were running a memory-related workload and seeing a lot of DIMM ECC errors.

OS: RHEL 7.5 

After many ECC errors, the DIMM apparently encounters an uncorrectable (UECC) error.

After decoding the mcelog output, the DIMM location is not reported correctly.

Here is the output from mcelog:

CPU 26 BANK 8 
MISC 200000c020001086 ADDR 1754d88ef40 
MCG status:
MCi status:
Error overflow
Corrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
M2M: MscodDataRdErr
STATUS dc0000c001010090 MCGSTATUS 0
MCGCAP f000c14 APICID 40 SOCKETID 1 
PPIN 1fc0448d77e07d88
CPUID Vendor Intel Family 6 Model 85
Fallback Socket memory error count 4 exceeded threshold: 26 in 24h
Location SOCKET:1 CHANNEL:? DIMM:? []
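The flags mcelog prints can be cross-checked against the raw STATUS, MISC and ADDR values in this record. The following is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS and IA32_MCi_MISC field layouts from the Intel SDM (Vol. 3B); the helper names are hypothetical, and it does not reproduce mcelog's own decoder or any model-specific (bits 31:16) Skylake decoding.

```python
# Assumes the architectural IA32_MCi_STATUS / IA32_MCi_MISC layouts (Intel SDM).
# Values copied from the mcelog record above.
STATUS = 0xDC0000C001010090
MISC = 0x200000C020001086
ADDR = 0x1754D88EF40

FLAG_BITS = {
    63: "VAL (valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC valid)",
    58: "ADDRV (MCi_ADDR valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of the compound memory-controller error code 0000 0000 1MMM CCCC
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}

# Address mode in MCi_MISC bits 8:6 (meaningful when MCG_CAP.SER_P is set)
ADDR_MODES = {
    0b000: "segment offset",
    0b001: "linear address",
    0b010: "physical address",
    0b011: "memory address",
    0b111: "generic",
}


def bits(value, hi, lo):
    """Return bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_status(status):
    """Decode the architectural parts of an MCi_STATUS value."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    mca_code = bits(status, 15, 0)
    result = {
        "flags": flags,
        "mca_code": mca_code,
        "model_specific": bits(status, 31, 16),  # not decoded here
        "corrected_count": bits(status, 52, 38),  # valid when CMCI is supported
    }
    if bits(mca_code, 15, 7) == 1:  # memory-controller error format
        result["transaction"] = MEM_TRANSACTIONS.get(bits(mca_code, 6, 4), "reserved")
        channel = bits(mca_code, 3, 0)
        result["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return result


def decode_misc(misc, addr):
    """Decode the recoverable-address LSB and address mode from MCi_MISC."""
    lsb = bits(misc, 5, 0)
    mode = ADDR_MODES.get(bits(misc, 8, 6), "reserved")
    aligned = addr & ~((1 << lsb) - 1)  # bits below the LSB are not meaningful
    return {"addr_lsb": lsb, "addr_mode": mode, "address": hex(aligned)}


if __name__ == "__main__":
    print(decode_status(STATUS))
    print(decode_misc(MISC, ADDR))
```

For this record the sketch reports a corrected memory read error on channel 0 with a physical address that is valid from bit 6 upward (cache-line granularity). None of the architectural fields decoded here identifies the DIMM slot or rank.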

My question: how, or from where, does the mcelog daemon get the channel and DIMM location? Is it from ACPI?

I decoded MCi_STATUS and worked out the IMC and channel, but I am unable to decode the DIMM and rank (two 4R DIMMs per channel with 2DPC).
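To make the remaining ambiguity concrete, here is a small Python sketch. It assumes a Skylake-SP layout of 2 memory controllers (IMCs) with 3 channels each per socket, which with 2DPC accounts for all 24 DIMMs (1536 GB). Once socket, IMC and channel are known, 2 slots x 4 ranks still leave 8 candidate (slot, rank) pairs. The function names and the IMC/channel values in the usage are hypothetical placeholders.

```python
from itertools import product

# Assumed layout: 2 sockets, 2 IMCs per socket, 3 channels per IMC,
# 2 DIMM slots per channel (2DPC), 4 ranks per DIMM (4R), 64 GB per DIMM.
SOCKETS, IMCS, CHANNELS, SLOTS, RANKS = 2, 2, 3, 2, 4
DIMM_GB = 64


def total_dimms():
    """Number of populated DIMMs under the assumed layout."""
    return SOCKETS * IMCS * CHANNELS * SLOTS


def candidates(socket, imc, channel):
    """All (socket, imc, channel, slot, rank) locations consistent with a
    known socket/IMC/channel but an unknown DIMM slot and rank."""
    return [(socket, imc, channel, slot, rank)
            for slot, rank in product(range(SLOTS), range(RANKS))]


if __name__ == "__main__":
    print(total_dimms(), "DIMMs,", total_dimms() * DIMM_GB, "GB")
    # Hypothetical example: socket 1 (from the log), IMC 0, channel 0.
    for location in candidates(1, 0, 0):
        print(location)
```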
