ddbug1
Beginner
152 Views

Monitor ECC memory status?

Hello there, does anyone know a tool for monitoring the number of errors detected by ECC memory or the memory controller? Thanks, -- dd
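One possible answer for Linux machines (an assumption; the replies below cover HP iLO and Windows WHEA instead): the kernel's EDAC subsystem exposes per-memory-controller error counters in sysfs, typically as ce_count (corrected errors) and ue_count (uncorrected errors) under /sys/devices/system/edac/mc/mcN. The minimal sketch below assumes an EDAC driver for your memory controller is loaded; read_ecc_counts is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: Linux with an EDAC driver loaded; each memory controller
# shows up as a directory /sys/devices/system/edac/mc/mcN.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root: Path = EDAC_ROOT) -> dict[str, dict[str, int]]:
    """Return corrected ("ce") and uncorrected ("ue") error counts per controller."""
    counts: dict[str, dict[str, int]] = {}
    # Only mc0, mc1, ... entries are controllers; skip other sysfs entries.
    for mc in sorted(root.glob("mc[0-9]*")):
        counts[mc.name] = {
            "ce": int((mc / "ce_count").read_text().strip()),
            "ue": int((mc / "ue_count").read_text().strip()),
        }
    return counts


if __name__ == "__main__":
    if not EDAC_ROOT.is_dir():
        print("EDAC sysfs not found; is an EDAC driver loaded?")
    else:
        for name, c in read_ecc_counts().items():
            print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Reading the counters periodically (e.g. from cron) and logging the differences gives an error count over time without any vendor tool.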
Bernard
Black Belt

HP Integrated Lights-Out (iLO) can report ECC memory errors. Link to the HP whitepaper: http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02878598/c02878598.pdf
ddbug1
Beginner

Thank you. I wanted to get an ECC-enabled machine to see how often DRAM errors occur in my environment. (Interesting: you can't tell whether you need ECC unless you already have it?) But after reading the xeon-e5-2600-uncore-guide, this HP paper and the MS WHEA documentation, the whole ECC topic looks too intimidating. I'll surrender for now... - dd
Roman_D_Intel
Employee

Hi dd,

Please look at this manual for Intel Xeon E7 processors. FVC events can be configured to count memory ECC errors (see page 2-126 for example). They can also count corrected/uncorrected memory request responses.

Best regards,

Roman

Bernard
Black Belt

Low-level details of the hardware and/or its programming interface are not easy to grasp quickly :)

ddbug1
Beginner

Thanks, guys. I see your point, Ilya... There's an anecdote about senior and junior toilet cleaners... ;)

My goal is to measure how often RAM errors occur on my machines and whether I want ECC.

But the DRAM controller of Xeons (and the ECC RAM itself, of course) looks much more complex than on "normal" non-ECC mobos, so there are more parts that may fail. Do you think the RAM error rate measured on an ECC-enabled machine can be extrapolated to a simpler non-ECC Sandy Bridge/Ivy Bridge system?

Building PCM (Intel Performance Counter Monitor) to get the counters is not a problem.

Regards,

-- dd
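To compare machines with different amounts of RAM and uptime, one rough approach is to normalize the corrected-error count by installed capacity and elapsed time. The sketch below assumes errors scale linearly with installed GiB and hours, which is a simplification: it says nothing about whether the extra Xeon memory-controller logic changes the rate. The function names and the example numbers are hypothetical.

```python
def errors_per_gib_khour(count_start: int, count_end: int,
                         capacity_gib: float, hours: float) -> float:
    """Corrected errors per GiB of RAM per 1,000 hours between two counter readings."""
    if capacity_gib <= 0 or hours <= 0:
        raise ValueError("capacity_gib and hours must be positive")
    if count_end < count_start:
        # Counters can be reset (e.g. by a reboot or driver reload); re-baseline instead.
        raise ValueError("counter decreased; take a new baseline reading")
    return (count_end - count_start) / capacity_gib / (hours / 1000.0)


def projected_errors(rate: float, capacity_gib: float, hours: float) -> float:
    """Expected error count on another machine, assuming the same per-GiB rate."""
    return rate * capacity_gib * (hours / 1000.0)


if __name__ == "__main__":
    # Hypothetical readings: counter went from 10 to 26 on a 64 GiB server over 2,000 hours.
    rate = errors_per_gib_khour(10, 26, capacity_gib=64, hours=2000)
    print(f"rate: {rate:.3f} errors / GiB / 1000 h")
    print(f"16 GiB desktop, 1000 h: {projected_errors(rate, 16, 1000):.1f} errors")
```

With these made-up numbers, 16 corrected errors on 64 GiB over 2,000 hours gives 0.125 errors per GiB per 1,000 hours, which projects to about 2 errors on a 16 GiB desktop over 1,000 hours, if the linear-scaling assumption holds.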

Bernard
Black Belt

Does PCM measure ECC errors?

Patrick_F_Intel1
Employee

Hello ddbug,

So... is ECC worth the extra money... that is a good question.

My first response is, how much does it matter whether you can catch memory errors?

If you are doing something where you don't mind rebooting then you probably don't need ECC memory.

For mission critical applications where you absolutely need to know whether there are memory issues (yes, DIMMs do go bad) then ECC is a requirement. This is why servers always have ECC support.

I think you can monitor ECC errors on Windows in the System event log in Event Viewer (eventvwr.msc).

Pat
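Following Pat's suggestion, the System log can also be queried from the command line with wevtutil, which ships with Windows. The sketch below filters on the Microsoft-Windows-WHEA-Logger provider, the source Windows uses for WHEA hardware-error events. Whether corrected memory errors actually appear there depends on the platform firmware and WHEA configuration, so treat that as an assumption to verify on your machine; build_wevtutil_query is a hypothetical helper name.

```python
import subprocess

# Event provider that logs WHEA hardware-error events to the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_wevtutil_query(count: int = 20) -> list[str]:
    """Build a wevtutil command listing the newest WHEA-Logger events in the System log."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event provider
        "/f:text",       # human-readable output instead of XML
        "/rd:true",      # newest events first
        f"/c:{count}",   # maximum number of events to return
    ]


if __name__ == "__main__":
    # Windows only: wevtutil is not available on other systems.
    result = subprocess.run(build_wevtutil_query(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

Building the argument list separately from running it keeps the query easy to inspect or reuse from a scheduled task.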

ddbug1
Beginner

> Does PCM measure ECC errors?

I have not checked this yet. Even if not, the documentation explains how to get these counters.

> So... is ECC worth the extra money... that is a good question.

The ECC RAM modules don't cost much more; it's the whole new, higher-class machine that is expensive... Finally we've got approval for a Dell server. The exact model and hardware details are not known yet.

Thanks,
-- dd
Bernard
Black Belt

>>>I think you can monitor ECC errors on windows in the system event log in the event viewer (eventvwr.msc).>>>

This is implemented by the Windows Hardware Error Architecture (WHEA).

SergeyKostrov
Valued Contributor II

>>But, after reading the xeon-e5-2600-uncore-guide, this HP paper and MS WHEA docum, the whole ECC topic looks too intimidating. I'll surrender for now...

In 2012 I saw some Intel equipment that, as I remember, could simulate memory errors on server platforms. Honestly, I didn't dare ask how much it cost...