<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Monitor ECC memory status? in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Montor-ECC-memory-status/m-p/1015001#M3879</link>
    <description>&lt;P&gt;Hello all:&lt;/P&gt;

&lt;P&gt;I followed an earlier post with the same title and found the Intel Xeon E7 CPU manual (actually a datasheet). The URL is listed below:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/forums/topic/393904" target="_blank"&gt;https://software.intel.com/en-us/forums/topic/393904&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;That manual describes an FSV event related to ECC memory status, but what I need are the related registers for the Intel Xeon E5-2650. Could anyone please help? I would appreciate it.&lt;/P&gt;

&lt;P&gt;Joe&lt;/P&gt;</description>
    <pubDate>Mon, 08 Dec 2014 09:16:05 GMT</pubDate>
    <dc:creator>Joe_H_1</dc:creator>
    <dc:date>2014-12-08T09:16:05Z</dc:date>
    <item>
      <title>Monitor ECC memory status?</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Montor-ECC-memory-status/m-p/1015001#M3879</link>
      <description>&lt;P&gt;Hello all:&lt;/P&gt;

&lt;P&gt;I followed an earlier post with the same title and found the Intel Xeon E7 CPU manual (actually a datasheet). The URL is listed below:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/forums/topic/393904" target="_blank"&gt;https://software.intel.com/en-us/forums/topic/393904&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;That manual describes an FSV event related to ECC memory status, but what I need are the related registers for the Intel Xeon E5-2650. Could anyone please help? I would appreciate it.&lt;/P&gt;

&lt;P&gt;Joe&lt;/P&gt;</description>
      <pubDate>Mon, 08 Dec 2014 09:16:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Montor-ECC-memory-status/m-p/1015001#M3879</guid>
      <dc:creator>Joe_H_1</dc:creator>
      <dc:date>2014-12-08T09:16:05Z</dc:date>
    </item>
    <item>
      <title>Background: According to the</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Montor-ECC-memory-status/m-p/1015002#M3880</link>
      <description>&lt;P&gt;Background: According to the big study by Google (http://research.google.com/pubs/pub35162.html), only about 8% of DIMMs experience one or more errors per calendar year, but the DIMMs that have errors sometimes have them at fairly high rates.&amp;nbsp; The average was about 4000 per year (but with a very skewed distribution) -- so if you don't see any errors in a few days of operation, your DIMMs are probably OK.&lt;/P&gt;

&lt;P&gt;Procedure: Under Linux it is relatively easy to set up one of the Uncore iMC performance counters to count ECC_CORRECTABLE_ERRORS.&amp;nbsp; Since this event increments extremely infrequently on most systems, you won't need to worry about the counter overflowing. Just set it up and then check it every week or so (provided that you are not using the iMC performance counters for anything else).&lt;/P&gt;

&lt;P&gt;As an example, for the Xeon E5-2650 the following code will set up iMC Counter 3 on each of the four channels on each socket to count correctable ECC errors.&amp;nbsp; First you have to figure out which buses your system uses for the Uncore performance counters.&amp;nbsp; My systems use either 3f and 7f or 7f and ff for socket 0 and socket 1, respectively.&amp;nbsp; The easiest way to find this is to run this simple command:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;# lspci | grep :10.1
7f:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)
ff:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)
&lt;/PRE&gt;

&lt;P&gt;The text description will vary from system to system, but the first characters of each line show the buses that correspond to the two sockets.&amp;nbsp; The examples below will use bus 7f for socket 0 and bus ff for socket 1.&lt;/P&gt;
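
&lt;P&gt;If you would rather not copy the bus numbers by hand, they can be pulled out of the lspci output.&amp;nbsp; This is only a minimal sketch: list_imc_buses is a hypothetical helper name, and it assumes lspci prints each address as bus:device.function with a two-digit bus and no PCI domain prefix.&amp;nbsp; The demonstration feeds it the two sample lines shown above rather than live hardware.&lt;/P&gt;

```shell
#!/bin/bash

# Hypothetical helper: read lspci output on stdin and print the bus number
# of every device found at slot 10, function 1 (one bus per socket).
# Assumes addresses look like "7f:10.1" (no PCI domain prefix).
list_imc_buses() {
	awk '$1 ~ /^[0-9a-f][0-9a-f]:10\.1$/ { split($1, parts, ":"); print parts[1] }'
}

# Demonstration using the two sample lines from the post.
printf '%s\n' \
	'7f:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)' \
	'ff:10.1 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 1 (rev 07)' \
	| list_imc_buses
```

&lt;P&gt;On a real system the input would come from "lspci | list_imc_buses", and the result could replace the hard-coded bus list in the loops.&lt;/P&gt;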

&lt;P&gt;First program the counters using this script.&amp;nbsp; (The EVENTCODE that is commented out can be used for testing the script, since it will increment on a correctly working system, while the ECC_CORRECTABLE_ERRORS will not increment unless there is really a problem.)&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;#!/bin/bash

# Program Counter 3 of each iMC on each chip to count ECC correctable errors

export EVENTCODE=0x00400009		# ECC_CORRECTABLE_ERRORS
#export EVENTCODE=0x00400006		# DRAM_PRE_ALL -- use as a test to make sure the counters are actually counting

export SETPCI=/sbin/setpci

# Step 1: disable each counter, then clear the count (lower &amp;amp; upper words)
echo "Disabling and clearing iMC Counter 3 of each channel on each processor"
for BUS in 7f ff
do
	for CHANNEL in 0 1 4 5
	do
		$SETPCI -s ${BUS}:10.${CHANNEL} e4.l=0x00	# counter 3 control: disable
		$SETPCI -s ${BUS}:10.${CHANNEL} b8.l=0x00	# counter 3 count, lower word
		$SETPCI -s ${BUS}:10.${CHANNEL} bc.l=0x00	# counter 3 count, upper word
	done
done
# Step 2: enable the counter with the new event
echo "Programming iMC Counter 3 in each channel on each processor"
for BUS in 7f ff
do
	for CHANNEL in 0 1 4 5
	do
		$SETPCI -s ${BUS}:10.${CHANNEL} e4.l=$EVENTCODE
	done
done
&lt;/PRE&gt;
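
&lt;P&gt;The two EVENTCODE values in the script differ only in their low byte, which suggests how they are put together.&amp;nbsp; The sketch below rests on an assumption that is not stated in this post: that bits 7:0 of the counter control register select the event and that bit 22 (0x00400000) is the enable bit.&amp;nbsp; The helper name make_eventcode is made up for illustration.&lt;/P&gt;

```shell
#!/bin/bash

# Assumed field layout of the iMC counter control register (not stated in
# the post): bits 7:0 select the event, bit 22 (0x00400000) enables counting.
ENABLE_BIT=0x00400000

# Hypothetical helper: build a control value from an 8-bit event number.
make_eventcode() {
	printf '0x%08x\n' $(( ENABLE_BIT | $1 ))
}

make_eventcode 0x09	# ECC_CORRECTABLE_ERRORS, prints 0x00400009
make_eventcode 0x06	# DRAM_PRE_ALL, prints 0x00400006
```

&lt;P&gt;Both results match the constants used in the script, so under that assumption switching events only requires changing the low byte.&lt;/P&gt;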

&lt;P&gt;Next you can read the counters with this script:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;#!/bin/bash

export SETPCI=/sbin/setpci

echo "Reading iMC Counter 3 of each channel on each processor"
for BUS in 7f ff
do
	for CHANNEL in 0 1 4 5
	do
		echo -n "Bus $BUS Channel $CHANNEL Counter 3 low:  "
		$SETPCI -s ${BUS}:10.${CHANNEL} b8.l
		echo -n "Bus $BUS Channel $CHANNEL Counter 3 high: "
		$SETPCI -s ${BUS}:10.${CHANNEL} bc.l
	done
done
&lt;/PRE&gt;
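
&lt;P&gt;The script prints the low and high words separately.&amp;nbsp; If a single number is more convenient, the two words can be combined as in this sketch.&amp;nbsp; It assumes setpci prints each register as bare hexadecimal digits without a 0x prefix; combine_counter is a hypothetical helper name.&lt;/P&gt;

```shell
#!/bin/bash

# Hypothetical helper: combine the low and high words printed by setpci
# (bare hexadecimal, no 0x prefix) into one decimal count.
combine_counter() {
	local low=$1 high=$2
	echo $(( 0x$high * 4294967296 + 0x$low ))	# 4294967296 = 2^32
}

combine_counter 0000002a 00000000	# low word 0x2a, high word 0: prints 42
combine_counter 00000001 00000001	# prints 4294967297
```

&lt;P&gt;In the read loop this could be called with the output of the two setpci reads (b8.l as the first argument, bc.l as the second) in place of the two echo lines.&lt;/P&gt;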

&lt;P&gt;I tested this with the DRAM_PRE_ALL event on a Xeon E5-2680 system and the script appears to have set things up correctly.&amp;nbsp; It is running now with the ECC_CORRECTABLE_ERRORS event, but (no surprise) has not shown any events yet.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Dec 2014 22:06:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Montor-ECC-memory-status/m-p/1015002#M3880</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2014-12-09T22:06:00Z</dc:date>
    </item>
  </channel>
</rss>

