I am looking at monitoring DRAM events on a Haswell CPU (Core i7-4790, signature 06_3CH), and possibly on other processors such as Sandy Bridge and Ivy Bridge Core i5 CPUs. I already have experience monitoring some performance counters related to the CBoxes via MSRs, but none with the PCI configuration space.
I saw some interesting events in the "Intel Xeon Processors E5 and E7 v3 Family Uncore Performance Monitoring" guide, such as "ACT_COUNT", "CAS_COUNT", and "DRAM_REFRESH".
I am now looking at the Software Developer's Manual Vol. 3 (Order Number: 325384-054US), where the events for the non-Xeon processors are documented.
Some events there are interesting to me (and look similar to the Xeon ones), such as "UNC_DRAM_OPEN.CHi", "UNC_DRAM_READ_CAS.CHi", "UNC_DRAM_WRITE_CAS.CHi", and "UNC_DRAM_REFRESH.CHi".
However, I can only find them in Table 19-14 (Nehalem) and Table 19-16 (Westmere); nothing in the Sandy Bridge, Ivy Bridge, or Haswell sections says that either of these tables applies to those microarchitectures.
I have the following questions:
1. Can I monitor these DRAM-related events on non-Xeon Sandy Bridge, Ivy Bridge or Haswell processors?
2. If yes, did I miss some documentation?
3. If no, would there be related events that might be of interest?
There are memory controller performance counters on the Sandy Bridge Xeon E3-12xx series processors, but I can't find the Intel article describing them right this minute and I am having trouble remembering the differences between these counters and the similar memory controller counters on Xeon Phi. Maybe after another cup of coffee....
I have used these counters on a Xeon E3-1270 (Sandy Bridge) and they appear to be accurate. They are very likely the same on the Ivy Bridge-based Xeon E3-1200 and Core i3/i5/i7 processors, but I have not looked to see if they are the same on Haswell-based processors.
These counters might be documented in the Intel PCM source code, or perhaps in one of the Intel VTune configuration files.
If I recall correctly, one inconvenient feature is that these are 32-bit counters, so they overflow rather quickly. E.g., at the peak DRAM transfer rate of 25.6 GB/s (2 channels of DDR3/1600), you are accessing 0.4 billion cache lines per second. That wraps a 32-bit counter in about 10.74 seconds, so you need to read the counters at least that often in order to unambiguously detect and correct for counter wrapping.
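As a rough sketch (not from the Intel article, just the usual idiom for free-running counters), the wrap correction can be done with modulo-2^32 arithmetic, which handles a single wrap between samples automatically:

```c
#include <stdint.h>

/* Accumulate a free-running 32-bit hardware counter into a 64-bit total.
 * The unsigned 32-bit subtraction yields the correct delta as long as the
 * counter wraps at most once between samples (i.e., you sample more often
 * than every ~10.7 seconds at peak DRAM bandwidth). */
static inline void accumulate_counter(uint64_t *total, uint32_t *prev, uint32_t sample)
{
    *total += (uint32_t)(sample - *prev);
    *prev   = sample;
}
```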
Thank you very much for your response!
I just found an article that describes such events for Core processors: https://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2n...
The real difference between the counters described in this article and the ones described in the SDM Vol. 3 for Nehalem processors and in the Xeon uncore performance monitoring manuals is that these counters are aggregated over all channels (each counter description says "sum of all channels").
I am, however, looking for the fine-grained per-channel counters that are described in the other manuals.
Do you know whether the counters you are referring to on the Xeon E3-1270 were these aggregated counters?
I also looked at the PCM programs (v2.7) on my Core i5 Ivy Bridge, on Linux.
- When using pcm-memory.x, I have this error message:
"Detected Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz "Intel(r) microarchitecture codename Ivy Bridge"
Jaketown, Ivytown or Haswell Server CPU is required for this tool!"
- When using pcm.x, I have "N/A" on the "READ" and "WRITE" columns.
I am not sure if this is related to the performance counters/events I am trying to monitor.
Concerning the possible overflow, it is good to know about, but it won't be a problem in my case, as what I am looking to monitor will stay below this 10-second threshold.
Thanks for finding that reference -- that is indeed the article that I used to implement my code that reads the DRAM counters on the Xeon E3-1270 (Sandy Bridge) system. Using the STREAM benchmark to provide a known number of DRAM accesses, I tested the DRAM_DATA_READS and DRAM_DATA_WRITES counters and obtained results that appeared accurate -- the number of reads and writes was typically between 1% and 3% higher than the values I expected. These are typical overheads, due to some combination of TLB traffic and the extra reads and writes associated with the OS instantiating and zero-filling the pages when I first access them.
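For reference, the sort of back-of-the-envelope estimate I compare against might look like the sketch below. The assumptions are mine, not from the article: three arrays of N doubles, the standard four STREAM kernels, write-allocate stores (each stored line is read before being written), and counters that count 64-byte cache-line transfers; the 1%-3% of extra traffic from TLB walks and page instantiation is not modeled.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assumed STREAM configuration -- adjust to match the actual run. */
    const uint64_t N      = 80000000ULL;               /* elements per array           */
    const uint64_t NTIMES = 10;                        /* repetitions of the 4 kernels */
    const uint64_t lines  = N * sizeof(double) / 64;   /* cache lines per array        */

    /* Per repetition (Copy, Scale, Add, Triad):
     *   explicit reads : 1 + 1 + 2 + 2 = 6 arrays
     *   writes         : 1 + 1 + 1 + 1 = 4 arrays
     * With write-allocate, each written array is also read once more. */
    const uint64_t expected_reads  = NTIMES * (6 + 4) * lines;
    const uint64_t expected_writes = NTIMES * 4 * lines;

    printf("expected DRAM_DATA_READS  ~ %llu cache lines\n",
           (unsigned long long)expected_reads);
    printf("expected DRAM_DATA_WRITES ~ %llu cache lines\n",
           (unsigned long long)expected_writes);
    return 0;
}
```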
These counters are in a memory-mapped IO region that is not trivially accessible. On my system I ran as root so I could open /dev/mem and use an mmap() call to get a pointer to the BAR mentioned in the article. I treated the mmap'd region as an array of 32-bit unsigned integers and simply loaded the DRAM_DATA_READS value at byte offset 0x5050 as array element 0x5050 / 4 = 0x1414 = 5140 (decimal), and the DRAM_DATA_WRITES value as the next element of the array (0x5054 / 4 = 0x1415 = 5141 decimal).
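A minimal sketch of that read path is below, in case it helps someone reproduce this. My assumptions (beyond what is in the article): the BAR physical address has already been determined as the article describes and is page-aligned, the mapping covers the 0x50xx offsets, and DRAM_DATA_READS / DRAM_DATA_WRITES are free-running 32-bit counters at offsets 0x5050 and 0x5054. It must run as root, and a kernel built with strict /dev/mem restrictions may refuse the mapping.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE 0x6000  /* large enough to cover the 0x50xx counter offsets */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <BAR physical address, hex>\n", argv[0]);
        return 1;
    }
    off_t bar_phys = (off_t)strtoull(argv[1], NULL, 16);

    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the counter region read-only; no writes are ever issued. */
    void *map = mmap(NULL, MAP_SIZE, PROT_READ, MAP_SHARED, fd, bar_phys);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    volatile uint32_t *bar = (volatile uint32_t *)map;

    uint32_t reads  = bar[0x5050 / 4];  /* DRAM_DATA_READS  */
    uint32_t writes = bar[0x5054 / 4];  /* DRAM_DATA_WRITES */
    printf("DRAM_DATA_READS  = %u cache lines\n", reads);
    printf("DRAM_DATA_WRITES = %u cache lines\n", writes);

    munmap(map, MAP_SIZE);
    close(fd);
    return 0;
}
```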
Fortunately, no writing is required to program these counters -- writing directly to physical memory addresses via /dev/mem is not a very safe thing to do! (But it is necessary to program the memory controller performance counters on Xeon Phi, which are also located in a general memory-mapped IO BAR. I tested that program many times using only read operations before I was willing to try it with writes.)