I am looking at monitoring DRAM events on a Haswell CPU (Core i7-4790, signature 06_3CH), and possibly on other processors such as Sandy Bridge and Ivy Bridge Core i5 CPUs. I already have experience monitoring some performance counters related to the CBoxes via MSRs, but none with the PCI configuration space.
I saw some interesting events in the "Intel Xeon Processors E5 and E7 v3 Family Uncore Performance Monitoring" guide, such as "ACT_COUNT", "CAS_COUNT", and "DRAM_REFRESH".
I am now looking at the Software Developer's Manual Vol. 3 (Order Number: 325384-054US), where the events for the non-Xeon processors are documented.
Some events there are interesting to me (and look similar to the Xeon ones), such as "UNC_DRAM_OPEN.CHi", "UNC_DRAM_READ_CAS.CHi", "UNC_DRAM_WRITE_CAS.CHi", and "UNC_DRAM_REFRESH.CHi".
However, I can only find them in Table 19-14 (Nehalem) and Table 19-16 (Westmere); nothing in the Sandy Bridge, Ivy Bridge, or Haswell sections says that either of these tables applies to those microarchitectures.
I have the following questions:
1. Can I monitor these DRAM-related events on non-Xeon Sandy Bridge, Ivy Bridge or Haswell processors?
2. If yes, did I miss some documentation?
3. If no, would there be related events that might be of interest?
There are memory controller performance counters on the Sandy Bridge Xeon E3-12xx series processors, but I can't find the Intel article describing them right this minute and I am having trouble remembering the differences between these counters and the similar memory controller counters on Xeon Phi. Maybe after another cup of coffee....
I have used these counters on a Xeon E3-1270 (Sandy Bridge) and they appear to be accurate. They are very likely the same on the Ivy Bridge-based Xeon E3-1200 and Core i3/i5/i7 processors, but I have not looked to see if they are the same on Haswell-based processors.
These counters might be documented in the Intel PCM source code, or perhaps in one of the Intel VTune configuration files.
If I recall correctly, one inconvenient feature is that these are 32-bit counters, so they overflow rather quickly. E.g., at the peak DRAM transfer rate of 25.6 GB/s (2 channels of DDR3/1600), you are accessing 0.4 billion cache lines per second. That wraps a 32-bit counter in about 10.74 seconds, so you need to read the counters at least that often in order to unambiguously detect and correct for counter wrapping.
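As a rough sketch (not from the Intel article, just the usual idiom for free-running counters), the wrap correction can be done with modulo-2^32 arithmetic, which handles a single wrap between samples automatically:

```c
#include <stdint.h>

/* Accumulate a free-running 32-bit hardware counter into a 64-bit total.
 * The unsigned 32-bit subtraction yields the correct delta as long as the
 * counter wraps at most once between samples (i.e., you sample more often
 * than every ~10.7 seconds at peak DRAM bandwidth). */
static inline void accumulate_counter(uint64_t *total, uint32_t *prev, uint32_t sample)
{
    *total += (uint32_t)(sample - *prev);
    *prev   = sample;
}
```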
Thank you very much for your response!
I just found an article that describes such events for Core processors: https://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2n...
The real difference between the counters described in this article and the ones described in the SDM Vol. 3 for Nehalem processors and in the Xeon uncore performance monitoring manuals is that these counters are aggregated over all channels (each counter description says "sum of all channels").
I am, however, looking for the fine-grained per-channel counters that are described in the other manuals.
Do you know whether the counters you are referring to on the Xeon E3-1270 were these aggregated counters?
I also looked at the PCM programs (v2.7) on my Core i5 Ivy Bridge, on Linux.
- When using pcm-memory.x, I have this error message:
"Detected Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz "Intel(r) microarchitecture codename Ivy Bridge"
Jaketown, Ivytown or Haswell Server CPU is required for this tool!"
- When using pcm.x, I have "N/A" on the "READ" and "WRITE" columns.
I am not sure if this is related to the performance counters/events I am trying to monitor.
Concerning the possible overflow, it is good to know about, but it won't be a problem in my case, as what I am looking to monitor will stay below this 10-second threshold.
Thanks for finding that reference -- that is indeed the article that I used to implement my code that reads the DRAM counters on the Xeon E3-1270 (Sandy Bridge) system. Using the STREAM benchmark to provide a known number of DRAM accesses, I tested the DRAM_DATA_READS and DRAM_DATA_WRITES counters and obtained results that appeared accurate -- the number of reads and writes was typically between 1% and 3% higher than the values I expected. These are typical overheads, due to some combination of TLB traffic and the extra reads and writes associated with the OS instantiating and zero-filling the pages when I first access them.
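For reference, the sort of back-of-the-envelope estimate I compare against might look like the sketch below. The assumptions are mine, not from the article: three arrays of N doubles, the standard four STREAM kernels, write-allocate stores (each stored line is read before being written), and counters that count 64-byte cache-line transfers; the 1%-3% of extra traffic from TLB walks and page instantiation is not modeled.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assumed STREAM configuration -- adjust to match the actual run. */
    const uint64_t N      = 80000000ULL;               /* elements per array           */
    const uint64_t NTIMES = 10;                        /* repetitions of the 4 kernels */
    const uint64_t lines  = N * sizeof(double) / 64;   /* cache lines per array        */

    /* Per repetition (Copy, Scale, Add, Triad):
     *   explicit reads : 1 + 1 + 2 + 2 = 6 arrays
     *   writes         : 1 + 1 + 1 + 1 = 4 arrays
     * With write-allocate, each written array is also read once more. */
    const uint64_t expected_reads  = NTIMES * (6 + 4) * lines;
    const uint64_t expected_writes = NTIMES * 4 * lines;

    printf("expected DRAM_DATA_READS  ~ %llu cache lines\n",
           (unsigned long long)expected_reads);
    printf("expected DRAM_DATA_WRITES ~ %llu cache lines\n",
           (unsigned long long)expected_writes);
    return 0;
}
```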
These counters are in a memory-mapped IO region that is not trivially accessible. On my system I ran as root so I could open /dev/mem and use an mmap() call to get a pointer to the BAR mentioned in the article. I treated the mmap'd region as an array of 32-bit unsigned integers and simply loaded the DRAM_DATA_READS value at byte offset 0x5050 as array element 0x5050 / 4 = 0x1414 = 5140 (decimal), and the DRAM_DATA_WRITES value as the next element of the array (0x5054 / 4 = 0x1415 = 5141 decimal).
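A minimal sketch of that read path is below, in case it helps someone reproduce this. My assumptions (beyond what is in the article): the BAR physical address has already been determined as the article describes and is page-aligned, the mapping covers the 0x50xx offsets, and DRAM_DATA_READS / DRAM_DATA_WRITES are free-running 32-bit counters at offsets 0x5050 and 0x5054. It must run as root, and a kernel built with strict /dev/mem restrictions may refuse the mapping.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE 0x6000  /* large enough to cover the 0x50xx counter offsets */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <BAR physical address, hex>\n", argv[0]);
        return 1;
    }
    off_t bar_phys = (off_t)strtoull(argv[1], NULL, 16);

    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the counter region read-only; no writes are ever issued. */
    void *map = mmap(NULL, MAP_SIZE, PROT_READ, MAP_SHARED, fd, bar_phys);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    volatile uint32_t *bar = (volatile uint32_t *)map;

    uint32_t reads  = bar[0x5050 / 4];  /* DRAM_DATA_READS  */
    uint32_t writes = bar[0x5054 / 4];  /* DRAM_DATA_WRITES */
    printf("DRAM_DATA_READS  = %u cache lines\n", reads);
    printf("DRAM_DATA_WRITES = %u cache lines\n", writes);

    munmap(map, MAP_SIZE);
    close(fd);
    return 0;
}
```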
Fortunately, no writing is required to program these counters -- writing directly to physical memory addresses via /dev/mem is not a very safe thing to do! (But it is necessary to program the memory controller performance counters on Xeon Phi, which are also located in a general memory-mapped IO BAR. I tested that program many times using only read operations before I was willing to try it with writes.)