I am trying to understand the behavior of the precise MEM_LOAD_RETIRED events on the Nehalem architecture. In Dr. Levinthal's Performance Analysis Guide for Core i7 processors, he says "The sum of all the MEM_LOAD_RETIRED events will equal the MEM_INST_RETIRED.LOADS count", but I am unclear whether this sum should include MEM_LOAD_RETIRED.DTLB_MISS. In particular, does a load that misses both the LLC and the DTLB trigger both the LLC_MISS and DTLB_MISS events, or are the two mutually exclusive?
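One way to test the two readings against the identity quoted from the guide is to sum the data-source sub-events with and without DTLB_MISS and compare each total to MEM_INST_RETIRED.LOADS. The helper below is a hypothetical sketch; the sub-event names other than LLC_MISS and DTLB_MISS are assumed from the Nehalem MEM_LOAD_RETIRED event list and should be checked against the events VTune actually exposes.

```python
# Hypothetical helper. Sub-event names are ASSUMED from the Nehalem
# MEM_LOAD_RETIRED event list; verify them against your VTune event list.
SOURCE_EVENTS = (
    "L1D_HIT",
    "L2_HIT",
    "LLC_UNSHARED_HIT",
    "OTHER_CORE_L2_HIT_HITM",
    "HIT_LFB",
    "LLC_MISS",
)


def sum_residuals(counts: dict[str, int], total_loads: int) -> dict[str, float]:
    """Relative gap between MEM_INST_RETIRED.LOADS and the summed
    MEM_LOAD_RETIRED counts, computed with and without DTLB_MISS."""
    if total_loads <= 0:
        raise ValueError("MEM_INST_RETIRED.LOADS must be positive")
    # Sum only the events that name where the data came from.
    sources = sum(counts.get(name, 0) for name in SOURCE_EVENTS)
    # Alternative reading: DTLB_MISS is one more disjoint bucket.
    with_dtlb = sources + counts.get("DTLB_MISS", 0)
    return {
        "excluding_dtlb": (total_loads - sources) / total_loads,
        "including_dtlb": (total_loads - with_dtlb) / total_loads,
    }
```

A residual near zero for one variant and clearly nonzero for the other would hint at which interpretation the hardware follows. Counts collected through event-based sampling are estimates, so some residual is likely either way.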
Collecting these events using VTune 9.1, I get the following event counts:
I am surprised to see that the number of DTLB_MISS events is almost twice the number of LLC_MISS events. I understand that there may be edge cases in which a load misses the DTLB but hits in the LLC (e.g., as discussed at http://origin-software.intel.com/en-us/forums/showthread.php?t=70535), but I would expect these to be relatively infrequent.
Am I misinterpreting these counters? I'm really just trying to find the fraction of loads serviced by each level of the memory hierarchy, and I'm not sure how to account for the DTLB misses.
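For the per-level breakdown, a minimal sketch might look like the following, assuming DTLB_MISS overlaps with the data-source events and is therefore reported as a separate rate rather than as a level of the hierarchy. The function name `level_fractions` and the sub-event list are illustrative assumptions, not part of VTune.

```python
# Illustrative sketch. ASSUMPTION: DTLB_MISS is orthogonal to the data-source
# sub-events, so it is reported as a rate, not as a hierarchy level.
SOURCE_EVENTS = (
    "L1D_HIT",
    "L2_HIT",
    "LLC_UNSHARED_HIT",
    "OTHER_CORE_L2_HIT_HITM",
    "HIT_LFB",
    "LLC_MISS",
)


def level_fractions(counts: dict[str, int]) -> tuple[dict[str, float], float]:
    """Return (fraction of loads per data source, DTLB misses per load)."""
    total = sum(counts.get(name, 0) for name in SOURCE_EVENTS)
    if total == 0:
        raise ValueError("no MEM_LOAD_RETIRED source events counted")
    # Normalize by the source-event total so the fractions sum to 1.
    fractions = {name: counts.get(name, 0) / total for name in SOURCE_EVENTS}
    dtlb_rate = counts.get("DTLB_MISS", 0) / total
    return fractions, dtlb_rate
```

The input would be the raw MEM_LOAD_RETIRED.* totals from VTune. Dividing by the sum of the source events rather than by MEM_INST_RETIRED.LOADS keeps the fractions adding to 1 even if the two totals disagree slightly.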