I ran a custom analysis of my program using VTune. I found that there are many events related to L3 misses, including:
Sometimes MEM_LOAD_RETIRED.L3_MISS_PS equals zero but MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM_PS does not. What's the difference between these events? How should I measure L3 data cache misses using these events?
Judging by the event names, I think you're profiling on a processor from the Skylake microarchitecture family.
- The event "MEM_LOAD_RETIRED.L3_MISS_PS" represents a request from a retired demand data load uop that reached all the way to the L3, missed in the L3, and was eventually serviced from any source. This event counts accurately without privilege-level filtering.
- The event "MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM_PS" represents a request from a demand data load uop or a page walker that reached all the way to the L3, missed in the L3, and was eventually serviced from one of the IMCs in the local NUMA node, irrespective of whether the load uop retired or not. This event counts accurately.
It should be clear now that these two counts are not necessarily equal, and neither one is necessarily larger or smaller than the other.
In addition, VTune by default profiles many events using event multiplexing. If these two events happen to be counted in different event groups, then their counts cover different periods of the program's execution and may not be comparable.
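To illustrate why multiplexed counts can disagree, here is a toy model in Python. The per-slice numbers and the `multiplexed_estimate` helper are hypothetical, and the linear scaling by enabled time over running time is an assumption modeled on how multiplexed counters are commonly extrapolated; it is not a description of VTune's internals.

```python
# Hypothetical miss counts per time slice for one run of a program:
# slices 0-3 are a compute phase (no misses), slices 4-7 a memory-bound phase.
misses_per_slice = [0, 0, 0, 0, 500, 500, 500, 500]


def multiplexed_estimate(per_slice, active_slices):
    """Extrapolate a whole-run count from the slices where the event was active.

    Assumption: the tool scales linearly by (total slices / active slices).
    """
    observed = sum(per_slice[i] for i in active_slices)
    time_enabled = len(per_slice)
    time_running = len(active_slices)
    return observed * time_enabled / time_running


# The same underlying activity, observed by two event groups that were
# scheduled on the counters during different parts of the run.
group_a = multiplexed_estimate(misses_per_slice, [0, 1, 2, 3])
group_b = multiplexed_estimate(misses_per_slice, [4, 5, 6, 7])
true_total = sum(misses_per_slice)

print(group_a, group_b, true_total)
```

In this sketch one group reports zero and the other overestimates, even though both observe the same activity. Real multiplexing rotates groups much more often, so the error is usually smaller, but the effect is the same in kind.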
In particular, a zero value in VTune means that the count is lower than the threshold for triggering an interrupt.
For the Intel SKX platform, the event file at https://download.01.org/perfmon/SKX/skylakex_core_v1.24.json says that the default threshold used for both of these events is 100007, so any count lower than that will result in zero samples.
The actual threshold used for these tests may depend on the version of VTune used and on the specific collection type requested.
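The effect of the threshold can be sketched in a few lines of Python. This is a simplified model that assumes one sample is taken each time the counter reaches the threshold and that the reported count is the number of samples times the threshold. The function names are hypothetical, and 100007 is the default quoted above; the value in your collection may differ.

```python
# Default threshold quoted above for both events (may differ in your collection).
SAMPLE_AFTER_VALUE = 100007


def samples_taken(true_count, sav=SAMPLE_AFTER_VALUE):
    """Number of sampling interrupts triggered by true_count occurrences."""
    return true_count // sav


def estimated_count(true_count, sav=SAMPLE_AFTER_VALUE):
    """Count a sampling-based tool would report under this simplified model."""
    return samples_taken(true_count, sav) * sav


for count in (50_000, 100_006, 100_007, 250_000):
    print(count, samples_taken(count), estimated_count(count))
```

Under this model, any true count below 100007 produces no samples and is therefore reported as zero, which is consistent with one event showing zero while the other does not.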