I'm working with a Core 2 CPU, and I'm somewhat confused regarding the difference between the MEM_LOAD_RETIRED.L2_MISS, MEM_LOAD_RETIRED.L2_LINE_MISS, and L2_LINES_IN events. From what I understand from the documentation, L2_LINES_IN is more generic than MEM_LOAD_RETIRED, with the latter limited to misses caused by explicit loads. Is my understanding correct? Under what circumstances would MEM_LOAD_RETIRED.L2_MISS not equal MEM_LOAD_RETIRED.L2_LINE_MISS?
Suppose I'm seeing a high MEM_LOAD_RETIRED count but a low L2_LINES_IN count. What would that imply?
Thanks for your help.
Check below for the respective definitions:
L2_LINES_IN: This event counts the number of cache lines allocated in the L2 cache. Cache lines are allocated in the L2 cache as a result of requests from the L1 data and instruction caches and the L2 hardware prefetchers to cache lines that are missing in the L2 cache.
If the data are not present in the cache, or if the cache line is invalidated, the CPU updates its cache by reading the data from main memory. This processor event is termed:
MEM_LOAD_RETIRED.L2_LINE_MISS: This event counts the number of load operations that miss the L2 cache and result in a bus request to fetch the missing cache line. That is, the fetch of the missing cache line has not yet started. This event count is equal to the number of cache lines fetched from memory by retired loads. The event might not be counted if the load is blocked.
One can infer that multiplying the event count by the miss penalty in cycles gives an estimate of the impact on stall cycles. Try to minimize these cycles with proper optimizations, restructuring the code (or a section of it), or applying vectorization.
MEM_LOAD_RETIRED.L2_MISS: This event counts the number of retired load operations that missed the L2 cache.
I think by now the distinction between these three events should be clear.
You asked: "Suppose I'm seeing a high MEM_LOAD_RETIRED count but a low L2_LINES_IN count. What would that imply?"
Could you check whether any level of prefetching is being applied? It would also imply a low L2 cache miss rate, since L2_CACHE_MISS_RATE ~ L2_LINES_IN / INST_RETIRED. Do check the improvement as a percentage.
Note that MEM_LOAD_RETIRED.L2_MISS includes samples that contain the addresses of instruction executions (L2 instruction cache misses), but L2_LINES_IN counts L2 data cache misses only.
Thanks for both your replies. So if I want to examine data cache miss events, I should be primarily concerned with L2_LINES_IN, and MEM_LOAD_RETIRED should only be of secondary interest, correct? If L2_LINES_IN only counts data misses, why does the documentation for it mention "requests from the L1 data and instruction caches" (emphasis mine)?
As a final clarification, suppose an L2 miss occurred for addresses n and n + 8, where both lie within the same cache line. Am I correct in thinking that MEM_LOAD_RETIRED.L2_MISS will be 2 while MEM_LOAD_RETIRED.L2_LINE_MISS will be 1 (because the first miss will trigger a bus request to fetch the cache line containing both addresses)?
My mistake: MEM_LOAD_RETIRED.L2_LINE_MISS does not include instruction prefetch, but L2_LINES_IN includes all L2 misses. So MEM_LOAD_RETIRED.L2_LINE_MISS should be considered first (if you care only about L2 misses caused by data loads). Please refer to http://assets.devx.com/goparallel/18027.pdf, last page, "A few more useful events".