I am trying to measure L1 cache misses for my program. I enabled the L1 data cache miss rate for sampling and, when I checked, noticed that it had turned on the L1D_REPL event. The L1 data cache miss rate is defined as
L1 data cache miss rate = L1D_REPL / INST_RETIRED.ANY
L1D_REPL: This event counts the number of lines brought into the L1 data cache.
I was hoping that for L1 data cache misses it would instead be
L1 data cache miss rate = L1D_CACHE_LD.I_STATE / INST_RETIRED.ANY
L1D_CACHE_LD.I_STATE: Counts how many load requests miss the cache (the line is in the Invalid state).
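Whichever event is chosen, the metric itself is just a ratio of two raw event counts, e.g. in C:

```c
/* A cache miss "rate" here is simply: miss-like event count / INST_RETIRED.ANY */
double miss_rate(unsigned long long misses, unsigned long long inst_retired)
{
    return (double)misses / (double)inst_retired;
}
```

With, for example, 2,000,000 L1D_REPL events over 100,000,000 retired instructions (illustrative numbers, not from a real run), this gives 0.02 misses per instruction.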
Then I measured both the L1D_CACHE_LD.I_STATE and L1D_REPL events. The counts are not exactly the same, but they are not far apart either.
So I am trying to understand why the L1 data cache miss rate uses L1D_REPL instead of L1D_CACHE_LD.I_STATE.
As its counterpart, the L2 cache miss rate seems to use L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY.
L2_LINES_IN.SELF.ANY: This event counts the number of cache lines allocated in the L2 cache.
So both the L1 and L2 cache miss rates are based on cache lines allocated. Why aren't they based on actual cache misses such as L1D_CACHE_LD.I_STATE? Also, for the L2 cache, I don't see an event similar to L1D_CACHE_LD.I_STATE.
Any insight will help.
You talked about the "L1 data cache miss rate" - I prefer to use the MEM_LOAD_RETIRED.L1D_MISS event, or the L1D_CACHE_LD.I_STATE event.
It doesn't make sense to use L1D_REPL event to measure L1 data cache misses.
For L2 data cache misses, please use the MEM_LOAD_RETIRED.L2_MISS event. (L2_LINES_IN measures both L2 instruction cache misses and L2 data cache misses.)
I am not sure why the built-in L1 data cache miss rate is using L1D_REPL.
I am still not very clear about the difference between MEM_LOAD_RETIRED.L1D/L2_MISS and MEM_LOAD_RETIRED.L1D/L2_LINE_MISS. Any thoughts?
Another related query - if I add multiple events to be counted, VTune needs multiple runs of the program. If I am counting cache-related events, I guess the first run will have an impact on the second, since the data will already be cached by the first run. Is there any way to invalidate the whole cache before each run? That would give clean counts for cache-related events.
If you have multiple events to be monitored in separate runs, note that each run is independent.
You are right! Sometimes the first run leaves data in the cache, and that can impact the second run :-(
So you can split one activity into two activities - before you run the second activity, run another program on your platform to invalidate the cache.
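For example, a tiny C routine along these lines can stand in for that "other program" (a sketch only; the 32 MB buffer size and 64-byte line size are assumptions - pick a buffer larger than your last-level cache):

```c
#include <stdlib.h>
#include <string.h>

/* Evict earlier data by streaming through a buffer larger than the
   last-level cache. CHURN_BYTES and the 64-byte line size are
   assumptions -- adjust them for your CPU. */
#define CHURN_BYTES (32u * 1024u * 1024u)

volatile unsigned char churn_sink; /* keeps the reads from being optimized away */

void churn_cache(void)
{
    unsigned char *buf = malloc(CHURN_BYTES);
    size_t i;
    if (!buf)
        return;
    memset(buf, 1, CHURN_BYTES);          /* write every cache line */
    for (i = 0; i < CHURN_BYTES; i += 64) /* then read one byte per line */
        churn_sink = buf[i];
    free(buf);
}
```

Streaming through such a buffer displaces most previously cached lines, though it cannot guarantee a completely cold cache.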
Does it help?
For that matter, I have written a function that churns through a lot of data before actually executing the function of interest. That way most of the cache contents should have been evicted. But it is all speculation that the cache is actually flushed. I have tried 'clflush' as well, but I do not see much impact with or without it.
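For reference, my clflush attempt looks roughly like this (a sketch using the SSE2 _mm_clflush intrinsic; flush_range is my own name, and the 64-byte line size is an assumption that should be queried on real hardware):

```c
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (SSE2, x86 only) */
#include <stddef.h>
#include <stdint.h>

/* Flush every cache line covering [p, p + len) from all cache levels.
   Assumes 64-byte cache lines. */
void flush_range(const void *p, size_t len)
{
    const char *line = (const char *)((uintptr_t)p & ~(uintptr_t)63);
    const char *end  = (const char *)p + len;
    for (; line < end; line += 64)
        _mm_clflush(line);
    _mm_mfence(); /* order the flushes before subsequent accesses */
}
```

Of course, clflush only evicts the lines it is pointed at, so flushing one buffer says nothing about everything else the first run cached - which may be why I see so little difference.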
Anyway, thanks for your responses. It's good to know that multiple runs of the program to collect different events are independent.
Thanks, - Milind