- Counted L1D events (Counts the number of lines brought from/to the L1 data cache.) with a unit mask of 0x01 (repl Counts the number of lines brought into the L1 data cache) count 10000
- Counted L1D_CACHE_LD events (Counts L1 data cache read requests.) with a unit mask of 0x01 (i_state Counts L1 data cache read requests where the cache line to be loaded is in the I (invalid) state, i) count 10000
I don't know why you care about L1D misses; the penalty of a single L1D miss is only about 4-8 extra cycles. Most developers usually care about L2 misses and LLC misses.
You can get a count of L1D misses by summing all the MEM_LOAD_RETIRED
events except MEM_LOAD_RETIRED.L1D_HIT:
L1D_MISSES = MEM_LOAD_RETIRED.HIT_LFB +
             MEM_LOAD_RETIRED.L2_HIT +
             MEM_LOAD_RETIRED.LLC_UNSHARED_HIT +
             MEM_LOAD_RETIRED.OTHER_CORE_HIT_HITM +
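As a rough sketch of that sum in Python: the formula in this post is cut off, so the helper below simply adds up whatever MEM_LOAD_RETIRED counts you pass in and skips MEM_LOAD_RETIRED.L1D_HIT. The function name `l1d_load_misses` and all of the numbers are made up for illustration; the event names must be spelled the way your profiler reports them.

```python
# Sketch only: estimate L1D load misses from per-event counts.
# The helper name and the sample numbers are hypothetical.

def l1d_load_misses(counts):
    """Sum every MEM_LOAD_RETIRED.* count except MEM_LOAD_RETIRED.L1D_HIT."""
    prefix = "MEM_LOAD_RETIRED."
    return sum(
        value
        for name, value in counts.items()
        if name.startswith(prefix) and name != prefix + "L1D_HIT"
    )


# Made-up counts, only to show the shape of the input.
example = {
    "MEM_LOAD_RETIRED.L1D_HIT": 900_000,
    "MEM_LOAD_RETIRED.HIT_LFB": 12_000,
    "MEM_LOAD_RETIRED.L2_HIT": 30_000,
    "MEM_LOAD_RETIRED.LLC_UNSHARED_HIT": 4_000,
    "MEM_LOAD_RETIRED.OTHER_CORE_HIT_HITM": 500,
    "L1D.REPL": 50_000,  # ignored: not a MEM_LOAD_RETIRED event
}

print(l1d_load_misses(example))
```

Note that this only covers retired loads, so it will not match counters such as L1D.REPL that also see other traffic.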
Please read this article, written by Dr David Levinthal.
Hope it helps.
NOTE: many of these events are known to overcount (l1d_cache_ld, l1d_cache_lock), so they can only be used for qualitative analysis.
Thanks for explaining your requirements!
I agree that many events overlap, but the user should select them appropriately.
In my view:
L1D.REPL is for L1D cache line flushing driven by page faults, after which the TLB translates the address and the data is reloaded into L1D.
L1D_CACHE_LD.I_STATE counts all L1D misses, which is what you want.
An L1D cache miss does not mean there was a page fault.