Which counter should I use to measure L1I-cache misses on the Skylake/SkylakeX platform?

Denis_B_Intel · 07-23-2018

I found that perf uses the ICACHE_64B.IFTAG_MISS counter for the predefined L1-icache-load-misses event on Skylake. See this thread: https://www.spinics.net/lists/linux-perf-users/msg06381.html
The problem is that when I run the same workload on different platforms, say Skylake and Haswell, I get results that differ by roughly a factor of six:
Skylake:
$ perf stat -e L1-icache-load-misses ./a.out
3291090 L1-icache-load-misses # measured based on ICACHE_64B.IFTAG_MISS
Haswell:
$ perf stat -e L1-icache-load-misses ./a.out
521119 L1-icache-load-misses # measured based on ICACHE.MISSES

This doesn't look like an improvement between the architectures. On Skylake there is also FRONTEND_RETIRED.L1I_MISS, which supports PEBS and gives a number closer to the Haswell one (in fact it's lower: 341626).
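
To put the gap in numbers, here is a minimal Python sketch that only divides the counts quoted above. The variable names are mine and purely illustrative; nothing here comes from perf itself.

```python
# Counts copied from the perf stat output quoted in this post.
skl_iftag_miss = 3_291_090        # Skylake, ICACHE_64B.IFTAG_MISS
hsw_icache_misses = 521_119       # Haswell, ICACHE.MISSES
skl_frontend_retired = 341_626    # Skylake, FRONTEND_RETIRED.L1I_MISS

# Ratio of each Skylake count to the Haswell count for the same workload.
iftag_vs_hsw = skl_iftag_miss / hsw_icache_misses
retired_vs_hsw = skl_frontend_retired / hsw_icache_misses

print(f"IFTAG_MISS / ICACHE.MISSES:              {iftag_vs_hsw:.2f}x")
print(f"FRONTEND_RETIRED.L1I_MISS / ICACHE.MISSES: {retired_vs_hsw:.2f}x")
```

So the two Skylake events sit on opposite sides of the Haswell number: one well above it, one somewhat below it.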

Any comments?

Dmitry_R_Intel1 · 07-23-2018

The documentation for the FRONTEND_RETIRED.L1I_MISS event explicitly says that it counts retired instructions. The ICACHE_64B.IFTAG_MISS documentation doesn't specify this, so it probably counts all fetches, including those for instructions that never retire. This can be one of the reasons for the difference you saw.

In VTune we use the FRONTEND_RETIRED.L1I_MISS event for SKL.
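
For anyone who wants to compare the two Skylake events on the same run, here is a rough Python sketch that builds a single perf stat command line counting both. The lowercase event names are how I would expect perf list to spell them on Skylake, and build_perf_cmd is a hypothetical helper name, so treat both as assumptions and check perf list on your machine first.

```python
import os
import shutil
import subprocess

# Assumed perf spellings of the two Skylake events discussed in this thread;
# confirm them with `perf list` on the target machine.
SKL_EVENTS = ["icache_64b.iftag_miss", "frontend_retired.l1i_miss"]


def build_perf_cmd(events, workload):
    """Return the argv for one `perf stat` run that counts all events together."""
    return ["perf", "stat", "-e", ",".join(events), "--", workload]


if __name__ == "__main__":
    workload = "./a.out"  # same placeholder binary as in the question
    cmd = build_perf_cmd(SKL_EVENTS, workload)
    print(" ".join(cmd))
    # Only run when perf and the workload are actually present.
    if shutil.which("perf") and os.path.exists(workload):
        subprocess.run(cmd, check=False)
```

Counting both events in one run avoids run-to-run noise when comparing them.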

YeeHaaw · 03-14-2021

For Skylake, the output says L1-icache-load-misses is measured based on ICACHE_64B.IFTAG_MISS.
For Haswell, it says L1-icache-load-misses is measured based on ICACHE.MISSES.
So maybe the Haswell count means that no cache (L1 i-cache, L2 cache, L3 cache) could serve the fetch? I think that is what ICACHE.MISSES should mean.