Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Which counter should I use to measure L1 I-cache misses on Skylake/Skylake-X platforms?

Denis_B_Intel
Employee

I found that on Skylake perf maps the predefined L1-icache-load-misses event to the ICACHE_64B.IFTAG_MISS counter. See this thread: https://www.spinics.net/lists/linux-perf-users/msg06381.html
The problem is that when I run the same workload on different platforms, say Skylake and Haswell, the results differ by roughly a factor of six:
Skylake:
$ perf stat -e L1-icache-load-misses ./a.out
           3291090      L1-icache-load-misses         # measured based on ICACHE_64B.IFTAG_MISS
Haswell:
$ perf stat -e L1-icache-load-misses ./a.out
            521119      L1-icache-load-misses         # measured based on ICACHE.MISSES

This doesn't look like an improvement from one architecture to the next. Skylake also has FRONTEND_RETIRED.L1I_MISS, which supports PEBS and gives a number closer to the Haswell result (in fact it's lower: 341626).
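
To put the three numbers quoted above side by side, here is a small sketch that only does the arithmetic on the counts from this post. The dictionary keys are labels made up for illustration; nothing here queries perf or the hardware.

```python
# Counts copied from the perf output quoted in this post (same ./a.out workload).
# The key strings are illustrative labels, not perf event syntax.
counts = {
    "skylake ICACHE_64B.IFTAG_MISS": 3291090,
    "haswell ICACHE.MISSES": 521119,
    "skylake FRONTEND_RETIRED.L1I_MISS": 341626,
}


def ratio(numerator, denominator):
    """Return counts[numerator] / counts[denominator] as a float."""
    return counts[numerator] / counts[denominator]


# Skylake IFTAG_MISS relative to Haswell ICACHE.MISSES.
iftag_vs_hsw = ratio("skylake ICACHE_64B.IFTAG_MISS", "haswell ICACHE.MISSES")
# Skylake FRONTEND_RETIRED.L1I_MISS relative to Haswell ICACHE.MISSES.
retired_vs_hsw = ratio("skylake FRONTEND_RETIRED.L1I_MISS", "haswell ICACHE.MISSES")
# The two Skylake events relative to each other.
iftag_vs_retired = ratio(
    "skylake ICACHE_64B.IFTAG_MISS", "skylake FRONTEND_RETIRED.L1I_MISS"
)

print(f"IFTAG_MISS / Haswell ICACHE.MISSES:       {iftag_vs_hsw:.2f}x")
print(f"FRONTEND_RETIRED / Haswell ICACHE.MISSES: {retired_vs_hsw:.2f}x")
print(f"IFTAG_MISS / FRONTEND_RETIRED (Skylake):  {iftag_vs_retired:.2f}x")
```

With these inputs the first ratio comes out a little above 6x and the third a little under 10x, so the gap between the two Skylake events is larger than the gap between Skylake and Haswell.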

Any comments?

Dmitry_R_Intel1
Employee

The documentation for the FRONTEND_RETIRED.L1I_MISS event explicitly says that it counts retired instructions. The ICACHE_64B.IFTAG_MISS documentation does not say this, so it probably counts all fetches, presumably including those on speculative paths that never retire. This could be one reason for the difference you saw.

In VTune we are using FRONTEND_RETIRED.L1I_MISS event for SKL.
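
One way to check this on a single run is to ask perf for both Skylake events at once, so the all-fetches and retired-only counts come from the same execution. The sketch below only builds the command line. The lower-case event spellings are an assumption about how perf's event list names them on Skylake (confirm with `perf list`), and `build_perf_cmd` and `run` are hypothetical helper names.

```python
import shutil
import subprocess

# Assumed perf spellings of the events discussed in this thread;
# verify them against `perf list` on the target machine.
SKL_EVENTS = [
    "L1-icache-load-misses",
    "icache_64b.iftag_miss",
    "frontend_retired.l1i_miss",
]


def build_perf_cmd(workload, events=SKL_EVENTS):
    """Return the argv list for one `perf stat` run counting all events."""
    return ["perf", "stat", "-e", ",".join(events), "--", *workload]


def run(workload):
    """Run the workload under perf stat, or just print the command if perf is absent."""
    cmd = build_perf_cmd(workload)
    if shutil.which("perf") is None:
        print("perf not found; would run:", " ".join(cmd))
        return None
    # perf stat prints its counter summary to stderr; leave it attached to the terminal.
    return subprocess.run(cmd, check=False).returncode
```

Calling `run(["./a.out"])` would then execute the workload from the original post under all three events in one pass.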

YeeHaaw
Beginner
For Skylake, the output says L1-icache-load-misses is measured based on ICACHE_64B.IFTAG_MISS, but for Haswell it says L1-icache-load-misses is measured based on ICACHE.MISSES.
So maybe ICACHE.MISSES means that no cache level (L1 i-cache, L2 cache, L3 cache) can serve the fetch? I think that is what ICACHE.MISSES should mean.