I found that perf implements the predefined L1-icache-load-misses event with the ICACHE_64B.IFTAG_MISS counter. See this thread: https://www.spinics.net/lists/linux-perf-users/msg06381.html
The problem is that when I run the same workload on different platforms, say Skylake and Haswell, the results differ by roughly a factor of six:
Skylake:
$ perf stat -e L1-icache-load-misses ./a.out
3291090 L1-icache-load-misses # measured based on ICACHE_64B.IFTAG_MISS
Haswell:
$ perf stat -e L1-icache-load-misses ./a.out
521119 L1-icache-load-misses # measured based on ICACHE.MISSES
This doesn't look like an improvement between the architectures. Skylake also has FRONTEND_RETIRED.L1I_MISS, which supports PEBS and gives a count closer to the Haswell one (in fact it's lower: 341626).
Any comments?
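To make the size of the gap concrete, here is a minimal sketch that computes the ratios between the three counts quoted above. It uses only the numbers from this post; the dictionary keys and the `ratio` helper are illustrative names, not part of perf.

```python
# Counts copied from the perf output quoted in this post.
counts = {
    "skl_iftag_miss": 3291090,       # Skylake, ICACHE_64B.IFTAG_MISS
    "hsw_icache_misses": 521119,     # Haswell, ICACHE.MISSES
    "skl_retired_l1i_miss": 341626,  # Skylake, FRONTEND_RETIRED.L1I_MISS
}


def ratio(numerator: str, denominator: str) -> float:
    """Return how many times larger one count is than another."""
    return counts[numerator] / counts[denominator]


skl_vs_hsw = ratio("skl_iftag_miss", "hsw_icache_misses")
iftag_vs_retired = ratio("skl_iftag_miss", "skl_retired_l1i_miss")
retired_vs_hsw = ratio("skl_retired_l1i_miss", "hsw_icache_misses")

print(f"IFTAG_MISS (SKL) / ICACHE.MISSES (HSW):        {skl_vs_hsw:.2f}x")
print(f"IFTAG_MISS (SKL) / FRONTEND_RETIRED (SKL):     {iftag_vs_retired:.2f}x")
print(f"FRONTEND_RETIRED (SKL) / ICACHE.MISSES (HSW):  {retired_vs_hsw:.2f}x")
```

The default Skylake count is about 6.3 times the Haswell count and about 9.6 times the FRONTEND_RETIRED.L1I_MISS count on the same machine. The retired-instruction event comes in at about two thirds of the Haswell number.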
The documentation for the FRONTEND_RETIRED.L1I_MISS event explicitly says that it counts retired instructions. The ICACHE_64B.IFTAG_MISS documentation doesn't specify this, so it probably counts all fetches, including those for instructions that never retire. This may be one of the reasons for the difference you saw.
In VTune we use the FRONTEND_RETIRED.L1I_MISS event for SKL.
But for Haswell, the output says L1-icache-load-misses is measured based on ICACHE.MISSES.
So maybe it means that none of the caches (L1 i-cache, L2 cache, L3 cache) can serve the fetch? That is what ICACHE.MISSES should mean.
