Hi guys, I would like to understand this counter better. Some descriptions suggest that it counts hits and misses in the instruction cache, but my benchmark results do not seem to match that, so I wonder whether it counts something else. In some papers I have read, the authors use this counter to count instruction TLB hits and misses, so I am a little confused about its purpose.
Thanks a lot
These events are certainly documented as being related to instruction fetches, but exactly what they count can be very hard to understand in Intel implementations.
Figure 2-2 and Figure 2-6 in the Intel Architectures Optimization Reference Manual (document 248966-042b, September 2019) show block diagrams for the SKX and SKL cores, respectively. Micro-ops to be executed can be fetched from the "Decoded ICache (DSB)" or from the "Legacy Decode Pipeline" or from the "MSROM" (Microcode). It is not clear exactly where "instruction fetch" is being counted. If it is being counted only in the Legacy Decode Pipeline, for example, then the counts could miss fetches by the BPU (Branch Prediction Unit) into the Decoded ICache. The comment in the Intel perfmon website (https://download.01.org/perfmon/SKX/skylakex_core_v1.17.json) for this event says that it counts at 64-Byte Cacheline granularity, which may not make sense for pre-decoded uops.
I have never tested the ICache-related counters on Intel architectures -- partly because the details can be confusing (and partly because applications in my world are not limited by instruction fetch).
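One way to check what the event file itself says about a counter is to search it by name. The following is only a minimal sketch: it assumes the perfmon JSON is either a bare list of event objects or an object with an "Events" list, and that each event carries "EventName", "BriefDescription" and "PublicDescription" fields. The helper names (`load_events`, `find_events`, `describe`) and the sample entries are hypothetical and are not taken from the real file.

```python
import json


def load_events(text):
    """Parse perfmon-style JSON text into a list of event dicts."""
    data = json.loads(text)
    # Assumption: the file is either a bare list of events or an
    # object that wraps the list under an "Events" key.
    if isinstance(data, dict):
        data = data.get("Events", [])
    return data


def find_events(events, needle):
    """Return events whose name contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [e for e in events if needle in e.get("EventName", "").lower()]


def describe(event):
    """Format one event, preferring the longer public description."""
    desc = event.get("PublicDescription") or event.get("BriefDescription") or ""
    return f'{event.get("EventName", "?")}: {desc}'


# Hypothetical sample data: placeholder names and text, NOT real events.
SAMPLE = """[
  {"EventName": "EXAMPLE_ICACHE.HIT",
   "BriefDescription": "placeholder text A"},
  {"EventName": "EXAMPLE_ICACHE.MISS",
   "BriefDescription": "placeholder text B",
   "PublicDescription": "longer placeholder text B"},
  {"EventName": "EXAMPLE_OTHER.EVENT",
   "BriefDescription": "placeholder text C"}
]"""

if __name__ == "__main__":
    for ev in find_events(load_events(SAMPLE), "icache"):
        print(describe(ev))
```

To use it on the real data, save the JSON file from the perfmon link above and pass its contents to `load_events` in place of `SAMPLE`. Reading the descriptions this way only shows what is documented, so it does not settle where in the front end the event is actually counted.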