There are many prefetchers in an SoC: hardware, MC, XPT prefetch, etc.
Prefetching can reduce the cache miss rate, but it may also consume extra memory bandwidth, which raises latency.
I want to find a metric that measures how efficient prefetching is.
For example, suppose that in 1 second there are 1000 memory accesses (data loaded from memory into L3), and 200 of those 1000 accesses are prefetches. I think there may be a tag in each cache line indicating whether it was prefetched or not.
Is there a metric that shows how useful a cache line is? For example, how many times was the cache line accessed before it was evicted: 0, 1, or 100+?
Can VTune tell me this?
To get the full picture that you sketched out would require access to an accurate processor + memory-system simulator. This is not available outside of Intel -- too many of the details are not published (e.g., replacement policies, dynamically adaptive prefetch policies, etc.). But I can think of three approaches that can provide partial information:
- Intel added a new event in Skylake Xeon (and Cascade Lake Xeon) that provides a little bit of information. The event L2_LINES_OUT.USELESS_HWPF (Event 0xF2, Umask 0x04) increments when a line that has been prefetched by the hardware prefetcher is evicted from the L2 without being accessed by the core.
- It is also easy to disable the hardware prefetchers on most Intel systems (https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors). Comparing hit/miss rates with the various HW prefetchers enabled/disabled can provide a good indication of the effectiveness of the HW prefetching engines.
- Intel's Processor Event Based Sampling (PEBS) facility (and its relatives the Load Latency Monitoring Facility and the Precise Store Facility) can provide detailed information on the TLB and cache interactions of a random subset of the load instructions from an execution. This is sampled, so you don't get to choose which addresses are sampled, but it does provide very specific information. The Linux "perf mem" command uses the PEBS facility to gather data about memory references in a program execution.
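As a rough illustration of how the first two approaches could be turned into numbers, here is a minimal Python sketch. It only does arithmetic on counter totals that you have already collected with your tool of choice. The names `PrefetchCounts`, `hw_prefetch_fills`, `useless_prefetch_fraction`, `miss_rate` and `miss_rate_reduction` are hypothetical, and the choice of which event supplies the "lines prefetched into L2" total is an assumption left to the reader; the counter values in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class PrefetchCounts:
    """Counter totals for one run; every value is supplied by the caller."""
    # Total of the useless-HW-prefetch L2 eviction event described above.
    useless_hwpf_evictions: int
    # ASSUMPTION: total number of lines the hardware prefetcher brought into
    # the L2, taken from whichever event your platform provides for that.
    hw_prefetch_fills: int


def useless_prefetch_fraction(counts: PrefetchCounts) -> float:
    """Fraction of prefetched L2 lines evicted without a core access."""
    if counts.hw_prefetch_fills <= 0:
        raise ValueError("hw_prefetch_fills must be positive")
    return counts.useless_hwpf_evictions / counts.hw_prefetch_fills


def miss_rate(misses: int, accesses: int) -> float:
    """Plain miss rate for one cache level in one run."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


def miss_rate_reduction(rate_pf_on: float, rate_pf_off: float) -> float:
    """Relative drop in miss rate when the prefetchers are enabled."""
    if rate_pf_off <= 0:
        raise ValueError("rate_pf_off must be positive")
    return (rate_pf_off - rate_pf_on) / rate_pf_off


# Made-up totals, for illustration only.
counts = PrefetchCounts(useless_hwpf_evictions=50, hw_prefetch_fills=200)
wasted = useless_prefetch_fraction(counts)

rate_on = miss_rate(misses=120, accesses=1000)   # prefetchers enabled
rate_off = miss_rate(misses=300, accesses=1000)  # prefetchers disabled
reduction = miss_rate_reduction(rate_on, rate_off)

print(f"useless prefetch fraction: {wasted:.2f}")
print(f"miss-rate reduction with prefetch on: {reduction:.2f}")
```

Treat the useless-prefetch fraction as approximate: evictions counted in a measurement window can belong to lines that were filled before the window started, and it says nothing about lines that were used only once versus many times.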
Hi,
Thanks for reaching out to us.
We are investigating internally regarding this. We will get back to you soon with an update.
Thanks
Hi,
Could you please give us an update? Has the solution provided by McCalpinJohn helped?
Thanks.
Hi,
We have not heard back from you, so we will close this inquiry now. If you need further assistance, please post a new question.
Thanks.
Thank you. I have already accepted the solution provided by McCalpinJohn.
