
Is there a metric or a way to measure how efficient the prefetcher is?

oleotiger
Novice
422 Views

There are many prefetchers in an SoC: hardware, MC, XPT prefetch, etc.

Prefetching can reduce the cache miss rate, but it may also add overhead on memory bandwidth, which raises latency.

I want to find a metric that measures how efficient the prefetching is.

For example, suppose that in 1 second there are 1000 memory accesses (data loaded from memory into L3), and 200 of those 1000 accesses are prefetches. I think there may be a tag in each cache line indicating whether it was prefetched or not.

Then is there a metric that shows how useful a cache line is? E.g., how many times was the cache line accessed before it was evicted: 0, 1, or 100+?

Can VTune tell me this?

1 Solution
McCalpinJohn
Black Belt
394 Views

To get the full picture that you sketched out would require access to an accurate processor + memory-system simulator.  This is not available outside of Intel -- too many of the details are not published (e.g., replacement policies, dynamically adaptive prefetch policies, etc.).  But I can think of three approaches that can provide partial information:

  1. Intel added a new event in Skylake Xeon (and Cascade Lake Xeon) that provides a little bit of information.  The event L2_LINES_OUT.USELESS_HWPF (Event 0xF2, Umask 0x04) increments when a line that has been prefetched by the hardware prefetcher is evicted from the L2 without being accessed by the core.
  2. It is also easy to disable the hardware prefetchers on most Intel systems (https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processo...).  Comparing hit/miss rates with the various HW prefetchers enabled/disabled can provide a good indication of the effectiveness of the HW prefetching engines.
  3. Intel's Processor Event Based Sampling (PEBS) facility (and its relatives the Load Latency Monitoring Facility and the Precise Store Facility) can provide detailed information on the TLB and cache interactions of a random subset of the load instructions from an execution.  This is sampled, so you don't get to choose the addresses that are chosen, but it does provide very specific information.  The Linux "perf mem" command uses the PEBS facility to gather data about memory references in a program execution.
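
A minimal Python sketch of how approaches 1 and 2 might be turned into numbers. The raw-event string follows the Linux perf convention of `r` followed by the umask byte and then the event byte. The function names, the choice of "L2 lines evicted" as the denominator, and the counter values in the example are illustrative assumptions, not something defined by Intel or VTune.

```python
def perf_raw_event(event: int, umask: int) -> str:
    """Build a perf-style raw event string: 'r' + umask byte + event byte (hex)."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in one byte")
    return f"r{(umask << 8) | event:04x}"


def useless_prefetch_fraction(useless_hwpf: int, l2_lines_evicted: int) -> float:
    """Fraction of evicted L2 lines that were HW-prefetched but never used.

    Assumption: the caller supplies a total L2 eviction count measured over
    the same interval as the useless-prefetch count.
    """
    if l2_lines_evicted <= 0:
        raise ValueError("l2_lines_evicted must be positive")
    return useless_hwpf / l2_lines_evicted


def miss_reduction(misses_prefetch_off: int, misses_prefetch_on: int) -> float:
    """Relative drop in misses when the prefetchers are enabled (approach 2).

    A negative result means the run with prefetching had more misses.
    """
    if misses_prefetch_off <= 0:
        raise ValueError("misses_prefetch_off must be positive")
    return (misses_prefetch_off - misses_prefetch_on) / misses_prefetch_off


if __name__ == "__main__":
    # Event 0xF2, Umask 0x04 as quoted in the answer above.
    print("raw event:", perf_raw_event(0xF2, 0x04))
    # Made-up counter values, for illustration only.
    print("useless fraction:", useless_prefetch_fraction(50, 200))
    print("miss reduction:", miss_reduction(1000, 800))
```

The resulting string can be passed to `perf stat -e` on a processor that supports the event. Both ratios are only meaningful when the two runs, or the two counters, cover the same workload and the same time interval.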


Replies
AthiraM_Intel
Moderator
401 Views

Hi,


Thanks for reaching out to us.


We are investigating internally regarding this. We will get back to you soon with an update.


Thanks


AthiraM_Intel
Moderator
349 Views

Hi,


Could you please give us an update? Has the solution provided by McCalpinJohn helped?



Thanks.



AthiraM_Intel
Moderator
321 Views

Hi,


We have not heard back from you, so we will close this inquiry now. If you need further assistance, please post a new question.


Thanks.


oleotiger
Novice
257 Views

Thank you. I have already accepted the solution provided by McCalpinJohn.
