Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Is there a metric or a way to measure how efficient the prefetcher is?

oleotiger
Beginner

There are many prefetchers in the SoC (hardware prefetchers, MC, XPT prefetch, etc.).

Prefetching can reduce the cache miss rate, but it may also add overhead in memory bandwidth, which can raise latency.

I want to find a metric that measures how efficient the prefetching is.

For example, suppose that in 1 second there are 1000 memory accesses (data loaded from memory into L3), and 200 of those 1000 accesses are prefetches. I think there may be a tag in each cache line indicating whether or not it was prefetched.

Is there a metric that shows how useful a cache line is? For example, how many times is a cache line accessed before it is evicted: 0, 1, or 100+?
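The bookkeeping described in this question (tag each line as prefetched or demand-filled, then count accesses until eviction) can be sketched with a toy cache model. This is a minimal Python sketch, assuming a fully associative LRU cache with 64-byte lines and a simple next-line prefetcher; `ToyCache` and its methods are hypothetical names, and real Intel replacement and prefetch policies are not published, so the numbers describe only this toy model, not real hardware.

```python
from collections import Counter, OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumption for the toy model)


class ToyCache:
    """Fully associative LRU cache with a next-line prefetcher (toy model only)."""

    def __init__(self, num_lines):
        if num_lines < 2:
            raise ValueError("need at least 2 lines so a prefetch cannot evict its trigger")
        self.num_lines = num_lines
        # line number -> [was_prefetched, accesses since fill]; order is LRU -> MRU
        self.lines = OrderedDict()
        # histogram of accesses-before-eviction, split by how the line was filled
        self.evicted = {"prefetched": Counter(), "demand": Counter()}
        self.demand_misses = 0
        self.prefetches_issued = 0

    def _record(self, was_prefetched, count):
        kind = "prefetched" if was_prefetched else "demand"
        self.evicted[kind][count] += 1

    def _fill(self, line, prefetched, count):
        if len(self.lines) >= self.num_lines:
            _, (was_pf, old_count) = self.lines.popitem(last=False)  # evict the LRU line
            self._record(was_pf, old_count)
        self.lines[line] = [prefetched, count]  # new line becomes MRU

    def access(self, addr):
        """Model one demand access to byte address addr."""
        line = addr // LINE_SIZE
        if line in self.lines:
            self.lines.move_to_end(line)  # hit: mark as most recently used
            self.lines[line][1] += 1
        else:
            self.demand_misses += 1
            self._fill(line, prefetched=False, count=1)
        nxt = line + 1
        if nxt not in self.lines:  # next-line prefetch, tagged as prefetched
            self.prefetches_issued += 1
            self._fill(nxt, prefetched=True, count=0)

    def drain(self):
        """Evict all resident lines so every fill appears in the histogram."""
        while self.lines:
            _, (was_pf, count) = self.lines.popitem(last=False)
            self._record(was_pf, count)

    def useless_prefetch_fraction(self):
        """Prefetched lines evicted with zero accesses / prefetches issued (call drain() first)."""
        if self.prefetches_issued == 0:
            return 0.0
        return self.evicted["prefetched"][0] / self.prefetches_issued


if __name__ == "__main__":
    stream = ToyCache(num_lines=4)
    for i in range(16):  # sequential walk: next-line prefetching helps
        stream.access(i * LINE_SIZE)
    stream.drain()
    print("sequential:", stream.demand_misses, dict(stream.evicted["prefetched"]),
          stream.useless_prefetch_fraction())

    strided = ToyCache(num_lines=4)
    for i in range(4):  # stride of 10 lines: every prefetch is wasted
        strided.access(i * 10 * LINE_SIZE)
    strided.drain()
    print("strided:", strided.demand_misses, dict(strided.evicted["prefetched"]),
          strided.useless_prefetch_fraction())
```

The count-0 bucket of the "prefetched" histogram corresponds to the idea of a prefetched line evicted without ever being used; the rest of the histogram is the "0, 1, or 100+" distribution asked about, which real hardware does not expose directly.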

 

Can VTune tell me this?

 

AthiraM_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are investigating internally regarding this. We will get back to you soon with an update.


Thanks


McCalpinJohn
Honored Contributor III

Getting the full picture that you sketched out would require access to an accurate simulator of the processor and memory system. This is not available outside of Intel: too many of the details are not published (e.g., replacement policies, dynamically adaptive prefetch policies). But I can think of three approaches that can provide partial information:

  1. Intel added a new event in Skylake Xeon (and Cascade Lake Xeon) that provides a little bit of information. The event L2_LINES_OUT.USELESS_HWPF (Event 0xF2, Umask 0x04) increments when a line that was brought in by the hardware prefetcher is evicted from the L2 without having been accessed by the core.
  2. It is also easy to disable the hardware prefetchers on most Intel systems (https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors).  Comparing hit/miss rates with the various HW prefetchers enabled/disabled can provide a good indication of the effectiveness of the HW prefetching engines.
  3. Intel's Processor Event Based Sampling (PEBS) facility (and its relatives, the Load Latency Monitoring Facility and the Precise Store Facility) can provide detailed information on the TLB and cache interactions of a random subset of the load instructions in an execution. Because this is sampled, you don't get to choose which addresses are recorded, but it does provide very specific information. The Linux "perf mem" command uses the PEBS facility to gather data about memory references in a program execution.
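Approaches 1 and 2 involve a little arithmetic: encoding the raw event for Linux perf, and computing the value written to the prefetcher-control MSR. Below is a minimal Python sketch, assuming the Linux perf raw-event syntax (config = (umask << 8) | event, written as `rNNNN`) and the MSR 0x1A4 layout described in the Intel disclosure article linked above (bit 0: L2 hardware prefetcher, bit 1: L2 adjacent cache line prefetcher, bit 2: DCU prefetcher, bit 3: DCU IP prefetcher; a set bit disables that prefetcher). All helper names are hypothetical, and the bit layout should be checked against the article for the specific processor, since it does not apply to every Intel core.

```python
"""Hypothetical helpers for measuring HW prefetcher effectiveness (sketch only)."""

# Approach 1: encode an Intel core event for Linux perf.
def perf_raw_event(event: int, umask: int) -> str:
    """perf raw syntax: config = (umask << 8) | event, in hex."""
    return f"r{(umask << 8) | event:04x}"


def perf_event_spec(event: int, umask: int) -> str:
    """Equivalent perf 'cpu' PMU syntax."""
    return f"cpu/event={event:#04x},umask={umask:#04x}/"


# Approach 2: prefetcher-control MSR layout (assumed from Intel's disclosure article).
MSR_PREFETCH_CONTROL = 0x1A4
PREFETCHER_BITS = {
    "l2_hw": 0,        # L2 hardware (streamer) prefetcher
    "l2_adjacent": 1,  # L2 adjacent cache line prefetcher
    "dcu": 2,          # DCU (L1D next-line) prefetcher
    "dcu_ip": 3,       # DCU IP prefetcher
}
PREFETCH_FIELD = 0xF  # bits 3:0


def disable_mask(*names: str) -> int:
    """Bits to set in the MSR; a set bit disables the named prefetcher."""
    mask = 0
    for name in names:
        mask |= 1 << PREFETCHER_BITS[name]
    return mask


def updated_msr_value(current: int, mask: int) -> int:
    """Replace only bits 3:0 of the current MSR value, keeping the other bits."""
    return (current & ~PREFETCH_FIELD) | (mask & PREFETCH_FIELD)


def wrmsr_command(value: int) -> str:
    """msr-tools command line writing value on all logical CPUs (root + msr module)."""
    return f"wrmsr -a {MSR_PREFETCH_CONTROL:#x} {value:#x}"


def miss_rate(misses: int, accesses: int) -> float:
    return misses / accesses if accesses else 0.0


def prefetch_benefit(rate_disabled: float, rate_enabled: float) -> float:
    """Fraction of misses removed by enabling prefetchers (negative = prefetching hurt)."""
    if rate_disabled == 0:
        return 0.0
    return (rate_disabled - rate_enabled) / rate_disabled


if __name__ == "__main__":
    print(perf_raw_event(0xF2, 0x04), perf_event_spec(0xF2, 0x04))
    all_off = disable_mask("l2_hw", "l2_adjacent", "dcu", "dcu_ip")
    print(wrmsr_command(updated_msr_value(0, all_off)))
```

One possible workflow: count useless L2 prefetches with `perf stat -e r04f2 -- ./app`, read the current MSR with `rdmsr`, rerun the same workload with some or all prefetchers disabled via `wrmsr`, and compare the miss rates from both runs with `prefetch_benefit`. Restore the original MSR value afterwards. For approach 3, `perf mem record -- ./app` followed by `perf mem report` gives the sampled per-access view.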
AthiraM_Intel
Moderator

Hi,


Could you please give us an update? Has the solution provided by McCalpinJohn helped?



Thanks.



AthiraM_Intel
Moderator

Hi,


We have not heard back from you, so we will close this inquiry now. If you need further assistance, please post a new question.


Thanks.


oleotiger
Beginner

Thank you. I have already accepted the solution provided by McCalpinJohn.
