Solved: Is there a metric or a way to measure how efficient the prefetcher is?

oleotiger · ‎03-28-2021

There are many prefetchers in an SoC: hardware prefetch, MC prefetch, XPT prefetch, etc.

Prefetching can reduce the cache miss rate, but it may also add memory bandwidth overhead, which raises latency.

I want to find a metric that measures how efficient prefetching is.

For example, say there are 1000 memory accesses in 1 second (data loaded from memory into L3), and 200 of those 1000 accesses are prefetches. I think there may be a tag in each cache line that indicates whether it was prefetched or not.

Then is there a metric that shows how efficient a cache line is? E.g., how many times was the cache line accessed before it was evicted: 0, 1, or 100+?

Can VTune tell me this?

McCalpinJohn · ‎03-30-2021

To get the full picture that you sketched out would require access to an accurate processor + memory-system simulator. This is not available outside of Intel -- too many of the details are not published (e.g., replacement policies, dynamically adaptive prefetch policies, etc.). But I can think of three approaches that can provide partial information:

1. Intel added a new event in Skylake Xeon (and Cascade Lake Xeon) that provides a little bit of information. The event L2_LINES_OUT.USELESS_HWPF (Event 0xF2, Umask 0x04) increments when a line that has been prefetched by the hardware prefetcher is evicted from the L2 without being accessed by the core.
2. It is also easy to disable the hardware prefetchers on most Intel systems (https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors). Comparing hit/miss rates with the various HW prefetchers enabled/disabled can provide a good indication of the effectiveness of the HW prefetching engines.
3. Intel's Processor Event Based Sampling (PEBS) facility (and its relatives the Load Latency Monitoring Facility and the Precise Store Facility) can provide detailed information on the TLB and cache interactions of a random subset of the load instructions from an execution. This is sampled, so you don't get to choose which addresses are sampled, but it does provide very specific information. The Linux "perf mem" command uses the PEBS facility to gather data about memory references in a program execution.
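
A minimal Python sketch of how the first two approaches might be turned into numbers. The function names are hypothetical, and so is the count of 50 useless evictions in the example; the 200 prefetches come from the question. The sketch assumes the usual Intel raw-event encoding for Linux perf (umask in the upper byte, event code in the lower byte) and the prefetcher-disable bit layout of MSR 0x1A4 described in the linked Intel article, which may not apply to every processor model. The denominator of the ratio is also an assumption: it has to come from whichever event on your platform counts lines brought into the L2 by the hardware prefetcher.

```python
from typing import List


def perf_raw_event(event: int, umask: int) -> str:
    """Format an Intel event/umask pair as a perf raw event string (rUUEE)."""
    return f"r{(umask << 8) | event:04x}"


def useless_prefetch_fraction(useless_evictions: int, prefetched_lines: int) -> float:
    """Fraction of hardware-prefetched L2 lines evicted without being used.

    useless_evictions: count from L2_LINES_OUT.USELESS_HWPF.
    prefetched_lines:  count of lines the HW prefetcher brought into L2
                       (assumption: taken from a separate event you choose).
    """
    if prefetched_lines <= 0:
        raise ValueError("prefetched_lines must be positive")
    return useless_evictions / prefetched_lines


# Assumed bit layout of MSR 0x1A4 (a set bit DISABLES that prefetcher),
# as described in the Intel article linked above.
PREFETCH_DISABLE_BITS = {
    0: "L2 hardware prefetcher",
    1: "L2 adjacent cache line prefetcher",
    2: "DCU prefetcher",
    3: "DCU IP prefetcher",
}


def disabled_prefetchers(msr_value: int) -> List[str]:
    """List the prefetchers that a given MSR 0x1A4 value would disable."""
    return [name for bit, name in PREFETCH_DISABLE_BITS.items()
            if msr_value & (1 << bit)]


if __name__ == "__main__":
    # Event 0xF2, Umask 0x04 -> usable as: perf stat -e r04f2 <command>
    print(perf_raw_event(0xF2, 0x04))
    # Hypothetical counts: 50 useless evictions out of 200 prefetched lines.
    print(useless_prefetch_fraction(50, 200))
    # 0x5 sets bits 0 and 2.
    print(disabled_prefetchers(0x5))
```

Running the same workload with different MSR values and comparing the resulting miss counts and useless-prefetch fraction gives the enabled/disabled comparison from the second approach.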

AthiraM_Intel · ‎03-30-2021

Hi,

Thanks for reaching out to us.

We are investigating internally regarding this. We will get back to you soon with an update.

Thanks

AthiraM_Intel · ‎04-05-2021

Hi,

Could you please give us an update? Has the solution provided by McCalpinJohn helped?

Thanks.

AthiraM_Intel · ‎04-12-2021

Hi,

We have not heard back from you, so we will close this inquiry now. If you need further assistance, please post a new question.

Thanks.

oleotiger · ‎04-28-2021

Thank you. I have already accepted the solution provided by McCalpinJohn.