I'm tuning 462.libquantum with PMU events on SNB (Sandy Bridge). Several L2 events confuse me.
1) l2_rqsts.pf_hit (24H:40H): Requests from L2 Hardware prefetcher that hit L2.
What is the L2 hardware prefetcher? Where does the data it prefetches go? Why does it access L2?
According to the Intel manual, there are two types of prefetchers in SNB: one is "prefetch to L1" and the other is "prefetch to L2/LLC". My understanding is that the L2 hardware prefetcher belongs to the latter. Right?
2) What's the difference between l2_rqsts.all_demand_data_rd (24H:03H) and l2_trans.demand_data_rd (F0H:01H)?
l2_rqsts.all_demand_data_rd: Counts any demand and L1 HW prefetch data load request to L2.
l2_trans.demand_data_rd: Demand data read requests that access L2 cache.
So is l2_rqsts.all_demand_data_rd minus l2_trans.demand_data_rd equal to the number of L1 HW prefetches?
3) What is an LLC prefetch? I see the term in the description of l2_trans.all_pf: "L2 or LLC prefetches that access L2".
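The event codes above are written as event select:umask (e.g. 24H:40H). A minimal sketch, assuming Linux perf is used to collect them: perf's raw event syntax `-e rNNN` takes a hex config where, on Intel core PMUs, bits 0-7 hold the event select and bits 8-15 hold the umask. The helper names (`raw_event`, `perf_stat_args`) are hypothetical and only illustrate the encoding.

```python
# Event codes quoted in the question, as (event select, umask).
EVENTS = {
    "l2_rqsts.pf_hit": (0x24, 0x40),
    "l2_rqsts.all_demand_data_rd": (0x24, 0x03),
    "l2_trans.demand_data_rd": (0xF0, 0x01),
}


def raw_event(event: int, umask: int) -> str:
    """Encode an event as a perf raw event: umask in bits 8-15, event select in bits 0-7."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return f"r{(umask << 8) | event:x}"


def perf_stat_args(names):
    """Build a `perf stat` argument list that counts the named events together."""
    return ["perf", "stat", "-e", ",".join(raw_event(*EVENTS[n]) for n in names)]
```

For example, `perf_stat_args(EVENTS)` yields `perf stat -e r4024,r324,r1f0`, to which you would append the benchmark command line.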
Thanks a ton!
You can read about the prefetchers in the Optimization guide http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimizat...
The documentation is wrong where it says l2_rqsts.all_demand_data_rd counts L1 HW prefetches: l2_rqsts.all_demand_data_rd doesn't count prefetches.
1) It means that a request issued by the L2 hardware prefetcher to speculatively pre-load data found the line already in the L2 cache (a hit). Prefetched data goes to the LLC and/or the L2 cache. I suppose that, because the LLC can be shared by multiple cores, data prefetched into it can also be used by the threads of a multithreaded process running on other cores.
There are two types of L2 prefetchers: a) the spatial prefetcher and b) the streamer prefetcher.
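The two L2 prefetchers can be illustrated with a toy model. This is a simplified sketch, not Intel's actual logic: it assumes 64-byte lines, that the spatial prefetcher fetches the other line of the 128-byte-aligned pair (as described in Intel's optimization manual), and that prefetches do not cross a 4 KB page. The streamer rule (two same-direction one-line steps trigger `depth` prefetches) and all function names are hypothetical.

```python
LINE_SIZE = 64    # cache-line size in bytes (assumed, as on Sandy Bridge)
PAGE_SIZE = 4096  # assumption: hardware prefetches stay within a 4 KB page


def line_addr(addr: int) -> int:
    """Align a byte address down to the start of its cache line."""
    return addr & ~(LINE_SIZE - 1)


def spatial_pair(addr: int) -> int:
    """Spatial prefetcher: the line that completes the 128-byte-aligned pair."""
    return line_addr(addr) ^ LINE_SIZE


def streamer_prefetches(addrs, depth=2):
    """Toy streamer (hypothetical rule): once two successive accesses move by
    one line in the same direction, propose the next `depth` lines in that
    direction, dropping any target that would leave the current page."""
    out = []
    prev_line, prev_step = None, 0
    for a in addrs:
        line = line_addr(a)
        if prev_line is not None:
            step = line - prev_line
            if step in (LINE_SIZE, -LINE_SIZE) and step == prev_step:
                page = line // PAGE_SIZE
                for k in range(1, depth + 1):
                    target = line + k * step
                    if target // PAGE_SIZE == page:
                        out.append(target)
            prev_step = step
        prev_line = line
    return out
```

With this model, an ascending walk over lines 0, 64, 128 proposes 192 and 256, while the same walk ending at 4032 proposes nothing because the next lines fall in the following page.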