The documentation on this is a little difficult to understand. According to the Intel 64 and IA-32 Architectures Software Developer's Manual (Vol. 3B), there are a number of PMCs that I can use to monitor the L3 cache. Two of them look interesting, but I want to make sure I understand what they count. (This is from section 19.2.)
Event B0H, Umask 10H: OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD. Demand data read requests that missed L3.
Event 2EH, Umask 41H: LONGEST_LAT_CACHE.MISS. This event counts each cache miss condition for references to the L3 cache.
Is the difference between these two that the first counts only offcore requests (which I guess means requests from cores other than the polling one), while the second gives a cumulative count? Or is there something I'm missing?
I have not tested these events on Skylake, but if the definitions are similar to earlier processors, the LONGEST_LAT_CACHE.MISS event will count both demand loads and demand stores that miss the LLC. The OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD event will only count demand loads that miss the LLC. Neither event will count L2 hardware prefetches that miss the LLC, so neither event is useful for determining the actual data traffic. They are intended to help identify accesses that are *not* prefetched, since these are more likely to cause stalls. (Also note that "offcore" here means requests that leave the core for the uncore, where the L3 resides; it does not refer to other cores.)
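To compare the two counts on a real workload, the Event/Umask pairs from the SDM table can be turned into Linux `perf` raw event codes. The sketch below assumes the usual x86 core-PMU layout that `perf` uses for raw events (event select in config bits 0-7, unit mask in bits 8-15); the helper names `raw_config` and `perf_raw_name` are hypothetical, introduced only for illustration.

```python
# Sketch: build Linux perf raw event strings ("rUUEE") from the Event/Umask
# columns of the SDM table. Assumes the x86 core-PMU raw layout used by perf:
# event select in config bits 0-7, unit mask in config bits 8-15.

def raw_config(event_select: int, umask: int) -> int:
    """Pack an 8-bit event select code and 8-bit unit mask into a raw config value."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and umask must each fit in 8 bits")
    return (umask << 8) | event_select


def perf_raw_name(event_select: int, umask: int) -> str:
    """Return the raw event string accepted by `perf stat -e` (lowercase hex)."""
    return "r{:x}".format(raw_config(event_select, umask))


# Event/Umask pairs taken from the table in the question.
EVENTS = {
    "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD": (0xB0, 0x10),
    "LONGEST_LAT_CACHE.MISS": (0x2E, 0x41),
}

if __name__ == "__main__":
    for name, (ev, um) in EVENTS.items():
        print(f"{name}: {perf_raw_name(ev, um)}")
```

With this encoding the two events become `r10b0` and `r412e`, so something like `perf stat -e r10b0,r412e ./your_program` (where `./your_program` is a placeholder) would report both counts side by side; the difference should mostly reflect demand stores, under the event definitions described above.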