Solved: Understand the relationship between uncore load and LLC miss

Zhu_G_ · ‎12-02-2015

Hi community!

I am reading uncore performance monitoring events on an Intel Xeon v3, and I have found that the count of uncore reads/writes is much larger than the count of LLC misses.

For example, in 1 second I measured 93763661 loads but only 20119450 LLC misses.

Maybe it has something to do with memory-level parallelism or with burst accesses, but I am not sure about this.

I wonder if anyone can tell me how to understand this.

McCalpinJohn · ‎12-03-2015

These issues can only be addressed with very specific and very detailed information about the hardware platform and the specific performance counter events being compared.

In this case I will guess that when you say "count of LLC miss" you are referring to a core hardware performance counter event such as LONGEST_LAT_CACHE.MISS (Event 0x2E, Umask 0x41). This event appears to count accesses by demand load or demand store instructions that miss in the LLC. It does not appear to count accesses by the L2 hardware prefetchers, so one would not expect its values to be similar to access counts measured at the LLC itself.
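For readers who want to program this event by hand, here is a minimal sketch of how an event select and unit mask are packed into a raw event code. It assumes the Linux `perf` tool, which the thread does not mention, and its `rUUEE` raw syntax (unit mask in bits 15:8, event select in bits 7:0). The helper name `perf_raw_event` is hypothetical.

```python
def perf_raw_event(event: int, umask: int) -> str:
    """Pack an event select and unit mask into perf's raw 'rNNNN' syntax.

    Assumption: Linux perf raw encoding, umask in bits 15:8 and
    event select in bits 7:0. Other tools use other formats.
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return f"r{(umask << 8) | event:04x}"


if __name__ == "__main__":
    # LONGEST_LAT_CACHE.MISS as quoted above: Event 0x2E, Umask 0x41
    print(perf_raw_event(0x2E, 0x41))
```

Under these assumptions the resulting string could be passed to something like `perf stat -a -e r412e sleep 1` to count the event on all cores for one second.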

It is much harder to guess what you mean when you say "count of uncore read/write". You say that this refers to an uncore performance monitoring event, but none of the events described in Section 2.3 of the Xeon E5 v3 uncore performance monitoring guide is described so inexactly. You might be referring to the "Offcore response" event of the core performance counters, but those events are also much more specifically described than "count of read/write".

Zhu_G_ · ‎12-06-2015

Thank you Dr. Bandwidth!

Sorry for not naming the events I used in my last post. By load requests I mean CAS_COUNT.RD from the uncore, and for LLC misses I used LONGEST_LAT_CACHE.MISS.

McCalpinJohn · ‎12-07-2015

The CAS_COUNT.RD event from the IMC units in the uncore appears to be reliable, but it is important to remember that this event will count accesses from all cores, from all IO devices, and for all access types (LLC load miss, LLC store miss, and hardware prefetches that miss the LLC).

I am not sure that I checked this on a Xeon E5 v3, but I think that the LONGEST_LAT_CACHE.MISS event only counts demand loads that miss the LLC or demand stores that miss the LLC -- it does not count hardware prefetches that miss the LLC.
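To put the two numbers from the original question side by side, here is a rough sketch of the arithmetic. It assumes each CAS_COUNT.RD increment corresponds to one 64-byte cache line transfer, that both counts cover the same 1-second interval, and that the CAS count is already summed over all memory channels (the thread does not say). The function name and dictionary keys are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: one DRAM read CAS moves one 64-byte line


def compare_counts(cas_reads: int, llc_demand_misses: int,
                   seconds: float = 1.0) -> dict:
    """Relate DRAM read CAS counts to core-side demand LLC misses."""
    if cas_reads <= 0 or llc_demand_misses <= 0 or seconds <= 0:
        raise ValueError("counts and interval must be positive")
    return {
        # DRAM read bandwidth implied by the CAS count, in GB/s
        "read_bandwidth_gb_s": cas_reads * CACHE_LINE_BYTES / seconds / 1e9,
        # DRAM reads observed per demand miss seen by the core counter
        "cas_per_demand_miss": cas_reads / llc_demand_misses,
        # Reads not explained by demand misses: candidates are hardware
        # prefetches, IO traffic, and misses on cores that were not counted
        "unexplained_reads": cas_reads - llc_demand_misses,
    }


if __name__ == "__main__":
    for key, value in compare_counts(93763661, 20119450).items():
        print(f"{key}: {value}")
```

With the figures quoted in the question, this gives roughly 6.0 GB/s of DRAM reads and about 4.7 read CAS operations per counted demand miss, leaving about 73.6 million reads to be accounted for by the other sources listed above.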

If you disable the hardware prefetchers (as described at https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors), and count LONGEST_LAT_CACHE.MISS on all the cores, the counts should be pretty close except for IO traffic (and perhaps page miss handler accesses, if any).
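As a sketch of what disabling the prefetchers involves, the helper below computes the value to write to the prefetch control register. It assumes the layout given in the linked Intel article as best I recall it: MSR 0x1A4, where setting bit 0 disables the L2 hardware prefetcher, bit 1 the L2 adjacent cache line prefetcher, bit 2 the DCU prefetcher, and bit 3 the DCU IP prefetcher. Please verify this against the article for your processor. The names in the code are hypothetical, and the code only computes the value; it does not touch the hardware.

```python
MSR_PREFETCH_CONTROL = 0x1A4  # assumption: per the linked Intel article

# Assumed bit positions; a set bit DISABLES the corresponding prefetcher.
PREFETCHER_BITS = {
    "l2_hw": 0,
    "l2_adjacent_line": 1,
    "dcu": 2,
    "dcu_ip": 3,
}


def disable_prefetchers(current: int, names=PREFETCHER_BITS) -> int:
    """Return the MSR value with the named prefetchers disabled.

    Bits for prefetchers that are not named keep their current state.
    """
    value = current
    for name in names:
        value |= 1 << PREFETCHER_BITS[name]
    return value


if __name__ == "__main__":
    new_value = disable_prefetchers(0x0)
    print(f"write {new_value:#x} to MSR {MSR_PREFETCH_CONTROL:#x} on every core")
```

Actually writing the register requires root privileges and must be done on every logical processor, for example with `wrmsr -a 0x1a4 0xf` from msr-tools, assuming that package is installed and the `msr` kernel module is loaded.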