Hi Everyone,
An issue has been confusing me for a long time. I use the perf subsystem in the Linux kernel to monitor events related to the last-level cache (LLC). I know that on Intel machines perf is implemented by programming and reading specific MSRs to monitor architectural or non-architectural core events, as well as uncore events. However, some of the events implemented in perf are software events maintained by the kernel rather than MSR-based hardware events, so they cannot be detected by the PMU.
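To make "setting up an MSR" concrete for a core event: the driver writes an event-select value to one of the IA32_PERFEVTSELx MSRs, with the event code in bits 7:0, the umask in bits 15:8, USR (bit 16) and OS (bit 17) selecting the privilege levels to count, and EN (bit 22) enabling the counter. The minimal C sketch below only computes that value; actually writing the MSR needs ring-0 access (the kernel's perf driver, or the msr driver), which it does not attempt. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected IA32_PERFEVTSELx control bits (architectural layout). */
#define EVTSEL_USR (1ULL << 16) /* count while in ring 3 */
#define EVTSEL_OS  (1ULL << 17) /* count while in ring 0 */
#define EVTSEL_EN  (1ULL << 22) /* enable the counter */

/* Hypothetical helper: pack an event code and umask into an event-select value. */
uint64_t perfevtsel_value(uint8_t event, uint8_t umask, bool user, bool kernel)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8);

    if (user)
        v |= EVTSEL_USR;
    if (kernel)
        v |= EVTSEL_OS;
    return v | EVTSEL_EN;
}
```

For example, perfevtsel_value(0x2E, 0x41, true, true) yields 0x43412E, i.e. event 2EH with umask 41H counted in both user and kernel mode.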
Here is my confusion. For LLC misses, Intel defines a dedicated event, and it is one of the architectural events. What I am unsure about is whether the PMU can detect and differentiate load misses and store misses in the last-level cache. In my opinion, every store miss will cause a load miss, since the line has to be brought into the cache either way. From the cache's point of view, each load miss and each store miss would then be treated as a load miss, so it could not differentiate load misses from store misses. I also checked Chapter 19, Table 19-11 of the Intel manual (Volume 3B), and I did not find events for llc-load-misses and llc-store-misses. However, there is an event called MEM_LOAD_UOPS_RETIRED.LLC_MISS (possibly equivalent to llc-load-misses), which counts retired load uops whose data source was an LLC miss; this is the core's view, not the cache's view. Why is there no corresponding MEM_STORE_UOPS_RETIRED.LLC_MISS event counting retired store uops whose data source was an LLC miss? And does the architectural LLC-misses event include both load misses and store misses in the last-level cache?
Thanks.
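For comparison, perf's generic LLC-load-misses and LLC-store-misses events are not hardware events in their own right. They are PERF_TYPE_HW_CACHE events whose config packs a cache id, an operation and a result as id | (op << 8) | (result << 16), and the kernel's x86 PMU driver translates each one into whatever raw event it has chosen for that CPU model (where it has no mapping, the event may be reported as not supported). The sketch below uses the real <linux/perf_event.h> constants but a hypothetical helper name, and only fills in the attribute structure; passing it to perf_event_open(2) is left out.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: describe LLC-load-misses (is_store == false) or
 * LLC-store-misses (is_store == true) as a generic hardware-cache event. */
void llc_miss_attr(struct perf_event_attr *attr, bool is_store)
{
    unsigned long long op = is_store ? PERF_COUNT_HW_CACHE_OP_WRITE
                                     : PERF_COUNT_HW_CACHE_OP_READ;

    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HW_CACHE;
    /* config = cache id | (operation << 8) | (result << 16) */
    attr->config = PERF_COUNT_HW_CACHE_LL
                 | (op << 8)
                 | ((unsigned long long)PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr->disabled = 1;   /* start stopped; enable later via ioctl */
    attr->exclude_hv = 1;
}
```

The two configs come out as 0x10002 (load misses) and 0x10102 (store misses). Because the translation to a raw event happens in the kernel, the same perf event name can be backed by different hardware events on different processors.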
It is certainly possible to distinguish between store misses and load misses at the hardware level. Although both cause data to be loaded, they have very different requirements for cache coherence, so they are implemented using different transactions.
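The distinction can be illustrated with a deliberately simplified toy model, not a description of any real Intel cache: a direct-mapped cache in which a load that misses sends a read request, while a store that misses, or that hits a line held without write permission, sends a read-for-ownership (RFO). Because each outgoing request carries its type, a counter placed where requests leave the cache sees load and store misses separately. The sizes, names, and the rule that loads always fill lines read-only are assumptions made for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_SETS 64
#define TOY_LINE 64 /* bytes per cache line */

enum toy_req { REQ_READ, REQ_RFO, REQ_COUNT };

struct toy_cache {
    uint64_t tag[TOY_SETS];
    bool valid[TOY_SETS];
    bool writable[TOY_SETS];           /* line held with write permission */
    unsigned long requests[REQ_COUNT]; /* requests sent beyond this cache */
};

void toy_init(struct toy_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Returns true if the access is satisfied locally. Otherwise records the
 * outgoing request type: a load sends a read, a store sends an RFO. */
bool toy_access(struct toy_cache *c, uint64_t addr, bool is_store)
{
    uint64_t line = addr / TOY_LINE;
    unsigned set = (unsigned)(line % TOY_SETS);
    uint64_t tag = line / TOY_SETS;
    bool present = c->valid[set] && c->tag[set] == tag;

    if (present && (!is_store || c->writable[set]))
        return true;

    c->requests[is_store ? REQ_RFO : REQ_READ]++;
    c->valid[set] = true;
    c->tag[set] = tag;
    c->writable[set] = is_store; /* toy rule: loads fill lines read-only */
    return false;
}
```

In this model, the sequence load 0, load 8, store 8, store 16, store 64, load 64 produces one read request and two RFOs; the store to address 8 still needs an RFO because the earlier load brought the line in without write permission.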
Transactions through the memory hierarchy can often be monitored from multiple locations. As you noted, Intel does not include an explicit event in the core counters for differentiating between load and store accesses to the LLC, but they do include this capability (and much more) in the OFFCORE_RESPONSE event (described in Chapter 18 of Volume 3 of the SW Developer's Manual). For the OFFCORE_RESPONSE events, you can count load misses with the DMND_DATA_RD request type and you can count store misses with the DMND_RFO request type. You can differentiate between LLC hits and LLC misses using combinations of the "response supplier" or "snoop info" fields.
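As a sketch of how the OFFCORE_RESPONSE approach looks from Linux perf: on Sandy Bridge and Haswell cores (check the event tables for other models), OFFCORE_RESPONSE_0 is event B7H with umask 01H, and its companion MSR selects request types in its low bits (bit 0 DMND_DATA_RD, bit 1 DMND_RFO). With perf raw events, the event/umask goes in config and the companion MSR value goes in config1. The response-supplier and snoop-info bits that separate LLC hits from LLC misses move between processor generations, so this sketch takes them as a caller-supplied mask instead of hard-coding them; the helper and macro names are hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Request-type bits in the OFFCORE_RSP MSR (Sandy Bridge / Haswell layout). */
#define OFFCORE_REQ_DMND_DATA_RD (1ULL << 0) /* demand data reads (loads) */
#define OFFCORE_REQ_DMND_RFO     (1ULL << 1) /* demand RFOs (stores) */

/* Event B7H, umask 01H: OFFCORE_RESPONSE_0 on these cores. */
#define OFFCORE_RESPONSE_0_EVENT 0x01b7ULL

/* resp_mask: model-specific supplier/snoop bits (e.g. "LLC miss"), taken
 * from the processor's own event table; nothing here hard-codes them. */
void offcore_attr(struct perf_event_attr *attr, uint64_t req_bits,
                  uint64_t resp_mask)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_RAW;
    attr->config = OFFCORE_RESPONSE_0_EVENT;
    /* Value the kernel programs into the OFFCORE_RSP MSR for this event. */
    attr->config1 = req_bits | resp_mask;
    attr->disabled = 1;
}
```

Counting store misses that also miss the LLC would then mean passing OFFCORE_REQ_DMND_RFO together with the model-specific LLC-miss response bits. On the command line, perf accepts the same thing as cpu/event=0xb7,umask=0x1,offcore_rsp=VALUE/, where VALUE is the MSR value.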
Intel has been careful not to put a very specific definition in place for the "architectural" versions of the LLC reference and LLC miss events, but the processor-specific descriptions of the event/umask combinations 2EH_41H (LLC misses) and 2EH_4FH (LLC references) use slightly different wording that can give some indication of what is counted on each specific processor. On the Xeon E5 (Sandy Bridge) and Xeon E5 v3 (Haswell) systems that I have tested, these events appear to count both demand load misses and demand store misses, but definitely not L2 hardware prefetches. They probably also count L1 hardware prefetches that miss in the L2, but that is harder to test.
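Whether the two architectural LLC events exist at all on a given processor is enumerated by CPUID leaf 0AH: EAX[31:24] gives the length of the EBX bit vector, and a set bit i in EBX means architectural event i is not available, with bit 3 for LLC reference (2EH/4FH) and bit 4 for LLC misses (2EH/41H). A small decoding sketch with hypothetical names, working on register values obtained elsewhere (for example with the __cpuid macro from GCC's <cpuid.h>):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.0AH:EBX for the two LLC architectural events. */
enum arch_event {
    ARCH_LLC_REFERENCE = 3, /* event 2EH, umask 4FH */
    ARCH_LLC_MISSES    = 4  /* event 2EH, umask 41H */
};

/* eax, ebx: CPUID leaf 0AH outputs. A set EBX bit means "not available",
 * and bits at or beyond the vector length in EAX[31:24] are not enumerated. */
bool arch_event_available(uint32_t eax, uint32_t ebx, enum arch_event ev)
{
    unsigned len = (eax >> 24) & 0xffu;
    unsigned bit = (unsigned)ev;

    return bit < len && ((ebx >> bit) & 1u) == 0;
}
```

Availability says nothing about exactly which requests are counted; that still has to be read from the processor-specific event descriptions or determined by testing.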
Many thanks, Dr. McCalpin! I get the idea now.
