Hello,
I am trying to find out whether my application has instances of false sharing that I can improve. I read this awesome article: https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads
but the performance counters it refers to are not available on my Sandy Bridge machine. Does someone know the corresponding performance counters I should use to detect false sharing?
Thank you
- Tags:
- Parallel Computing
There are quite a few events on Sandy Bridge processors that can be used to obtain similar information.
The main thing that you are looking for is a rapid increase in cache misses that hit modified data in another cache. These often have names that include "HitM".
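As a rough illustration of the layout that produces this kind of HitM traffic, the sketch below checks whether two fields of a structure land in the same 64-byte cache line. The structure names and the helper `share_cache_line` are hypothetical, and the check assumes the structure itself starts on a cache-line boundary; it is only meant to show the idea, not to replace the counters.

```python
import ctypes

CACHE_LINE_BYTES = 64  # cache-line size on Sandy Bridge


class PackedCounters(ctypes.Structure):
    # Hypothetical layout: two per-thread counters placed back to back.
    _fields_ = [("a", ctypes.c_uint64), ("b", ctypes.c_uint64)]


class PaddedCounters(ctypes.Structure):
    # Same counters, with padding so that "b" starts on the next line.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("_pad", ctypes.c_char * (CACHE_LINE_BYTES - 8)),
        ("b", ctypes.c_uint64),
    ]


def share_cache_line(struct_type, field_x, field_y, line_bytes=CACHE_LINE_BYTES):
    """Return True if both fields start in the same cache line.

    Assumes the structure is allocated on a cache-line boundary.
    """
    off_x = getattr(struct_type, field_x).offset
    off_y = getattr(struct_type, field_y).offset
    return off_x // line_bytes == off_y // line_bytes


if __name__ == "__main__":
    print(share_cache_line(PackedCounters, "a", "b"))
    print(share_cache_line(PaddedCounters, "a", "b"))
```

If two threads each write only "their" counter in the packed layout, every write can still invalidate the other core's copy of the line, which is the pattern the HitM events are meant to expose.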
Section 19.6 of Volume 3 of the Intel Architectures Software Developer's Manual lists a number of events whose name or description includes "HitM":
- L1D.ALL_M_REPLACEMENT (Event 0x51, Umask 0x08) counts dirty lines that are evicted from the L1 Data Cache either by "Snoop HitM" or by victim eviction of modified lines.
- MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM (Event 0xD2, Umask 0x04) counts load uops whose data source was a dirty line in another core in the same package.
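If you want to program these two events by hand rather than through VTune, the event number and umask are usually combined into a single raw code, with the event select in bits 0-7 and the umask in bits 8-15. The sketch below shows that arithmetic; the helper names are hypothetical, and the `rNNNN` spelling assumes a tool that takes raw codes in the style of Linux `perf`, which this thread does not otherwise mention.

```python
def raw_event_code(event, umask):
    """Combine an event select and a umask into one raw code.

    Event select occupies bits 0-7, umask occupies bits 8-15.
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return (umask << 8) | event


def perf_raw_name(event, umask):
    """Format the raw code as 'rUUEE' (hex), the style used by Linux perf."""
    return "r{:04x}".format(raw_event_code(event, umask))


# Event/umask pairs taken from the two bullets above.
HITM_EVENTS = {
    "L1D.ALL_M_REPLACEMENT": (0x51, 0x08),
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM": (0xD2, 0x04),
}

if __name__ == "__main__":
    for name, (event, umask) in HITM_EVENTS.items():
        print(name, perf_raw_name(event, umask))
```

With such a tool, the resulting names could then be passed on the command line, for example `perf stat -e r04d2 ./your_app`, where `./your_app` is a placeholder for the application under test.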
Many of the events built on the OFFCORE_RESPONSE performance counter facility differ slightly between the "client" and "server" Sandy Bridge processors, as described in Tables 19-14 and 19-15 of Volume 3 of the SW Developer's Manual. For the "client" Sandy Bridge parts, it looks like the event OFFCORE_RESPONSE.ALL_RFO.LLC_HIT.HITM_OTHER_CORE_[01] is the closest match: it counts every time a store misses in a core's cache and the cache line is found modified in another core's cache. (These are always "local", since the "client" parts only support a single package per system.) I have not tested this event, but if it works correctly, it should be exactly what you want.
The "client" parts also have some relevant uncore counters described in Table 19-16, but these are a little harder to use.
For the "server" Sandy Bridge parts, the preface to Table 19-15 notes that a bypass needs to be disabled for the MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM event to be accurate. I think that this should be done automatically by VTune if this event is selected. Intel only lists a subset of the possible OFFCORE_RESPONSE events in Tables 19-14 and 19-15. The ones that are listed are quite likely to work, but other sub-events might also work. The events are described in Section 18.8.5, which should be read in conjunction with the examples in Table 19-15, but this is not easy reading.