Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

profiling for false-sharing on sandy bridge

Ricardo_F_1

Hello,

I am trying to find out whether my application has instances of false sharing that I can fix. I read this awesome article: https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads

but the performance counters it refers to are not available on my Sandy Bridge machine. Does anyone know which performance counters I should use instead to detect false sharing?

Thank you

McCalpinJohn

There are quite a few events on Sandy Bridge processors that can be used to obtain similar information.

The main thing that you are looking for is a rapid increase in cache misses that hit modified data in another cache. The events that count these misses often have names that include "HitM".

Section 19.6 of Volume 3 of the Intel Architectures Software Developer's Manual lists a number of events whose name or description includes "HitM":

  • L1D.ALL_M_REPLACEMENT (Event 0x51, Umask 0x08) counts dirty lines that are evicted from the L1 Data Cache either by "Snoop HitM" or by victim eviction of modified lines.
  • MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM (Event 0xD2, Umask 0x04) counts load uops whose data source was a dirty line in another core in the same package.
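If your profiling tool does not know these events by name, the Event/Umask pairs above can usually be passed as a raw code. The sketch below packs them the way the core performance-event-select registers lay them out (event select in bits 0-7, unit mask in bits 8-15). The helper names are made up for illustration, and the `rNNNN` spelling assumes a tool that accepts raw codes in the style of Linux `perf`.

```python
# Minimal sketch: turn the (Event, Umask) pairs quoted above into raw codes.
# Helper names are hypothetical; the "rNNNN" form assumes a perf-style tool.

def raw_event_code(event: int, umask: int) -> int:
    """Pack event select (bits 0-7) and unit mask (bits 8-15) into one value."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    return (umask << 8) | event


def perf_raw_name(event: int, umask: int) -> str:
    """Format the packed value as a raw event string, e.g. 'r04d2'."""
    return f"r{raw_event_code(event, umask):04x}"


# Event/Umask values are copied from the two bullets above.
HITM_EVENTS = {
    "L1D.ALL_M_REPLACEMENT": (0x51, 0x08),
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM": (0xD2, 0x04),
}

if __name__ == "__main__":
    for name, (event, umask) in HITM_EVENTS.items():
        print(f"{name}: {perf_raw_name(event, umask)}")
```

With a perf-style tool the result would be used along the lines of `perf stat -e r04d2 ./your_app`, where `./your_app` stands in for the program being profiled.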

Many of the events that use the OFFCORE_RESPONSE performance counters are slightly different between the "client" and "server" Sandy Bridge processors, as described in Tables 19-14 and 19-15 of Volume 3 of the SW Developer's Manual. For the "client" Sandy Bridge parts, it looks like the event OFFCORE_RESPONSE.ALL_RFO.LLC_HIT.HITM_OTHER_CORE_[01] is exactly what you want: it counts every time a store misses in a core's cache and the cache line is found modified in another core's cache. (These are always "local", since the "client" parts only support a single package per system.) I have not tested this event, so whether it counts correctly still needs to be verified.
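An OFFCORE_RESPONSE event is a counter event plus a bit mask, written to a companion MSR, that selects request type, supplier and snoop result. The sketch below composes that mask for the ALL_RFO.LLC_HIT.HITM_OTHER_CORE case. The bit positions are my reading of the Sandy Bridge layout in Section 18.8.5 and are assumptions to check against Table 19-14, as are the event code 0xB7 / umask 0x01 for the first offcore counter and the `offcore_rsp=` syntax, which assumes a Linux `perf` build that exposes that field. All function and table names are made up for illustration.

```python
# Minimal sketch: compose an OFFCORE_RESPONSE mask from named bits.
# Bit positions are ASSUMPTIONS taken from my reading of SDM Section 18.8.5;
# verify them against Table 19-14 before relying on the result.

REQUEST_BITS = {"DMND_RFO": 1, "PF_RFO": 5, "PF_LLC_RFO": 8}
SUPPLIER_BITS = {"LLC_HITM": 18, "LLC_HITE": 19, "LLC_HITS": 20, "LLC_HITF": 21}
SNOOP_BITS = {"HITM": 36}


def offcore_rsp_mask(requests, suppliers, snoops) -> int:
    """OR together the selected request, supplier and snoop bits."""
    mask = 0
    for name in requests:
        mask |= 1 << REQUEST_BITS[name]
    for name in suppliers:
        mask |= 1 << SUPPLIER_BITS[name]
    for name in snoops:
        mask |= 1 << SNOOP_BITS[name]
    return mask


# ALL_RFO: demand and prefetch RFOs; LLC_HIT: any LLC hit state;
# HITM: the snoop found the line modified in another core.
ALL_RFO_LLC_HIT_HITM = offcore_rsp_mask(
    requests=["DMND_RFO", "PF_RFO", "PF_LLC_RFO"],
    suppliers=["LLC_HITM", "LLC_HITE", "LLC_HITS", "LLC_HITF"],
    snoops=["HITM"],
)


def perf_event_string(mask: int, event: int = 0xB7, umask: int = 0x01) -> str:
    """Format the event in perf's cpu/.../ syntax (assumed to be available)."""
    return f"cpu/event={event:#x},umask={umask:#x},offcore_rsp={mask:#x}/"


if __name__ == "__main__":
    print(hex(ALL_RFO_LLC_HIT_HITM))
    print(perf_event_string(ALL_RFO_LLC_HIT_HITM))
```

Building the mask from named bits rather than pasting a constant makes it easier to try the other sub-events, for example demand RFOs only, while keeping the supplier and snoop fields fixed.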

The "client" parts also have some relevant uncore counters described in Table 19-16, but these are a little harder to use.

For the "server" Sandy Bridge parts, the preface to Table 19-15 notes that a bypass needs to be disabled for the MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM event to be accurate. I think that VTune should do this automatically if this event is selected. Intel only lists a subset of the possible OFFCORE_RESPONSE events in Tables 19-14 and 19-15. The ones that are listed are quite likely to work, but other sub-events might also work. The events are described in Section 18.8.5, which should be read in conjunction with the examples in Table 19-15, but this is not easy reading....
