I am using VTune's bandwidth profile to measure the fraction of time my software spends waiting on cache accesses on my Haswell (HSW) i7 processor. CYCLE_ACTIVITY.CYCLES_NO_EXECUTE gives this time. To break it down into the fractions of time spent waiting on L1, L2, and L3 + memory, I am using CYCLE_ACTIVITY.STALLS_L1D_PENDING, CYCLE_ACTIVITY.STALLS_L2_PENDING, and CYCLE_ACTIVITY.STALLS_LDM_PENDING. However, the sum of these three counts is always greater than the CYCLES_NO_EXECUTE count.
Can someone please clarify which events these three counters count that CYCLES_NO_EXECUTE does not?
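To make the arithmetic in the question concrete, here is a minimal sketch of the breakdown being attempted: each stall count divided by CYCLES_NO_EXECUTE, plus the sum of those ratios. The function name `stall_breakdown` and the counts passed to it are hypothetical placeholders chosen for illustration, not measurements from VTune or from any real run.

```python
def stall_breakdown(no_execute, l1d_pending, l2_pending, ldm_pending):
    """Express each stall count as a fraction of CYCLES_NO_EXECUTE.

    All arguments are raw event counts read from the profiler.
    Returns a dict of ratios plus their sum under the key "SUM".
    """
    if no_execute <= 0:
        raise ValueError("CYCLES_NO_EXECUTE must be positive")
    fractions = {
        "STALLS_L1D_PENDING": l1d_pending / no_execute,
        "STALLS_L2_PENDING": l2_pending / no_execute,
        "STALLS_LDM_PENDING": ldm_pending / no_execute,
    }
    # If the three events counted disjoint cycles, this sum could not exceed 1.
    fractions["SUM"] = sum(fractions.values())
    return fractions


# Placeholder counts only (assumption): substitute the values VTune reports.
result = stall_breakdown(
    no_execute=1000, l1d_pending=600, l2_pending=300, ldm_pending=400
)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

A sum above 1 is the symptom described in the question: it means the three counts cannot be treated as a partition of the CYCLES_NO_EXECUTE cycles.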
Some of the descriptions of this event are oversimplifications that can be confusing....
Other causes of dispatch stalls can include:
So in summary: