What is the difference between the CYCLE_ACTIVITY:CYCLES_L1D_PENDING and CYCLE_ACTIVITY:STALLS_L1D_PENDING events on Ivy Bridge processors? Is it that STALLS_ counts the number of times execution stalled, while CYCLES_ counts the total duration of those stalls in cycles?
CYCLE_ACTIVITY.CYCLES_L1D_PENDING increments in each processor cycle if there is at least one L1 Data Cache load miss outstanding.
CYCLE_ACTIVITY.STALLS_L1D_PENDING increments in each processor cycle if there is *both* at least one L1 Data Cache load miss outstanding *AND* no uops are dispatched to the execution ports.
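To make the two definitions concrete, here is a minimal sketch that applies them to a made-up per-cycle trace. The `Cycle` record and its fields are hypothetical (the hardware exposes no such trace); the point is only that both events count *cycles*, that STALLS_ is a subset of CYCLES_, and that several outstanding misses in one cycle still add just one.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Cycle:
    # Hypothetical per-cycle snapshot of core state (not a real hardware interface).
    l1d_misses_outstanding: int  # L1D load misses in flight this cycle
    uops_dispatched: int         # uops dispatched to execution ports this cycle


def count_events(trace: Iterable[Cycle]) -> Tuple[int, int]:
    """Return (cycles_l1d_pending, stalls_l1d_pending) for a cycle trace."""
    cycles_pending = 0
    stalls_pending = 0
    for c in trace:
        if c.l1d_misses_outstanding > 0:
            # CYCLES_L1D_PENDING: at least one L1D load miss outstanding.
            cycles_pending += 1
            if c.uops_dispatched == 0:
                # STALLS_L1D_PENDING: same condition AND nothing dispatched.
                stalls_pending += 1
    return cycles_pending, stalls_pending


trace = [
    Cycle(0, 4),  # no miss outstanding: neither event counts
    Cycle(1, 3),  # miss outstanding, work still dispatched: CYCLES_ only
    Cycle(1, 0),  # miss outstanding, nothing dispatched: both count
    Cycle(2, 0),  # two misses still add one cycle: both count
    Cycle(0, 0),  # stalled, but no L1D miss outstanding: neither counts
]
print(count_events(trace))  # (3, 2)
```

So neither event counts "number of stalls" as discrete episodes; both accumulate cycles, under different conditions.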
The latter event is intended to help identify cases in which cache misses are the cause of the stall.
This should be understood as an *indication*, not as proof that the cache miss(es) actually caused the stall. (In an out-of-order processor there are too many ambiguous cases: for example, how do you assign "blame" when the processor is stalled for multiple reasons in the same cycle?)
I don't think I have tested this carefully yet, but I expect this event to systematically under-count. The problem is that the processor does not know in advance whether the data is in the cache, so it can "dispatch" a memory load uop to the execution port(s) multiple times. If the data is not in the L1 Data Cache, the uop is rejected and retried later. The counters cannot distinguish between uops that are dispatched and complete and uops that are dispatched and rejected, so a cycle in which the only uops dispatched are rejected loads will *not* be counted as a stall cycle, even though most of us would consider it a stall cycle for practical purposes.
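A small model of the suspected under-count, again using a hypothetical trace format: each cycle records how many dispatched uops go on to complete and how many are load uops that get rejected and replayed. This is an illustration of the expectation described above, not a measurement and not a description of how the hardware actually implements the counter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cycle:
    # Hypothetical per-cycle record, for illustration only.
    l1d_misses_outstanding: int  # L1D load misses in flight this cycle
    dispatched_ok: int           # uops dispatched that go on to complete
    dispatched_rejected: int     # load uops dispatched, then rejected for retry


def stall_counts(trace: List[Cycle]) -> Tuple[int, int]:
    """Return (as_counted, practical) stall cycles with an L1D miss pending.

    as_counted: mimics the event, which sees any dispatch, accepted or not.
    practical:  treats a cycle whose only dispatches were rejected as a stall.
    """
    as_counted = 0
    practical = 0
    for c in trace:
        if c.l1d_misses_outstanding == 0:
            continue
        if c.dispatched_ok + c.dispatched_rejected == 0:
            as_counted += 1
        if c.dispatched_ok == 0:
            practical += 1
    return as_counted, practical


trace = [
    Cycle(1, 0, 0),  # true stall: counted either way
    Cycle(1, 0, 1),  # only a rejected load re-dispatch: missed by the event
    Cycle(1, 0, 0),  # true stall
    Cycle(1, 0, 1),  # rejected re-dispatch again
    Cycle(1, 2, 0),  # useful work dispatched: not a stall either way
]
print(stall_counts(trace))  # (2, 4)
```

In this toy trace the event-style count reports half of the cycles a programmer would call stalls; how large the gap is on real code would have to be tested.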