Hi,
1. Are the events listed in Table 19-10 of the IA-32 manual (http://courses.cs.washington.edu/courses/cse451/15au/readings/ia32-3.pdf) associated with each individual core of the processor?
2. Clearing the performance counter for some event is doable from ring 0 by simply clearing the corresponding PMC, right?
3. As for the CYCLE_ACTIVITY event, with event number A3H, what do CYCLE_ACTIVITY.STALLS_L2_PENDING and CYCLE_ACTIVITY.CYCLES_LDM_PENDING measure?
4. How can I get zero MEM_LOAD_UOPS_RETIRED.L1_MISS and zero MEM_LOAD_UOPS_RETIRED.L2_MISS but nonzero CYCLE_ACTIVITY.STALLS_L2_PENDING? And how can CYCLE_ACTIVITY.STALLS_L1D_PENDING be zero while CYCLE_ACTIVITY.STALLS_L2_PENDING is not?
Thanks.
Min
Hi Dr. Bandwidth,
Your clarification is extremely helpful. I never expected any individual to provide so much insightful information in a single response ;-)
As for the question about clearing the PMC: to clarify, once the PMC registers are cleared, counting will start from zero when I later bind the PMC to some event, right?
One potential mistake in your explanation of CYCLE_STALLS_LDM_PENDING: since its Umask is 06H (CYCLES_LDM_PENDING AND CYCLES_NO_EXECUTE), we should set its Cmask to 06H to count CYCLE_STALLS_LDM_PENDING. It seems that the Cmask of each of these CYCLE_STALLS_XXX_PENDING events should be set to the same value as its Umask.
Thanks.
Min
Thanks for the catch -- I have updated my answer to show the correct CMASK of 6 (decimal) for CYCLE_STALLS_LDM_PENDING and added CYCLE_STALLS_L1D_PENDING (which uses a CMASK value of 12).
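For anyone following along, here is a rough sketch of how those Umask and CMASK values could be packed into an IA32_PERFEVTSELx value. It assumes the standard architectural field layout (event select in bits 0-7, Umask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22, CMASK in bits 24-31). The helper name `perfevtsel` and its parameters are made up for illustration, and the Umask of 0CH for the L1D variant is inferred from the "CMASK equals Umask" observation above, so please check it against the event table for your processor.

```python
def perfevtsel(event, umask, cmask=0, user=True, kernel=True, enable=True):
    """Pack fields into an IA32_PERFEVTSELx value (hypothetical helper).

    Assumed layout: event select bits 0-7, Umask bits 8-15, USR bit 16,
    OS bit 17, EN bit 22, CMASK bits 24-31. Edge, PC, INT, AnyThread and
    INV are left at zero in this sketch.
    """
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field!r}")
    value = event
    value |= umask << 8
    value |= int(user) << 16
    value |= int(kernel) << 17
    value |= int(enable) << 22
    value |= cmask << 24
    return value


# Event A3H with the Umask/CMASK pairs discussed in this thread.
stalls_ldm_pending = perfevtsel(0xA3, 0x06, cmask=6)
# Umask 0CH here is an assumption (CMASK 12 == 0CH), not taken from the manual.
stalls_l1d_pending = perfevtsel(0xA3, 0x0C, cmask=12)

print(f"{stalls_ldm_pending:#010x}")
print(f"{stalls_l1d_pending:#010x}")
```

The resulting value is what would be written to the event-select MSR from ring 0; the sketch only builds the number and does not touch any hardware.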
Once the PMC registers are cleared, they won't change unless the counter is enabled or another process writes a new value. Unfortunately there is not really any way to prevent other processes from using the counters, so I very seldom clear them -- I just leave them in "free-running" mode and take differences.
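The "free-running" approach can be sketched roughly as below: read the counter before and after the region of interest and subtract, allowing for the counter wrapping around. The 48-bit counter width is an assumption (the actual width is reported by CPUID on a given processor), and `counter_delta` is a made-up name. The readings are passed in as plain integers, so how they are obtained (RDPMC, an MSR read, and so on) is left out.

```python
def counter_delta(before, after, width_bits=48):
    """Events counted between two raw reads of a free-running counter.

    width_bits=48 is an assumed counter width; adjust it to what CPUID
    reports. The mask handles a single wraparound between the two reads.
    """
    mask = (1 << width_bits) - 1
    return (after - before) & mask


# No wraparound: a plain difference.
print(counter_delta(1_000, 1_750))

# The counter wrapped past 2**48 between the two reads.
print(counter_delta(2**48 - 10, 5))
```

This only stays correct if the counter wraps at most once between the two reads and nobody else reprograms or rewrites it in between, which is the same caveat as above.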