Software Tuning, Performance Optimization & Platform Monitoring

RESOURCE_STALLS:ANY vs UOPS_EXECUTED:STALL_CYCLES

niklascote

Hi, I'm currently working on a master's thesis in which I try to measure how memory-bound certain applications are.

I am interested to learn what the difference between RESOURCE_STALLS:ANY and UOPS_EXECUTED:STALL_CYCLES is, and how they can be used to measure memory-boundedness.

McCalpinJohn

There have been many discussions of this topic over the years, but I don't know of any single authoritative location to look for answers....

Going back a few years to the Intel Sandy Bridge processor core, I wrote up a discussion at:

https://sites.utexas.edu/jdm4372/2014/06/04/counting-stall-cycles-on-the-intel-sandy-bridge-processor/

(The link to the Intel forum site in my blog post is broken, but this link works for the moment:

https://community.intel.com/t5/Software-Tuning-Performance/Counting-stall-cycles-on-Sandy-Bridge/m-p/927138)

For the specific topic of stalls associated with memory access latency, there are some other helpful events:

  • CYCLE_ACTIVITY (Event 0xA3) has some Umask options that provide the ability to count cycles in which there is *both* an "execution stall" and a demand load miss outstanding at one of several levels of the cache hierarchy.
    • The event counts *correlation*, not *causation*, but for cache misses at the L2 or beyond it becomes more and more accurate to assume that these very long load stalls (>60 cycles) will exhaust the out-of-order resources and cause a processor execution stall.
    • "Execution stall" probably means cycles in which no uops are dispatched to execution units.  Stalls can happen at many places in the pipeline -- memory-related stalls usually happen after the processor fills up the reorder buffer (or some other shared resource required for speculative execution) and cannot dispatch any more instructions until the memory reference completes.
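
As a rough illustration of how counts like these might be turned into a memory-boundedness estimate, here is a minimal Python sketch. It assumes you have already collected three raw counts for the same code region with whatever tool you use: total core cycles, execution-stall cycles, and CYCLE_ACTIVITY-style cycles with both an execution stall and a demand load miss pending. The function name and parameter names are hypothetical, and the numbers in the usage lines are made up. Since the event counts correlation rather than causation, the second ratio is best read as an upper bound on the memory-stall fraction, not a measurement of it.

```python
def stall_fractions(total_cycles, exec_stall_cycles, stall_cycles_with_miss):
    """Turn raw counter values into fractions of total cycles.

    All three arguments are hypothetical inputs: raw counts read from the
    PMU for the same code region.

    Returns a tuple:
      (fraction of cycles with an execution stall,
       fraction of cycles with an execution stall AND a load miss pending)
    The second value only bounds the memory-stall fraction from above,
    because the underlying event counts correlation, not causation.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if exec_stall_cycles < 0 or stall_cycles_with_miss < 0:
        raise ValueError("counter values must be non-negative")
    return (exec_stall_cycles / total_cycles,
            stall_cycles_with_miss / total_cycles)


# Made-up counts, purely for illustration.
stall_frac, mem_stall_bound = stall_fractions(1000, 400, 250)
print(f"execution stalls: {stall_frac:.1%} of cycles")
print(f"stalls with a miss pending: {mem_stall_bound:.1%} of cycles")
```

Comparing the two ratios gives a feel for how much of the stall time coincides with outstanding misses; the closer the second is to the first, the more plausible it is that memory latency dominates the stalls.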

For the newest processors, there is a new "TOPDOWN.SLOTS" event (and fixed-function counter) that might be useful, but I have not played with it very much yet -- I thought I remembered reading a detailed description of the event, but can't find it right now....
