[Intel PCM] Reference Cycles and Cycles Lost Due to Cache Misses

Good afternoon.
I've been using Intel PCM to analyze my software. I want to measure the number of CPU cycles consumed while executing and the number of CPU cycles spent waiting for some event (such as an I/O operation or a memory read/write).
Does the "Reference Cycles" value, obtained by calling getRefCycles(), count only the CPU cycles wasted while waiting for memory? And does getCycles() return both the executed and the waiting cycles?
In addition, I'm interested in counting the CPU cycles lost to misses at the cache levels. For that, Intel PCM provides the getCyclesLostDueL2CacheMisses() call. However, it returns a double value (e.g. Number of Cycles Lost Due to L2CacheMisses: 0.0652477). Does this represent a percentage of the total CPU cycles?
Thanks in advance.
Anderson Venturini
Hi Anderson,
getRefCycles does not count only the CPU cycles wasted while waiting for memory. It returns the number of reference clock cycles counted while the clock signal on the core is running. The reference clock operates at a fixed frequency. This metric is computed from the CPU_CLK_UNHALTED.REF performance monitoring hardware event.
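getCycles returns the number of "used" cycles (halted cycles are not counted). Cycles in which the core is stalled waiting for memory are still unhalted, so they are included in this count. This metric is computed from the CPU_CLK_UNHALTED.THREAD performance monitoring hardware event.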
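Since both counters only tick while the core is not halted, their ratio can be read as the average core clock relative to the fixed reference clock over the measured interval. A minimal sketch of that arithmetic follows. The helper name activeRelativeFrequency is hypothetical and is not taken from the PCM headers, and the inputs are assumed to be the values you already obtained from getCycles() and getRefCycles() for the same interval.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Intel PCM): ratio of unhalted core cycles
// to unhalted reference cycles for the same measurement interval.
// A result above 1.0 suggests the core ran faster than the reference clock
// on average; a result below 1.0 suggests it ran slower.
double activeRelativeFrequency(std::uint64_t coreCycles, std::uint64_t refCycles)
{
    assert(refCycles > 0 && "reference cycle count must be non-zero");
    return static_cast<double>(coreCycles) / static_cast<double>(refCycles);
}
```

For example, activeRelativeFrequency(3000, 2000) gives 1.5. Neither counter on its own says how much time was spent waiting for memory.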
As for getCyclesLostDueL2CacheMisses: 0.065 corresponds to 6.5%. Note, however, that it is only a very rough static estimate. On a real system the number of cycles lost due to a cache miss may vary depending on the location of the memory (local or remote socket), memory speed, memory latency hiding when using Intel® Hyper-Threading Technology, etc.
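To make the conversion concrete, here is a small sketch. It assumes the returned value is a fraction of the cycle count reported by getCycles() for the same interval. That is my reading of the answer above, not something checked against the PCM source. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: turn the fraction reported by the tool into a percentage.
double toPercent(double lostFraction)
{
    return lostFraction * 100.0;
}

// Hypothetical helper: rough absolute number of cycles lost, ASSUMING the
// fraction is relative to the unhalted core cycles of the same interval.
double estimatedLostCycles(double lostFraction, std::uint64_t coreCycles)
{
    assert(lostFraction >= 0.0 && "fraction is expected to be non-negative");
    return lostFraction * static_cast<double>(coreCycles);
}
```

With the value from the question, toPercent(0.0652477) is about 6.52. Because the underlying estimate is static, the absolute figure from estimatedLostCycles should be treated as an order-of-magnitude hint rather than a measurement.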
In the case of I/O, operating systems usually deschedule waiting threads from the CPU, so your program might not consume any CPU cycles while waiting. To profile I/O wait times you may try the "Locks&Waits" analysis in the Intel® VTune Amplifier XE 2011 profiler. The tool also has very detailed memory access analysis types for Intel processors.
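One way to see the descheduling effect without any profiler is to compare wall-clock time with CPU time around a blocking wait. The sketch below uses only the C++ standard library, with a sleep standing in for an I/O wait. The names Timing and timeBlockedWait are hypothetical. It also assumes a POSIX-like system, where std::clock() reports processor time used by the process. Some platforms report elapsed time there instead, in which case the two numbers would be roughly equal.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type: elapsed wall-clock time and CPU time in seconds.
struct Timing {
    double wallSeconds;
    double cpuSeconds;
};

// Hypothetical helper: block the calling thread for `wait` (a stand-in for an
// I/O wait) and report how much wall-clock and CPU time passed meanwhile.
Timing timeBlockedWait(std::chrono::milliseconds wait)
{
    assert(wait.count() >= 0 && "wait duration must be non-negative");

    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    std::this_thread::sleep_for(wait);  // thread is descheduled while waiting

    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();

    return Timing{
        std::chrono::duration<double>(wallEnd - wallStart).count(),
        static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC
    };
}
```

Calling timeBlockedWait(std::chrono::milliseconds(200)) on such a system should show a wall time of at least 0.2 s and a much smaller CPU time, which is why cycle counters alone will not reveal time spent blocked on I/O.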
Best regards,