Recently I have been using tools such as VTune and likwid, but I am confused by the events "CPU_CLK_UNHALTED.CORE" and "CPU_CLK_UNHALTED.REF". I have found the official definitions:
"CPU_CLK_UNHALTED.CORE": Core cycles when core is not halted.
"CPU_CLK_UNHALTED.REF": Reference cycles when core is not halted.
What's the difference between core cycles and reference cycles?
Thank you for your help!
"CPU_CLK_UNHALTED.CORE" provides the number of core cycles during which the core was not in the HALT state.
Here is an excerpt from the VTune documentation about the reference cycles:
"Counts the number of base clock (133 Mhz) reference cycles that the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is not affected by thread frequency changes but counts as if the thread is running at the maximum frequency all the time."
So, if I want to calculate CPI (clockticks per instruction retired), should I be using:
Thanks for your help.
Which event to use depends on what you want to do. The CPU_CLK_UNHALTED.REF event counts (when the cpu isn't halted) clockticks at the frequency of the TSC (time stamp counter). So, if the cpu is not halted at all, the difference in CPU_CLK_UNHALTED.REF over an interval will be equal to the difference in the TSC (as read with the rdtsc assembler instruction) over the same interval.
CPU_CLK_UNHALTED.CORE event counts (when the cpu isn't halted) clockticks at whatever frequency the cpu is running. So if the cpu is running at say 800 MHz in some low power mode over an interval of 1 second, the difference in CPU_CLK_UNHALTED.CORE would be 800e6 clockticks over the second. If the same cpu is in turbo mode at say 2.6 GHz over the 1 second, you would get about 2.6e9 clockticks over the 1 second. If the TSC runs at 2.0 GHz, CPU_CLK_UNHALTED.REF would give 2.0e9 over the same 1 second interval regardless of the actual frequency of the cpu.
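To make the arithmetic above concrete, here is a minimal sketch in C. It assumes a platform where CPU_CLK_UNHALTED.REF counts at the TSC rate (as described above) and that the TSC frequency is known. The function names `effective_core_hz` and `unhalted_fraction` are made up for illustration and are not part of VTune, likwid, or any Intel API.

```c
#include <assert.h>

/* Average core frequency (Hz) while the core was unhalted.
 *   core_cycles: delta of CPU_CLK_UNHALTED.CORE over the interval
 *   ref_cycles:  delta of CPU_CLK_UNHALTED.REF over the same interval
 *   tsc_hz:      TSC frequency (assumed known, e.g. 2.0e9)
 * Assumes REF counts at the TSC rate on this platform. */
double effective_core_hz(double core_cycles, double ref_cycles, double tsc_hz)
{
    assert(ref_cycles > 0.0);
    return tsc_hz * (core_cycles / ref_cycles);
}

/* Fraction of the interval during which the core was not halted.
 *   tsc_delta: difference of two rdtsc readings around the interval */
double unhalted_fraction(double ref_cycles, double tsc_delta)
{
    assert(tsc_delta > 0.0);
    return ref_cycles / tsc_delta;
}
```

With the numbers from the example above (2.6e9 core cycles, 2.0e9 reference cycles, 2.0 GHz TSC), `effective_core_hz` gives about 2.6 GHz. With 800e6 core cycles it gives about 800 MHz.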
So which should you use in CPI? Usually, as illyapolak says, folks use CPU_CLK_UNHALTED.CORE. But either one provides a (slightly different) insight into your workload.
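As a rough illustration of the "slightly different insight", here is a sketch in C that computes CPI both ways from raw counter deltas. The helper names `cpi_core` and `cpi_ref` are hypothetical, and the counts in the note below are invented to match the 2.6 GHz turbo / 2.0 GHz TSC example earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* CPI in actual core clock cycles: insensitive to frequency changes,
 * so it mostly reflects how well the code uses the pipeline. */
double cpi_core(uint64_t core_cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)core_cycles / (double)instructions;
}

/* CPI in reference cycles: proportional to unhalted wall-clock time
 * per instruction, so it also moves when the core frequency changes. */
double cpi_ref(uint64_t ref_cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)ref_cycles / (double)instructions;
}
```

For 1e9 instructions retired with 2.6e9 core cycles and 2.0e9 reference cycles, the core-based CPI is about 2.6 and the reference-based CPI is 2.0. If the same code then ran at 800 MHz, the core-based CPI would stay roughly the same (ignoring memory effects), while the reference-based CPI would rise.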
Just to add to the confusion, the performance counter event that refers to "reference" cycles (Event 3Ch (CPU_CLK_UNHALTED) with umask 01h) appears to have changed its meaning across processor generations. In some cases it counts the reference clock cycles and in some cases it counts the reference clock cycles times the standard CPU frequency multiplier (sometimes called the "Maximum Non-Turbo Ratio" -- the same one used by the TSC for systems with "Invariant TSC" support -- see note 1 below).
- For the Intel Core and Intel Atom processors, Event 3C, Umask 01 is called CPU_CLK_UNHALTED.BUS and counts bus cycles.
- For the Intel Nehalem and Westmere processors, Event 3C, Umask 01 is called CPU_CLK_UNHALTED.REF_P and counts at the same rate as the TSC (i.e., reference clock times standard multiplier)
- For the Intel Sandy Bridge (and later) processors, Event 3C, Umask 01 is called CPU_CLK_UNHALTED.REF_XCLK and counts reference clock cycles (i.e., no multiplier). This event was described incorrectly in some prior versions of Vol 3 of the Intel SW Developer's Manual, but the description is correct now (Revision 047, June 2013).
In addition to this programmable performance counter event, the "fixed-function" performance counter (IA32_PERF_FIXED_CTR2) was added with the Core microarchitecture and is called CPU_CLK_UNHALTED.REF --- almost, but not quite the same. Fixed counter 2 is described as having a constant ratio with CPU_CLK_UNHALTED.BUS. On the systems that I have access to, this "constant ratio" is the standard CPU multiplier value, so this counter increments at the same rate as the TSC (but only when the processor is unhalted). But a footnote to Table 19-1 in Vol 3 of the SW Developer's Manual says that this counter counts bus clocks on the Intel Core 2, Intel Core Duo and Intel Core Solo processors -- i.e., the "constant ratio" is one for those systems.
Perversely, chapters 18 and 19 of Volume 3 of the SW Developer's Manual do not appear to tell the user how to access the fixed-function counters using the RDPMC instruction. I had to find that info in the description of the RDPMC instruction in Volume 2 of the SW Developer's Manual. The trick is to set the counter number to 2^30 plus 0, 1, or 2, to read the fixed-function counters 0, 1, 2.
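For anyone who wants to try this, here is a minimal sketch in C (GCC/Clang inline assembly) of the counter-number trick. The helper names `fixed_ctr_index` and `rdpmc` are my own. The sketch assumes the fixed-function counters have already been enabled, and that the OS allows RDPMC from user mode (CR4.PCE set). Otherwise the instruction faults.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 30 of the RDPMC counter number selects the fixed-function counters. */
#define RDPMC_FIXED_FLAG (UINT32_C(1) << 30)

/* Counter number to pass to RDPMC for fixed-function counter n (0, 1 or 2). */
uint32_t fixed_ctr_index(unsigned n)
{
    assert(n <= 2);
    return RDPMC_FIXED_FLAG + n;
}

#if defined(__x86_64__) || defined(__i386__)
/* Execute RDPMC: counter number goes in ECX, result comes back in EDX:EAX.
 * Faults unless user-mode RDPMC is permitted (or the code runs in ring 0). */
static inline uint64_t rdpmc(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

Reading fixed counter 2 (CPU_CLK_UNHALTED.REF) would then look like `rdpmc(fixed_ctr_index(2))`, i.e. counter number 0x40000002.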
For reference, the fixed-function counters are:
- FIXED_CTR0 "Inst_Retired.Any" <-- not identical to Event C0h, Umask 00h ("INST_RETIRED.ANY_P"), but the differences are not described
- FIXED_CTR1 "CPU_CLK_UNHALTED.CORE" <-- appears to be the same as Event 3Ch, Umask 00h
- FIXED_CTR2 "CPU_CLK_UNHALTED.REF" <-- either the same as Event 3Ch, Umask 01h, or 3C/01 multiplied by the standard multiplier (depending on the platform)
Note 1: I can't find a list of all Intel processors that support the "Invariant TSC" feature. It is declared to be the standard architecture moving forward, but I don't know if there are exceptions (Atom? Xeon Phi?).