
RDTSC() for measuring latency of an operation

Hi,

I am trying to measure the latency of an operation by using rdtsc().
The problem I am facing is that the latency of that operation, i.e. the number of cycles it takes, stays the same even when I change the frequency of the processor core from 3 GHz to 2 GHz. In other words, changing the frequency has no effect on the output of rdtsc.
Can anyone please tell me why this is happening?

Thank you.

Accepted Solutions

Please see chapter 16.11 of the Intel SDM, Vol. 3A. For a range of Intel CPUs (listed in that section) and later generations, the TSC increments at a constant rate (there are two ways that rate can be set). Your results are therefore the expected ones: the TSC is incremented at a constant rate, independent of the current core frequency.
Also note that rdtsc() is not serializing and is not ordered with respect to other instructions, so you might end up measuring less or more than what you really want. You could try rdtscp(), which waits until all previous instructions have executed before reading the counter (it is not fully serializing, though: later instructions may still start executing before the read).
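A minimal sketch of timing a single operation with fenced TSC reads, assuming GCC or Clang on x86-64 with `<x86intrin.h>` and an Intel CPU (where LFENCE keeps RDTSC from executing before earlier instructions complete). The names `tsc_begin`, `tsc_end`, `operation_under_test` and `time_operation_ticks` are hypothetical, and the result is in TSC reference ticks, not core cycles, for the reason given above.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang on x86) */

/* Start timestamp: the first LFENCE waits for earlier instructions to finish,
 * the second keeps the timed code from starting before the TSC is read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End timestamp: RDTSCP waits for earlier instructions to execute,
 * and the trailing LFENCE keeps later instructions from starting early. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;  /* receives IA32_TSC_AUX (often a CPU id); unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Hypothetical workload standing in for "the operation" being measured. */
static volatile uint64_t sink;
static void operation_under_test(void)
{
    for (int i = 0; i < 1000; i++)
        sink += (uint64_t)i;
}

/* Elapsed TSC ticks (constant-rate reference ticks, not core cycles). */
uint64_t time_operation_ticks(void)
{
    uint64_t start = tsc_begin();
    operation_under_test();
    uint64_t end = tsc_end();
    return end - start;
}
```

The fences add a small fixed cost of their own, so in practice one would time an empty region the same way and subtract it, or time many repetitions and divide.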

Beginner
RDTSC does count cycles. If the frequency is 2 GHz, a cycle lasts 0.5 ns; at 3 GHz it lasts about 0.33 ns.

When you measure the latency in cycles, as you do when using RDTSC, it must remain the same. To convert the result into seconds, divide the cycle count by the clock frequency.
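The conversion described in this reply can be written as two small helpers. This is a sketch with hypothetical names (`cycles_to_ns`, `ns_to_cycles`); it only yields wall-clock time if the count really is in cycles of `freq_hz`. For a constant-rate TSC, the frequency to divide by is the TSC rate, not the current core frequency.

```c
/* Duration of `cycles` clock cycles at `freq_hz`, in nanoseconds.
 * One cycle lasts 1/freq_hz seconds = 1e9/freq_hz ns. */
double cycles_to_ns(double cycles, double freq_hz)
{
    return cycles * 1e9 / freq_hz;
}

/* Inverse: how many cycles at `freq_hz` fit into `ns` nanoseconds. */
double ns_to_cycles(double ns, double freq_hz)
{
    return ns * freq_hz / 1e9;
}
```

For example, `cycles_to_ns(1, 2e9)` gives 0.5 and `cycles_to_ns(1, 3e9)` gives about 0.333, matching the figures above.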
Beginner
Thanks for your reply.
Actually, I am trying to measure the latency of a DVFS switch. I read rdtsc() before and after changing the frequency and subtract the two values to get the number of cycles. The problem is that the count does not change: rdtsc() seems fixed at the 3 GHz rate and does not follow the frequency. I do not understand why this is happening.
Black Belt
rdtsc counts front-side bus (FSB) or QPI bus clock ticks and multiplies them by the default multiplier (presumably the one corresponding to 3 GHz in your case). The last Intel CPU where it measured CPU clock ticks rather than bus clock time was Northwood (IA-32 only). The latency of rdtsc itself is at least 6 CPU clock ticks, depending on the CPU model. If you serialize so as to control exactly what you are measuring, you will also be measuring a significant additional time for the serialization.
You should at least check the granularity you are seeing for rdtsc by writing a tight loop that calls rdtsc repeatedly and finds the smallest interval by which it reliably changes. According to the tests in the classic Livermore Fortran Kernels, the granularity is nearly 10^-7 s (100 ns) on most IA CPUs.
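A minimal sketch of the tight loop suggested above, assuming GCC or Clang on x86 with `<x86intrin.h>`; `tsc_min_step` is a hypothetical name. It keeps the smallest non-zero difference between back-to-back reads. Because that difference also includes the cost of executing RDTSC itself, the result is an upper bound on the counter's granularity rather than an exact value.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc (GCC/Clang on x86) */

/* Read the TSC `iterations` times in a row and return the smallest
 * non-zero delta between consecutive reads (UINT64_MAX if it never changed). */
uint64_t tsc_min_step(int iterations)
{
    uint64_t best = UINT64_MAX;
    uint64_t prev = __rdtsc();
    for (int i = 0; i < iterations; i++) {
        uint64_t now = __rdtsc();
        uint64_t delta = now - prev;
        if (delta != 0 && delta < best)
            best = delta;
        prev = now;
    }
    return best;
}
```

To check that the step shows up *reliably*, as the reply recommends, one would record how often each delta occurs instead of keeping only the minimum.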
Beginner
I understand what you are saying, but I am just trying to measure the latency of a DVFS frequency switch using rdtsc(). When I call sleep(1) and use rdtsc() to measure the cycles spent in 1 second, I get the same result at 3 GHz and at 2 GHz: around 2.97*10^9 cycles, which is what I would expect for 3 GHz. I don't understand why this also shows up at 2 GHz. At 2 GHz, shouldn't it be approximately 2*10^9 cycles? Or does rdtsc() always count cycles at the highest frequency of the system?
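The experiment described in this post can be reproduced with a sketch like the following, assuming Linux or another POSIX system on x86 with GCC or Clang. Instead of assuming that the sleep lasts exactly one second, it divides the TSC delta by the interval measured with `CLOCK_MONOTONIC`. `estimate_tsc_hz` and `now_seconds` are hypothetical helper names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <stdint.h>
#include <time.h>        /* clock_gettime, nanosleep */
#include <x86intrin.h>   /* __rdtsc (GCC/Clang on x86) */

/* Current CLOCK_MONOTONIC time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Count TSC ticks across a sleep of about `seconds` and return ticks per
 * second of CLOCK_MONOTONIC time. Signal interruptions are not handled. */
double estimate_tsc_hz(double seconds)
{
    struct timespec req;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);

    double t0 = now_seconds();
    uint64_t c0 = __rdtsc();
    nanosleep(&req, NULL);
    uint64_t c1 = __rdtsc();
    double t1 = now_seconds();

    return (double)(c1 - c0) / (t1 - t0);
}
```

On a CPU whose TSC runs at a constant rate, calling `estimate_tsc_hz(1.0)` once at 3 GHz and once at 2 GHz should return roughly the same value, which is the behaviour reported in this post.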
Black Belt
You should interpret it as measuring elapsed time in FSB or QPI clock ticks multiplied by the built-in nominal multiplier, e.g. the one implied by the frequency in the CPUID brand string, if any. If the events being timed aren't affected by the clock multiplier, the reported "cycles" should be the same.
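This reply implies a simple conversion: TSC ticks measure elapsed time at a fixed rate, so the number of core cycles in that time depends on the core frequency actually in effect. A sketch, assuming the core frequency stayed constant over the interval (which is not true across a DVFS switch itself); `tsc_ticks_to_core_cycles` is a hypothetical name, and the 2.97 GHz TSC rate in the example is taken from the measurement reported earlier in the thread.

```c
/* Convert TSC ticks to core clock cycles.
 * elapsed seconds = ticks / tsc_hz
 * core cycles     = elapsed seconds * core_hz
 * Assumes core_hz was constant for the whole interval. */
double tsc_ticks_to_core_cycles(double ticks, double tsc_hz, double core_hz)
{
    return ticks / tsc_hz * core_hz;
}
```

With the figures from the thread, `tsc_ticks_to_core_cycles(2.97e9, 2.97e9, 2e9)` gives 2e9: one second of TSC ticks corresponds to about 2*10^9 core cycles at 2 GHz, the number the original poster expected to see.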
Beginner
Hello,

While reading that section, I came across section 16.11.1, which is about "Invariant TSC" in newer processors. How is this different from the "constant TSC" described in section 16.11?

Thanks...

Invariant TSC (available from Nehalem onwards) counts at a constant rate no matter which state (P-, C- or T-state) the CPU is in.
Previous processors, while also incrementing at a constant rate, stop counting while in deep sleep states (for example the C6 state).
I hope this is clearer now.
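Whether a CPU has an invariant TSC can be checked at run time through CPUID: leaf 0x80000007 reports it in EDX bit 8, and leaf 0x80000001 reports RDTSCP support in EDX bit 27. A sketch assuming GCC or Clang on x86, using `__get_cpuid` from `<cpuid.h>`; the function names `has_invariant_tsc` and `has_rdtscp` are hypothetical.

```c
#include <cpuid.h>  /* __get_cpuid (GCC/Clang on x86) */

/* 1 if CPUID.80000007H:EDX[8] (invariant TSC) is set, else 0. */
int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;  /* extended leaf not supported */
    return (int)((edx >> 8) & 1u);
}

/* 1 if CPUID.80000001H:EDX[27] (RDTSCP and IA32_TSC_AUX) is set, else 0. */
int has_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 27) & 1u);
}
```

`__get_cpuid` returns 0 when the requested leaf is above the highest supported extended leaf, so the sketch reports "not available" on CPUs that do not implement it.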
