
RDTSC variability in Core 2 Quad

I am trying to measure the overhead of using rdtsc to count ticks. To do this, I set up a loop with two successive rdtsc instructions and subtract the first reading from the second. I wrote a kernel module in C to run this loop:

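#define ITERS 500
unsigned long arr[ITERS];
unsigned long el, sl;
int i;
/* 32-bit build assumed: the operands and cr0 moves use 32-bit registers */
for (i = 0; i < ITERS; i++) {
    asm volatile (
        "mov %%cr0, %%edi;"   /* serialize before the first read */
        "rdtsc;"
        "mov %%eax, %%esi;"   /* save the low 32 bits of the first read */
        "mov %%cr0, %%edi;"
        "mov %%cr0, %%edi;"
        "rdtsc;"
        "mov %%eax, %%edi;"   /* low 32 bits of the second read */
        "mov %%esi, %0;"
        "mov %%edi, %1;"
        : "=m" (sl), "=m" (el)
        :
        : "eax", "edx", "esi", "edi");
    arr[i] = el - sl;
}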

I basically sandwich two rdtsc instructions as close together as possible. I add a few instructions to the mix:
three moves from cr0: I use these to serialize execution and prevent reordering
one instruction to save the result of the first rdtsc

Most of the time I get a consistent value (72 ticks on a 1.6 GHz Core 2 Quad E5310). However, I occasionally get 66 or 78 ticks.
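
To see how often each value occurs, the deltas stored in arr can be summarized after the loop. The following is only a sketch: mode_of and count_outliers are hypothetical helper names, not part of the module above, and the quadratic scan assumes a small sample count such as 500.

```c
#include <stddef.h>

/* Return the most frequent value in samples[0..n-1].
 * On a tie, the value seen first wins. Returns 0 when n == 0.
 * O(n^2), which is acceptable for a few hundred samples. */
static unsigned long mode_of(const unsigned long *samples, size_t n)
{
    unsigned long best = 0;
    size_t best_count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (samples[j] == samples[i])
                count++;
        if (count > best_count) {
            best_count = count;
            best = samples[i];
        }
    }
    return best;
}

/* Count the samples that differ from a reference value (e.g. the mode). */
static size_t count_outliers(const unsigned long *samples, size_t n,
                             unsigned long ref)
{
    size_t outliers = 0;

    for (size_t i = 0; i < n; i++)
        if (samples[i] != ref)
            outliers++;
    return outliers;
}
```

For example, for the samples {72, 72, 66, 72, 78, 72}, mode_of returns 72 and count_outliers with a reference of 72 returns 2.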

I have two questions:
1) The tick counts always seem to be multiples of 6 ticks. I am assuming this is because ticks are counted in front-side bus cycles (1066 MHz, quad-pumped => 266 MHz) and multiplied by the bus-to-core frequency ratio (which is 6 for this processor). Is this correct?

2) It seems that this simple loop should show no variation. However, there are frequent outliers in these measurements. Are there non-deterministic factors I am missing here?
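
The arithmetic behind question 1 can be written out as a quick check. This is a sketch under the assumptions stated above (a 1066 MT/s quad-pumped bus and a bus-to-core ratio of 6); the macro and function names are made up for illustration.

```c
/* Values assumed from the post, not read from the hardware. */
#define FSB_TRANSFERS_MHZ 1066.0  /* quad-pumped transfer rate */
#define QUAD_PUMP         4.0     /* transfers per bus clock */
#define CORE_RATIO        6UL     /* bus-to-core multiplier */

/* Bus clock in MHz: transfer rate divided by the pumping factor. */
static double bus_clock_mhz(void)
{
    return FSB_TRANSFERS_MHZ / QUAD_PUMP;
}

/* Core clock in MHz: bus clock times the multiplier. */
static double core_clock_mhz(void)
{
    return bus_clock_mhz() * (double)CORE_RATIO;
}

/* 1 if a TSC delta is a whole number of bus clocks, else 0. */
static int is_whole_bus_clocks(unsigned long delta)
{
    return delta % CORE_RATIO == 0;
}

/* Number of bus clocks a TSC delta corresponds to (rounded down). */
static unsigned long bus_clocks(unsigned long delta)
{
    return delta / CORE_RATIO;
}
```

With these assumptions the bus clock comes out at 266.5 MHz and the core clock at about 1599 MHz, and the observed 66, 72 and 78 ticks correspond to 11, 12 and 13 bus clocks.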

Thanks for any help you can offer,
1 Reply

Yes, all 64-bit Intel CPUs have used the multiplied FSB tick count method in rdtsc.
I don't know enough to explain the variation, but I'm not surprised. I usually time a much longer group of instructions, >= 0.1 microseconds, without bothering with serialization.
Does it make a difference if you set affinity to a single core?
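
To try the affinity suggestion from a user-space process on Linux, the process can pin itself to one CPU before running its timing loop. A minimal sketch, assuming Linux with glibc; pin_to_first_allowed_cpu is a hypothetical helper name, and this does not by itself cover code that runs inside a kernel module.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity and the CPU_* macros */
#endif
#include <sched.h>

/* Pin the calling thread to the first CPU it is already allowed to run on.
 * Returns the CPU number on success, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;   /* no allowed CPU found */
}
```

Calling pin_to_first_allowed_cpu() once at the start of the program, before the timing loop, keeps all samples on the same core, so any remaining variation is not caused by migration between cores.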