Intel® ISA Extensions

Variability in timing measurements using RDTSC

aeric
Beginner
I'm trying to measure an upper bound on the execution time of a piece of inline assembly code,
using CPUID (to serialize execution) followed by RDTSC.

The measured code is a bare assembly loop that does nothing but "mov $0x3,%%eax".


#define _GNU_SOURCE   /* for cpu_set_t, CPU_ZERO, CPU_SET */
#include <stdio.h>
#include <sched.h>
#include <sys/mman.h>

unsigned int timeEx2;  /* cycles measured by the last call (low 32 bits of the TSC) */

void exemple_MOV_im(void){

    timeEx2=0;

    asm volatile ("xorl %%eax,%%eax\n\t"
                  "cpuid\n\t"                /* serialize before the first RDTSC */
                  "rdtsc\n\t"
                  "mov %%eax,%%esi\n\t"      /* keep the low 32 bits of the start time */
                  "mov $0x0,%%ebx\n\t"       /* loop counter */
                  "1: mov $0x3,%%eax\n\t"    /* the measured instruction */
                  "add $0x1,%%ebx\n\t"
                  "cmp $0x500,%%ebx\n\t"
                  "jle 1b\n\t"               /* numeric local label: safe if the asm is duplicated */
                  "xorl %%eax,%%eax\n\t"
                  "cpuid\n\t"                /* serialize again before the second RDTSC */
                  "rdtsc\n\t"
                  "subl %%esi, %%eax\n\t"    /* elapsed cycles */
                  "mov %%eax, %[t2]\n\t"
                  : [t2] "=m" (timeEx2)
                  :
                  : "eax", "ebx" , "ecx", "edx", "esi", "cc");

    printf("t2=%u\n",timeEx2);
}

int main(void){

    int i;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1,&set);

    /* Return value used to check the calls below */
    int retcode ;

    /* Scheduling parameters */
    int priomax ;
    struct sched_param param;

    retcode = mlockall(MCL_CURRENT | MCL_FUTURE);
    if (retcode == -1) {
        printf("mlockall failed\n");
    }

    sched_setaffinity(0,1,&set);
    priomax = sched_get_priority_max(SCHED_FIFO) ; /* Max priority */
    param.sched_priority = priomax;
    sched_setscheduler(0, SCHED_FIFO, &param);

    for (i=0;i<20;++i) {
        exemple_MOV_im();
    }
    return 0;
}

This runs on 32-bit Linux (Fedora 11, kernel 2.6.30.9-96.fc11.i686.PAE).

Running it gives:

t2=2066
t2=2040
t2=1997
t2=1998
t2=1997
t2=1989
t2=1997
t2=1998
t2=1998
t2=1997
t2=1997
t2=1998
t2=1998
t2=1997
t2=1997
t2=1998
t2=1997
t2=1997
t2=1997
t2=1997

The values are not exactly repeatable from run to run, but the pattern is always the same:

the first two or three runs take the longest, and after that the time barely varies.
What could explain these two or three slow initial runs?

I would expect only the first run to be slow (loading the instruction and data caches)
and all later runs to take constant time.

Note that the mlockall, sched_setaffinity and sched_setscheduler calls should ensure
maximum isolation, because the system was booted with "isolcpus=1", which
ensures that no other process is scheduled on CPU 1.

The CPU is an Intel Core2 Quad Q9550 @ 2.83GHz;
I get the same pattern (with different timings) on an
Intel Xeon X5472 @ 3.00GHz.

Any advice or explanation?
Our goal is to get "predictable performance" for some "basic" code,
where predictable means staying within a known performance interval.
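For collecting the numbers programmatically, here is a minimal sketch of the same CPUID + RDTSC technique as C helpers. It assumes GCC-style inline assembly on x86 (a 64-bit build, or a 32-bit non-PIC build, since CPUID clobbers %ebx); the names `tsc_serialized`, `time_mov_loop` and `collect_samples` are hypothetical. Unlike the code above, it keeps the full 64-bit counter (EDX:EAX) and stores the samples in an array instead of printing after each run.

```c
#include <stdint.h>
#include <stddef.h>

/* Serialize with CPUID (leaf 0), then read the full 64-bit time-stamp counter. */
static inline uint64_t tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("xorl %%eax, %%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc\n\t"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Time the loop from the question: 0x501 iterations of "mov $0x3,%eax".
 * %ecx is used as the loop counter instead of %ebx. */
static uint64_t time_mov_loop(void)
{
    uint64_t start = tsc_serialized();
    __asm__ __volatile__("movl $0, %%ecx\n\t"
                         "1: movl $0x3, %%eax\n\t"
                         "addl $1, %%ecx\n\t"
                         "cmpl $0x500, %%ecx\n\t"
                         "jle 1b\n\t"
                         :
                         :
                         : "eax", "ecx", "cc");
    return tsc_serialized() - start;
}

/* Fill out[0..n-1] with measurements; no I/O happens between runs. */
static void collect_samples(uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = time_mov_loop();
}
```

Printing the array only after `collect_samples` returns keeps printf out of the measured series; whether that changes the first few values on a given machine is something to test rather than assume.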
barragan_villanueva_
Valued Contributor I
Hi,

It's a good question.
First, I like your code that sets the CPU mask and locks the pages. But instead of
sched_setaffinity(0,1,&set);
it would be more correct to use
sched_setaffinity(0,sizeof(set),&set);
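A minimal sketch of that correction with the return value checked, which the original code does not do. It assumes Linux with glibc, where `cpu_set_t`, the `CPU_*` macros and `sched_setaffinity` need `_GNU_SOURCE`; `pin_to_cpu` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Restrict the calling thread (pid 0) to a single CPU.
 * Returns 0 on success, -1 on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* Pass the real size of the mask, not 1. */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

In the question's program this would be `pin_to_cpu(1)`. The same kind of check is worth adding to `sched_setscheduler`, which fails with EPERM for SCHED_FIFO when the process lacks the required privileges.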

As for the instability of the first measurements, in my opinion it is due to the CPU instruction caches or pipeline warm-up.
More generally, your code runs under an OS and can be interrupted by devices and daemons such as the timer, IO*, eth0*, etc. Look at /proc/interrupts on Linux to see the interrupt counts per CPU.
So the time-stamp counter readings can fluctuate for the same piece of code.
Therefore, I'd suggest some kind of statistical analysis of the results to get predictable performance.
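One way to see whether interrupts still reach the isolated CPU is to read its column in /proc/interrupts before and after a series of measurements and compare the totals. A minimal sketch, assuming the usual layout (a header row with one CPUn name per online CPU, then one row per interrupt source with one count per CPU); `cpu_interrupt_total` is a hypothetical name, and the column index equals the CPU number only when all CPUs are online.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the counts in column `col` (0-based CPU column) of a
 * /proc/interrupts-style stream. Returns -1 if the column does not exist. */
static long long cpu_interrupt_total(FILE *f, int col)
{
    char line[4096];
    int ncols = 0;
    long long total = 0;

    /* Header row: count the "CPUn" names. */
    if (!fgets(line, sizeof line, f))
        return -1;
    for (char *p = strstr(line, "CPU"); p; p = strstr(p + 3, "CPU"))
        ncols++;
    if (col < 0 || col >= ncols)
        return -1;

    while (fgets(line, sizeof line, f)) {
        char *p = strchr(line, ':');     /* skip the "IRQ:" label */
        if (!p)
            continue;
        p++;
        for (int c = 0; c < ncols; c++) {
            char *end;
            unsigned long long v = strtoull(p, &end, 10);
            if (end == p)                /* row has fewer counts, e.g. "ERR:" */
                break;
            if (c == col)
                total += (long long)v;
            p = end;
        }
    }
    return total;
}
```

For the question's setup, calling it on fopen("/proc/interrupts", "r") with column 1 before and after the 20 runs shows whether CPU 1 received any interrupts during that interval.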

For example, I got the following on my machine:
t2=4711
t2=4683
t2=4683
t2=4683
t2=4683
t2=4683
t2=4683
t2=4683
t2=18123
t2=4683
t2=4683
t2=4683
t2=4683
t2=4676
t2=4683
t2=4676
t2=4683
t2=4683
t2=4683
t2=4683
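As one possible form of such an analysis (order statistics are just one choice), a minimal sketch that reports the minimum, median and maximum of a set of samples. `summarize` and `struct tsc_summary` are hypothetical names. For the 20 values above it gives a minimum of 4676, a median of 4683 and a maximum of 18123: the median ignores the single slow run, while the maximum exposes it.

```c
#include <stdlib.h>
#include <string.h>

struct tsc_summary { unsigned long min, median, max; };

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Summarize n samples without modifying the caller's array.
 * Returns all zeros if n == 0 or allocation fails. */
static struct tsc_summary summarize(const unsigned long *samples, size_t n)
{
    struct tsc_summary s = {0, 0, 0};
    if (n == 0)
        return s;
    unsigned long *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return s;
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_ulong);
    s.min = tmp[0];
    s.max = tmp[n - 1];
    s.median = (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2;
    free(tmp);
    return s;
}
```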

aeric
Beginner
Thank you for your answer,

Sorry for the late reply; I didn't see your answer until now.
(Is there a way to get an email notification when someone replies to a thread?)

Thank you for noticing my typo in the sched_setaffinity call.

Concerning interrupts: I did run the test
on an isolated CPU (using the isolcpus Linux kernel option),
so that no process is scheduled on the isolated CPUs unless the user explicitly places it there.

I thought the interrupt affinity followed the same scheme
(and I did disable irqbalance), but this does not seem to be the case.
I'll redo my test with the IRQ affinity set properly.
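A minimal sketch of setting an IRQ's affinity from C by writing a hexadecimal CPU mask to /proc/irq/<n>/smp_affinity (this requires root). `set_irq_affinity` is a hypothetical helper; to keep device interrupts off CPU 1 on a four-core machine, one would write a mask such as 0xd (CPUs 0, 2 and 3) for each IRQ listed under /proc/irq. Per-CPU interrupts such as the local timer (the LOC row of /proc/interrupts) are not controlled this way, and irqbalance, if running, may rewrite the masks.

```c
#include <stdio.h>

/* Write `mask` (bit i = CPU i) to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 on failure (no such IRQ, not root, rejected mask). */
static int set_irq_affinity(int irq, unsigned long mask)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%lx\n", mask) > 0;
    /* The data reaches the kernel when the stdio buffer is flushed,
     * so a rejected write is reported by fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```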

As for statistical analysis, that approach does not suit me because
I want to bound the worst-case execution time.

Filtering out the first numbers may be OK, as long as the later numbers are "stable enough".
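A minimal sketch of that filtering rule: skip the first `warmup` samples and accept the maximum of the rest as the bound only if their spread (max - min) is within a tolerance `tol`. The function name and both parameters are hypothetical, and choosing them is up to the user. With the numbers from the original question and warmup = 2, the remaining samples range from 1989 to 1998, so any `tol` of 9 or more yields a bound of 1998 cycles.

```c
#include <stddef.h>

/* Skip the first `warmup` samples; if the remaining ones differ by at most
 * `tol` cycles, store their maximum in *bound and return 1, else return 0. */
static int bound_after_warmup(const unsigned long *s, size_t n, size_t warmup,
                              unsigned long tol, unsigned long *bound)
{
    if (warmup >= n)
        return 0;
    unsigned long lo = s[warmup], hi = s[warmup];
    for (size_t i = warmup + 1; i < n; i++) {
        if (s[i] < lo) lo = s[i];
        if (s[i] > hi) hi = s[i];
    }
    if (hi - lo > tol)
        return 0;
    *bound = hi;
    return 1;
}
```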
Thomas_W_Intel
Employee
Quoting aeric
Thank you for your answer,

Sorry for the late reply; I didn't see your answer until now.
(Is there a way to get an email notification when someone replies to a thread?)

There is a check box next to "Subscribed to this Thread" near the top of the page. If you click on this check box, you will get an email notification whenever someone posts a new answer to this thread.
barragan_villanueva_
Valued Contributor I
Quoting aeric
As for statistical analysis, that approach does not suit me because
I want to bound the worst-case execution time.

Filtering out the first numbers may be OK, as long as the later numbers are "stable enough".


By statistical analysis I meant something like the following: skip the first numbers and calculate the average of the others.
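That suggestion as a minimal sketch: `mean_after_skip` (a hypothetical name) averages the samples that follow the first `skip` ones. Note that an average is pulled up by a single interrupted run: with the numbers posted earlier in this thread (4711, 4683, ..., 18123, ...) and skip = 1, the 18123 sample raises the mean to about 5390 cycles, although 16 of the 19 remaining runs took 4683.

```c
#include <stddef.h>

/* Average of s[skip..n-1]; returns 0.0 if no samples remain. */
static double mean_after_skip(const unsigned long *s, size_t n, size_t skip)
{
    if (skip >= n)
        return 0.0;
    double sum = 0.0;
    for (size_t i = skip; i < n; i++)
        sum += (double)s[i];
    return sum / (double)(n - skip);
}
```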
