I'm trying to measure an upper bound on the execution time of a piece of inline assembly,
using cpuid (for serialization) + rdtsc.
The measured code is a bare assembly loop that does nothing but "mov $0x3,%%eax":
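#define _GNU_SOURCE            /* needed for CPU_SET and sched_setaffinity */
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

/* Elapsed TSC ticks (low 32 bits only); the declaration was not shown in the original post. */
static unsigned long timeEx2;

void exemple_MOV_im(void){
    timeEx2=0;
    asm volatile ("xorl %%eax,%%eax\n\t"
                  "cpuid\n\t"                  /* serialize before reading the TSC */
                  "rdtsc\n\t"
                  "mov %%eax,%%esi\n\t"        /* save start time (low 32 bits) */
                  "mov $0x0,%%ebx\n\t"         /* loop counter */
                  "DEB: mov $0x3,%%eax\n\t"    /* instruction under test */
                  "add $0x1,%%ebx\n\t"
                  "cmp $0x500,%%ebx\n\t"
                  "jle DEB;\n\t"
                  "xorl %%eax,%%eax\n\t"
                  "cpuid\n\t"                  /* serialize again */
                  "rdtsc\n\t"
                  "subl %%esi, %%eax\n\t"      /* end - start */
                  "mov %%eax, %[t2]\n\t"
                  : [t2] "=m" (timeEx2)
                  :
                  : "eax", "ebx" , "ecx", "edx", "esi");
    printf("t2=%lu\n",timeEx2);
}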
int main(){
    int i;   /* loop counter; its declaration was missing from the posted code */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1,&set);
    /* Return code used to check the calls */
    int retcode ;
    /* Scheduling parameters */
    int priomax ;
    struct sched_param param;
    retcode = mlockall(MCL_CURRENT | MCL_FUTURE);
    if (retcode == -1) {
        printf("mlockall failed\n");
    }
    sched_setaffinity(0,1,&set);
    priomax = sched_get_priority_max(SCHED_FIFO) ; /* Max priority */
    param.sched_priority = priomax;
    sched_setscheduler(0, SCHED_FIFO, &param);
    for (i=0;i<20;++i) {
        exemple_MOV_im();
    }
    return 0;
}
This is executed on ia32 Linux (Fedora 11 with kernel 2.6.30.9-96.fc11.i686.PAE).
When I run it, I get:
t2=2066
t2=2040
t2=1997
t2=1998
t2=1997
t2=1989
t2=1997
t2=1998
t2=1998
t2=1997
t2=1997
t2=1998
t2=1998
t2=1997
t2=1997
t2=1998
t2=1997
t2=1997
t2=1997
t2=1997
This is not fully repeatable, but the pattern is basically the same:
the first two or three runs take the longest, and after that the execution time barely varies.
What would explain those two or three slow runs?
I would expect the first run to take the maximum time (instruction and data cache load)
and the following runs to take constant time.
Note that the mlockall, sched_setaffinity and sched_setscheduler calls should ensure
maximum isolation, because the system was booted with "isolcpus=1", which
ensures that no other process can be scheduled on CPU 1.
The CPU is an Intel Core2 Quad CPU Q9550 @ 2.83GHz;
I get the same result (with different timings) on an
Intel Xeon CPU X5472 @ 3.00GHz.
Any advice or explanation?
Our goal is to get "predictable performance" for some "basic" code,
"predictable" meaning that it stays within a known performance interval.
4 Replies
Hi,
It's a good question.
Firstly, I like your code to set the CPU mask and lock pages. But instead of
sched_setaffinity(0,1,&set);
it would be more correct to use
sched_setaffinity(0,sizeof(set),&set);
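A minimal sketch of that corrected call, wrapped in a helper with error checking. The helper name pin_to_cpu is hypothetical and not from the original post. The second argument of sched_setaffinity is the size of the mask in bytes, so passing 1 only describes the first byte of the mask; as far as I understand, that happens to work for CPU 1 but would not for higher-numbered CPUs.

```c
#define _GNU_SOURCE   /* exposes cpu_set_t, the CPU_* macros and sched_setaffinity in glibc */
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity).
 * pin_to_cpu is an illustrative name, not part of the original code. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* The second argument is the size of the mask in bytes, not a CPU count. */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

In the posted main() this would be called as pin_to_cpu(1) before switching to SCHED_FIFO.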
As for the instability of the first measurements, in my opinion it is due to warm-up of the CPU instruction caches or the pipeline.
But in general your code runs in an OS environment and can be interrupted by devices and daemons such as the timer, IO*, eth0*, etc. Look at /proc/interrupts on Linux to see the per-CPU interrupt counts.
So fluctuations in the time-stamp counter readings are possible for the same piece of code.
Therefore, I'd suggest using some kind of statistical analysis of the results to get predictable performance.
For example, I got the following on my machine:
t2=4711
t2=4683
t2=4683
t2=4683
t2=4683
t2=4683
t2=4683
t2=4683
t2=18123
t2=4683
t2=4683
t2=4683
t2=4683
t2=4676
t2=4683
t2=4676
t2=4683
t2=4683
t2=4683
t2=4683
Thank you for your answer.
Sorry for the late reply, but I didn't see it until now.
(Is there a way to get an email notification of follow-up messages?)
Thank you for noticing my typo in the sched_setaffinity call.
Concerning interrupts, I did run the test
on an isolated CPU (using the isolcpus Linux kernel option),
so that no process can run on the isolated CPUs unless explicitly told to by the user.
I thought the interrupt mask followed the same scheme
(and I did deactivate irqbalance), but this does not seem to be the case.
I'll redo my test with IRQ affinity properly set.
Concerning statistical analysis, I'm not interested in this approach because
I want to bound the worst-case execution time.
Filtering out the first numbers may be OK as long as the later numbers are "stable enough".
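For the IRQ affinity step mentioned above, here is a rough sketch of one way it could be done from C. On Linux, each IRQ has a /proc/irq/<N>/smp_affinity file that accepts a hexadecimal CPU bitmask. The helper names irq_mask_excluding and set_irq_affinity are hypothetical, writing the file needs root, and some IRQs cannot be moved, in which case the write fails.

```c
#include <stdio.h>

/* Bitmask with CPUs 0..ncpus-1 set, minus the isolated CPU.
 * Assumes ncpus is smaller than the bit width of unsigned long. */
unsigned long irq_mask_excluding(unsigned int ncpus, unsigned int isolated_cpu)
{
    unsigned long all = (1UL << ncpus) - 1UL;
    return all & ~(1UL << isolated_cpu);
}

/* Write the mask, in hex, to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int set_irq_affinity(unsigned int irq, unsigned long mask)
{
    char path[64];
    FILE *f;
    int ok;

    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%lx\n", mask) > 0;
    if (fclose(f) != 0)   /* the kernel may reject the mask when the write is flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```

On the quad-core machine with CPU 1 isolated, irq_mask_excluding(4, 1) gives 0xd, meaning CPUs 0, 2 and 3 handle the interrupt.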
Quoting aeric
Thank you for your answer.
Sorry for the late reply, but I didn't see it until now.
(Is there a way to get an email notification of follow-up messages?)
There is a check box next to "Subscribed to this Thread" near the top of the page. If you click on this check box, you will get an email notification whenever someone posts a new answer to this thread.
Quoting aeric
Concerning statistical analysis, I'm not interested in this approach because
I want to bound the worst-case execution time.
Filtering out the first numbers may be OK as long as the later numbers are "stable enough".
By statistical analysis I meant something like the following: skip the first numbers and calculate the average of the others.
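A small sketch of that idea, assuming the t2 values have been collected into an array instead of being printed. The names tsc_summary and summarize_after_warmup are made up for illustration. It also records the largest sample, since an average can hide an outlier such as the 18123 reading earlier in this thread. Note that the observed maximum is only the worst case seen so far, not a guaranteed bound.

```c
#include <stddef.h>

/* Summary of the samples that remain after the warm-up runs are skipped. */
struct tsc_summary {
    unsigned long max;   /* largest sample observed after warm-up */
    double mean;         /* average of the samples after warm-up */
    size_t count;        /* number of samples actually used */
};

/* Skip the first `warmup` samples, then compute the mean and maximum of the rest.
 * If warmup >= n, the returned summary has count == 0. */
struct tsc_summary summarize_after_warmup(const unsigned long *samples,
                                          size_t n, size_t warmup)
{
    struct tsc_summary s = {0, 0.0, 0};
    double sum = 0.0;
    size_t i;

    for (i = warmup; i < n; ++i) {
        if (samples[i] > s.max)
            s.max = samples[i];
        sum += (double)samples[i];
        s.count++;
    }
    if (s.count > 0)
        s.mean = sum / (double)s.count;
    return s;
}
```

For a worst-case bound, the max field is the more relevant number; the mean is mainly useful for checking that later runs are "stable enough".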