
using cpuid (for serializing) + rdtsc.

The measured code is a bare assembly loop that only executes "mov $0x3,%%eax":

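    #define _GNU_SOURCE     /* for cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity */
    #include <stdio.h>
    #include <sched.h>
    #include <sys/mman.h>

    unsigned long timeEx2;  /* elapsed TSC ticks (low 32 bits of the counter) */

    void exemple_MOV_im(void){
        timeEx2=0;
        asm volatile ("xorl %%eax,%%eax\n\t"
                      "cpuid\n\t"               /* serialize before reading the TSC */
                      "rdtsc\n\t"
                      "mov %%eax,%%esi\n\t"     /* save start time (low 32 bits) */
                      "mov $0x0,%%ebx\n\t"      /* loop counter */
                      "DEB: mov $0x3,%%eax\n\t" /* instruction under test */
                      "add $0x1,%%ebx\n\t"
                      "cmp $0x500,%%ebx\n\t"
                      "jle DEB;\n\t"            /* loop body runs 0x501 times */
                      "xorl %%eax,%%eax\n\t"
                      "cpuid\n\t"               /* serialize again before the second read */
                      "rdtsc\n\t"
                      "subl %%esi, %%eax\n\t"   /* elapsed = end - start */
                      "mov %%eax, %[t2]\n\t"
                      : [t2] "=m" (timeEx2)
                      :
                      : "eax", "ebx" , "ecx", "edx", "esi");
        printf("t2=%lu\n",timeEx2);
    }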

    int main(){
        int i;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(1,&set);
        /* Declaration of a value to check the functions */
        int retcode ;
        /* Scheduling parameters */
        int priomax ;
        struct sched_param param;

        retcode = mlockall(MCL_CURRENT | MCL_FUTURE);
        if (retcode == -1) {
            printf("mlockall failed\n");
        }
        sched_setaffinity(0,1,&set);
        priomax = sched_get_priority_max(SCHED_FIFO) ; /* Max priority */
        param.sched_priority = priomax;
        sched_setscheduler(0, SCHED_FIFO, &param);
        for (i=0;i<20;++i) {
            exemple_MOV_im();
        }
        return 0;
    }

This was run on Linux ia32 (Fedora 11 with kernel 2.6.30.9-96.fc11.i686.PAE).

When I run it I get:

    t2=2066
    t2=2040
    t2=1997
    t2=1998
    t2=1997
    t2=1989
    t2=1997
    t2=1998
    t2=1998
    t2=1997
    t2=1997
    t2=1998
    t2=1998
    t2=1997
    t2=1997
    t2=1998
    t2=1997
    t2=1997
    t2=1997
    t2=1997

This is not fully repeatable, but the pattern is basically always the same: the first 2 or 3 executions take the longest, and after that the execution time hardly varies.

Could you tell me what would explain these 2 or 3 slow executions?

I would expect only the first execution to take the maximum time (instruction and data cache load), followed by constant execution times.

Note that the mlockall, sched_setaffinity and sched_setscheduler calls should ensure maximum isolation, because the system was booted with "isolcpus=1", which ensures that no other process is scheduled on CPU 1.

The CPU is an Intel Core2 Quad CPU Q9550 @ 2.83GHz. I get the same result (with different timings) on an Intel Xeon CPU X5472 @ 3.00GHz.

Any advice or explanation?

Our goal would be to have "predictable performance" for some "basic" code.

Predictable meaning that the execution time stays within a known interval.


4 Replies


It's a good question.

Firstly, I like your code for setting the CPU mask and locking pages. But instead of

sched_setaffinity(0,1,&set);

it would be more correct to use

sched_setaffinity(0, **sizeof(set)**, &set);

since the second argument is the size of the mask in bytes, not a CPU number.
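A minimal sketch of the corrected call with the return value checked, for anyone who wants to paste it in. The helper names `pin_to_cpu` and `first_allowed_cpu` are made up for illustration and are not part of the original program; I am assuming glibc on Linux, where `_GNU_SOURCE` has to be defined before the first include to get the `CPU_*` macros.

```c
#define _GNU_SOURCE          /* must come before every #include (glibc assumption) */
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: pin the calling thread to a single CPU.
   Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* Second argument: size of the mask in bytes, not a CPU number. */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: lowest CPU number the caller is currently allowed
   to run on, or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

In the program above this would be `pin_to_cpu(1)`; reading the mask back with `first_allowed_cpu()` is just a cheap way to confirm the pinning took effect.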

As for the instability of the first measurements, in my opinion it is due to the CPU instruction caches or pipeline warm-up.

But in general your code runs in an OS environment and can be interrupted by other devices/daemons like the timer, IO*, eth0* etc. Look at /proc/interrupts on Linux to see the per-CPU interrupt counts.
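One rough way to use /proc/interrupts here is to total the column for the measurement CPU before and after the timing loop and see whether it moved. The sketch below assumes the usual layout (a header line of `CPUn` labels, then one `name: count count ...` row per interrupt source); `irq_total_for_cpu` is a hypothetical name, and rows that do not carry one count per CPU (such as `ERR:`) are skipped.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the counts in the column labelled CPU<cpu> of a /proc/interrupts-style
   stream. Returns -1 if the header is missing or has no such column.
   Assumption: no line is longer than the 4096-byte buffer. */
long long irq_total_for_cpu(FILE *f, int cpu)
{
    char line[4096];
    int ncols = 0, col = -1;

    /* Header line: one "CPUn" token per online CPU. */
    if (!fgets(line, sizeof line, f))
        return -1;
    for (char *tok = strtok(line, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
        if (strncmp(tok, "CPU", 3) != 0)
            continue;
        if (atoi(tok + 3) == cpu)
            col = ncols;
        ncols++;
    }
    if (col < 0)
        return -1;

    long long total = 0;
    while (fgets(line, sizeof line, f)) {
        char *p = strchr(line, ':');      /* counts start after "name:" */
        if (!p)
            continue;
        p++;

        unsigned long long wanted = 0;
        int c;
        for (c = 0; c < ncols; c++) {
            char *end;
            unsigned long long v = strtoull(p, &end, 10);
            if (end == p)                 /* fewer counts than CPUs: skip row */
                break;
            if (c == col)
                wanted = v;
            p = end;
        }
        if (c == ncols)
            total += (long long)wanted;
    }
    return total;
}
```

Typical use would be to open `/proc/interrupts` with `fopen`, call this for CPU 1 before and after the 20 runs, and compare the two totals; a non-zero difference means at least one interrupt was delivered to that CPU in between.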

So fluctuations in time-stamp-counters are possible on the same piece of code.

Therefore, I'd suggest using some kind of statistical analysis of the results to get predictable performance.

For example, I got the following on my machine:

    t2=4711
    t2=4683
    t2=4683
    t2=4683
    t2=4683
    t2=4683
    t2=4683
    t2=4683
    t2=18123
    t2=4683
    t2=4683
    t2=4683
    t2=4683
    t2=4676
    t2=4683
    t2=4676
    t2=4683
    t2=4683
    t2=4683
    t2=4683


Sorry for the late answer, but I didn't see it until now.

(Is there a way to get mail notification of follow-up messages?)

Thank you for noticing my typo in the sched_setaffinity call.

Concerning the interrupts: I ran the test on an isolated CPU (using the isolcpus Linux kernel option), so that no process runs on the isolated CPUs unless explicitly told to by the user.

I thought interrupt affinity followed the same scheme (and I did deactivate irqbalance), but this does not seem to be the case.

I'll redo my test with IRQ affinity properly set.
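For reference, IRQ affinity on Linux is set by writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity (as root). A minimal sketch under that assumption follows; `mask_without_cpu` and `set_irq_affinity` are hypothetical helper names, and the mask helper assumes the CPU count fits in one `unsigned long`. As far as I know, per-CPU sources such as the local timer have no entry under /proc/irq and cannot be moved this way.

```c
#include <stdio.h>

/* Bitmask with one bit set for each of CPUs 0..ncpus-1, except `isolated`.
   Assumption: ncpus fits in the bits of an unsigned long. */
unsigned long mask_without_cpu(unsigned ncpus, unsigned isolated)
{
    const unsigned bits = (unsigned)(sizeof(unsigned long) * 8);
    unsigned long all = (ncpus >= bits) ? ~0UL : ((1UL << ncpus) - 1UL);

    if (isolated >= bits)
        return all;
    return all & ~(1UL << isolated);
}

/* Write `mask` to /proc/irq/<irq>/smp_affinity.
   Returns 0 on success, -1 on failure (typically: not root, or the
   kernel refuses to move this particular IRQ). */
int set_irq_affinity(int irq, unsigned long mask)
{
    char path[64];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%lx\n", mask) > 0;
    if (fclose(f) != 0)       /* procfs write errors can surface at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

On the 4-core Q9550 with CPU 1 isolated, `mask_without_cpu(4, 1)` gives `0xd`, i.e. CPUs 0, 2 and 3; that mask would then be applied to each IRQ number listed in /proc/interrupts.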

Concerning statistical analysis, I'm not interested in this approach because I want to bound the worst-case execution time.

Filtering out the first numbers may be OK as long as the later numbers are "stable enough".
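To make the "filter the first numbers, then check the rest" idea concrete, here is a small sketch. The function names are hypothetical, and note that the largest observed value is only an empirical figure, not a proven worst-case bound.

```c
#include <assert.h>
#include <stddef.h>

/* Largest sample once the first `skip` warm-up samples are ignored. */
unsigned long max_after_warmup(const unsigned long *samples, size_t n, size_t skip)
{
    assert(skip < n);
    unsigned long worst = samples[skip];
    for (size_t i = skip + 1; i < n; i++)
        if (samples[i] > worst)
            worst = samples[i];
    return worst;
}

/* 1 if every post-warm-up sample is <= bound, 0 otherwise. */
int within_bound(const unsigned long *samples, size_t n, size_t skip,
                 unsigned long bound)
{
    return max_after_warmup(samples, n, skip) <= bound;
}
```

Applied to the second set of numbers posted earlier in this thread, skipping the first sample still leaves the 18123 outlier as the maximum, so a bound of, say, 5000 ticks would be reported as violated; that is exactly the kind of event a worst-case analysis has to account for.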


Quoting aeric

*Thank you for your answer,*

Sorry for the late answer, but I didn't see it until now.

(Is there a way to get mail notice of message follow-up?)

There is a check box next to "Subscribed to this Thread" near the top of the page. If you click on this check box, you will get an email notification whenever someone posts a new answer to this thread.


Quoting aeric

*Concerning statistical analysis, I'm not interested in this approach because*

I want to bound the worst case execution time.

Filtering out the first numbers may be OK as long as the later numbers are "stable enough".

By *statistical analysis* I meant something like the following: skip the first numbers and calculate the average of the others.
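A minimal sketch of that calculation; `mean_after_warmup` is a hypothetical name, and how many samples to skip is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Average of samples[skip] .. samples[n-1]; the first `skip` warm-up
   samples are ignored. */
double mean_after_warmup(const unsigned long *samples, size_t n, size_t skip)
{
    assert(skip < n);
    double sum = 0.0;
    for (size_t i = skip; i < n; i++)
        sum += (double)samples[i];
    return sum / (double)(n - skip);
}
```

For example, with the samples 2066, 2040, 1997, 1998, 1997 and `skip = 2`, the result is the average of the last three values, about 1997.3 ticks.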
