Intel® ISA Extensions

Timing methods

zhangxiuxia
Beginner
How can I precisely measure the clock cycles taken by a piece of code?

Agner Fog uses the following method:
_rdpmc
{
cpuid
rdpmc
cpuid
}

First, measure the overhead of these instructions by finding the minimum time interval between two consecutive _rdpmc calls.

But I find that it varies greatly. I tested 1000 times. In most cases the minimum overhead is 420 cycles,
while sometimes it is 188 cycles. What causes the difference?

I run the test under taskset -c to pin it to one CPU and reduce migration between cores.
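
For readers who want to reproduce the overhead test, here is a minimal sketch in C. It is not Agner Fog's code: it swaps rdpmc for rdtsc (rdpmc faults in user mode unless the kernel has enabled it) and uses lfence rather than cpuid around the read. The names read_ticks, overhead_t and measure_overhead are made up for this example, and non-x86 builds fall back to clock_gettime, which reports nanoseconds rather than cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>
/* Read the time-stamp counter. The fences keep neighbouring
   instructions from being reordered across the read. */
static inline uint64_t read_ticks(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
#else
/* Assumed fallback for non-x86 builds: monotonic clock in nanoseconds. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Smallest and largest gap seen between two back-to-back reads. */
typedef struct { uint64_t min, max; } overhead_t;

static overhead_t measure_overhead(int trials) {
    assert(trials > 0);
    overhead_t r = { UINT64_MAX, 0 };
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        uint64_t d = t1 - t0;          /* cost of one timer read pair */
        if (d < r.min) r.min = d;
        if (d > r.max) r.max = d;
    }
    return r;
}
```

Calling measure_overhead(1000) and printing both fields shows how far apart the best and worst trials are. The minimum is the value to subtract from later measurements; the spread gives a rough idea of how much interference the run suffered.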
drMikeT
New Contributor I
Quoting drMikeT
...if the minimum duration is sought, then repeated trials give you an idea of the minimal time. My question was: can one
fight the non-determinism by minimizing interference from unrelated events?

[SergeyK] This is a really hard task and it is almost impossible to achieve in modern preemptive
operating systems.

In an ideal environment one uses one core running nothing else except the code under test...

[SergeyK] Did you try to do some performance tests in the MS-DOS operating system? :)

Best regards,
Sergey

Hi Sergey,

Eliminating non-determinism from computing and providing "hard" guarantees is the bread and butter of real-time systems. When a system has to react within a predetermined amount of time, say when a pilot applies a control and the airplane actuates it, it requires predictable upper bounds on reaction time.

On a multicore (or any SMP) system one needs, as a minimum, to be able to

1) direct all h/w interrupts to specific CPU sets only

2) lock down memory that a particular task needs (i.e., no ad hoc paging activity) that includes all objects from libraries

3) utilize a "kernel dispatcher" which supports an RT scheduling class

I admit I am not sure how much a stock consumer platform can do for 1). I tend to believe that it is doable given the programmability of recent interrupt controller chips.

2) could be done implicitly by running through the short code segment once beforehand, so that all the page-ins and cache fetches have already happened. This part could get hairy if the code is not well behaved, but that does not seem to be the case with the OP's code.
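
A rough sketch of point 2) in C, under some assumptions: mlockall() is the explicit way to lock memory on Linux and other POSIX systems, and touching one byte per page is one simple way to force the page-ins ahead of the timed run. The helper names lock_memory and touch_pages are invented for this example, and the page size is passed in by the caller (for instance from sysconf(_SC_PAGESIZE)).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel to keep all current and future pages resident.
   Returns 0 on success, otherwise the errno value. Without privileges
   or with a low RLIMIT_MEMLOCK this commonly fails (EPERM or ENOMEM). */
static int lock_memory(void) {
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        return 0;
    return errno;
}

/* Read one byte from every page of buf so each page is faulted in
   before timing starts. The sum is returned only so the reads are
   not optimized away. */
static unsigned touch_pages(volatile unsigned char *buf, size_t len, size_t page) {
    assert(page > 0);
    unsigned sum = 0;
    for (size_t i = 0; i < len; i += page)
        sum += buf[i];
    return sum;
}
```

A typical order would be lock_memory() first, then touch_pages() over the working buffers, then one untimed pass through the code under test, and only then the timed runs.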

Even if you cannot use a truly RT kernel (see https://rt.wiki.kernel.org/index.html), you could use the RT sched class which is available on most Linux systems. Again the thread in question should have the max priority and it should run to completion. This can be done by creating a Pthread (pthread_create) with "system scope" and using "pthread_setschedparam()" with the SCHED_FIFO policy.
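
To make that concrete, a minimal sketch in C of the two calls mentioned. The helper names set_self_fifo_max and create_system_scope_thread are made up here; the pthread and sched functions are the standard POSIX ones. It assumes the thread switches itself to SCHED_FIFO as its first action.

```c
#include <assert.h>
#include <errno.h>   /* EPERM etc., for callers checking the result */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Switch the calling thread to SCHED_FIFO at the highest priority.
   Returns 0 on success or an error number (typically EPERM when the
   process is not allowed to use real-time policies). */
static int set_self_fifo_max(void) {
    struct sched_param sp;
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

/* Create a thread with system contention scope. fn is expected to
   call set_self_fifo_max() before running the code under test. */
static int create_system_scope_thread(pthread_t *tid,
                                      void *(*fn)(void *), void *arg) {
    assert(tid != NULL && fn != NULL);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

On Linux this normally needs root or CAP_SYS_NICE, so check the return value and fall back to the default policy if it is EPERM. Build with -pthread.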

The usual caveat with the RT sched class is that a runaway pthread in SCHED_FIFO ("run to completion") with the highest priority will never get preempted (especially when timer interrupts are not honored there).

I am not claiming that this is not tedious or without risks. But one can minimize non-determinism ... :)

Hence the question "how accurate do you really want to be?" and "How will I be using these measurements?"

regards --Michael
