Intel® ISA Extensions

Measure the execution time using RDTSC

AHoyle

Hi

I have been trying to use RDTSC and RDTSCP to measure the execution time of some code under test. I found the article “How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures”, September 2010 (www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf)

I have tried to implement the suggested test method and I have a few questions. The paper suggests the following steps:

                CPUID
                RDTSC
                mov edx, %0
                mov eax, %1

** Code under test **

                RDTSCP
                mov edx, %2
                mov eax, %3
                CPUID
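
For readers who want to try the sequence above, here is a rough sketch of how it might look as GCC-style inline assembly in C. It assumes x86-64 and GCC or Clang. The helper names tsc_read_start and tsc_read_end are hypothetical, and the "xor eax, eax" that selects CPUID leaf 0 is an addition that does not appear in the steps listed above.

```c
#include <stdint.h>

/* Hypothetical helper: CPUID to serialize, then RDTSC for the start reading. */
static inline void tsc_read_start(uint32_t *high, uint32_t *low)
{
    uint32_t hi, lo;
    __asm__ volatile("xor %%eax, %%eax\n\t"  /* assumption: pick CPUID leaf 0 */
                     "cpuid\n\t"             /* serialize before the measurement */
                     "rdtsc\n\t"             /* TSC -> EDX:EAX */
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "rax", "rbx", "rcx", "rdx", "memory");
    *high = hi;
    *low = lo;
}

/* Hypothetical helper: RDTSCP for the end reading, then CPUID to serialize. */
static inline void tsc_read_end(uint32_t *high, uint32_t *low)
{
    uint32_t hi, lo;
    __asm__ volatile("rdtscp\n\t"            /* TSC -> EDX:EAX, TSC_AUX -> ECX */
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     "xor %%eax, %%eax\n\t"  /* assumption: pick CPUID leaf 0 */
                     "cpuid\n\t"             /* keep later code out of the window */
                     : "=r"(hi), "=r"(lo)
                     :
                     : "rax", "rbx", "rcx", "rdx", "memory");
    *high = hi;
    *low = lo;
}
```

The code under test would sit between a call to tsc_read_start and a call to tsc_read_end. Listing all four CPUID output registers as clobbers keeps the compiler from placing the two outputs in a register that CPUID overwrites.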

 

I understand that the CPUID instructions provide full serialization, so that no other instructions can be reordered across them. The “mov” instructions copy the registers to appropriate locations so that the difference between the two readings can be calculated later. When I test these steps I get quite a bit of variation in the measured timings. If I replace the RDTSC with RDTSCP I get much better results. Does this suggest that the RDTSC instruction is not waiting for the first CPUID to complete before it starts to read the TSC?
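
As an illustration of the "calculated later" step, the two 32-bit halves saved from EDX (high) and EAX (low) can be joined into one 64-bit count and subtracted. This is only a sketch; the names tsc_join and tsc_elapsed are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: join the EDX (high) and EAX (low) halves into 64 bits. */
static inline uint64_t tsc_join(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: elapsed TSC ticks between a start and an end reading. */
static inline uint64_t tsc_elapsed(uint32_t start_high, uint32_t start_low,
                                   uint32_t end_high, uint32_t end_low)
{
    return tsc_join(end_high, end_low) - tsc_join(start_high, start_low);
}
```

Joining before subtracting matters when the low half wraps between the two readings: a start of 0x00000000FFFFFFFF and an end of 0x0000000100000005 gives 6 ticks, which subtracting the low halves alone would get wrong.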

 

I understand that RDTSCP waits until all previous instructions have executed, which would explain why replacing the RDTSC with an RDTSCP seems to work.

What prevents instructions from my code under test from moving up between the first CPUID and the RDTSCP that follows it?

Does RDTSCP prevent later instructions from moving ahead of it?

And if it does, do I still need the first CPUID instruction?

 

My processor is an Intel Atom E3950.

 

Sorry for such a long question.

Kindest Regards.

McCalpinJohn

CPUID is a very heavyweight instruction (around 200 cycles?). It does serialize everything, but at a very high cost.

 

This can be a complicated subject, but I think this writeup is still accurate:

https://sites.utexas.edu/jdm4372/2018/07/23/comments-on-timing-short-code-sections-on-intel-processors/

 

AHoyle

Hi John

Thank you for your response. I am not too worried by the duration of the CPUID instruction as I think it is outside of the measurement window, and the speed of the overall test is not that important. Reading the writeup you suggested does show that this subject is much more complex than I had appreciated. I think I will add an LFENCE instruction after the first RDTSCP instruction to try to prevent my code under test from executing before the RDTSCP has read the ‘start’ time. However, I don’t think I am going to have the time to study this problem in enough detail to fully understand it. So for now I will have to accept some variation in my test results. I just can’t justify spending much more time on this.
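
A minimal sketch of the RDTSCP-then-LFENCE idea, using compiler intrinsics rather than inline assembly. It assumes GCC or Clang on x86-64 (`<x86intrin.h>`). The names tsc_now_fenced, time_once and example_work are hypothetical, and using the same fenced read at both ends of the window is a simplification made here, not something taken from the paper or the linked writeup.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp, _mm_lfence (GCC/Clang) */

/* Hypothetical helper: read the TSC with RDTSCP, then LFENCE so that
 * later instructions do not begin executing until the read is done. */
static inline uint64_t tsc_now_fenced(void)
{
    unsigned int aux;              /* receives IA32_TSC_AUX; unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Hypothetical stand-in for the code under test. */
static void example_work(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}

/* Hypothetical example: time one call of fn in TSC ticks. */
static inline uint64_t time_once(void (*fn)(void))
{
    uint64_t start = tsc_now_fenced();
    fn();
    uint64_t end = tsc_now_fenced();
    return end - start;
}
```

Calling time_once(example_work) many times and looking at the minimum or the spread of the results is one way to see how much variation remains. The sketch does not subtract the overhead of the timing instructions themselves.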

 

Thanks again for your help.

Alastair
