
Latency of RDTSC and RDTSCP instructions on Intel CPUs

SergeyKostrov
Valued Contributor II
5,015 Views
*** Latency of RDTSC and RDTSCP instructions on Intel CPUs ***
25 Replies
SergeyKostrov
Valued Contributor II
859 Views
[ An example of disassembled codes for a test with RDTSCP instruction - 64-bit ]
...
000000013F652A81 rdtscp
000000013F652A84 mov rbx, rax
000000013F652A87 rdtscp
000000013F652A8A rdtscp
000000013F652A8D rdtscp
000000013F652A90 rdtscp
000000013F652A93 rdtscp
000000013F652A96 rdtscp
000000013F652A99 rdtscp
000000013F652A9C rdtscp
000000013F652A9F rdtscp
000000013F652AA2 rdtscp
000000013F652AA5 sub rax, rbx
...
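For readers who want to try something similar, here is a minimal C sketch of the same pattern: one saved read followed by ten back-to-back reads. It is not the original test code, which was not published. The function names are hypothetical, the `__rdtscp` intrinsic from `<x86intrin.h>` (GCC and Clang) is an assumption about the toolchain, and dividing the delta by the number of reads is an assumption about how the average is formed. Unlike the listing, the intrinsic returns the full 64-bit counter value.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* provides __rdtscp on GCC and Clang */
#endif

/* Average cost of one read, given the first saved counter value, the last
   value read, and how many reads were issued after the first one. */
double average_delta(uint64_t first, uint64_t last, unsigned reads_after_first)
{
    return (double)(last - first) / (double)reads_after_first;
}

#if defined(__x86_64__) || defined(__i386__)
/* One sample: save the first RDTSCP value, issue ten more reads back to
   back, then average the difference over those ten reads (assumption). */
double rdtscp_average_once(void)
{
    unsigned int aux;                  /* receives IA32_TSC_AUX; unused here */
    uint64_t first = __rdtscp(&aux);
    uint64_t last;

    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);
    last = __rdtscp(&aux);

    return average_delta(first, last, 10);
}
#endif
```

Calling `rdtscp_average_once()` many times and keeping the smallest result would give a "minimal averaged delta" in the spirit of the thread; the exact procedure used for the published numbers is not known.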
SergeyKostrov
Valued Contributor II
859 Views
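Note: In the two examples of disassembled code ( see Posts #21 and #22 ) the EDX / RDX register is Not saved, in order to improve the accuracy of the measurements. RDTSC and RDTSCP return the low 32 bits of the counter in EAX and the high 32 bits in EDX, so only the low 32 bits are used here, and the value in EAX / RAX can overflow ( wrap around ) between the first and the last read.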
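To illustrate the overflow point, here is a small C sketch (helper names are hypothetical, not from the original tests). It assumes the measured interval is shorter than 2^32 ticks; in that case an unsigned 32-bit subtraction still gives the right delta after a single wrap, because unsigned arithmetic is modulo 2^32. Combining EDX:EAX into a 64-bit value avoids the issue at the price of extra instructions.

```c
#include <stdint.h>

/* Delta between two reads when only the low 32 bits (EAX) are kept.
   Unsigned subtraction wraps modulo 2^32, so one wrap between the two
   reads still produces the correct difference. */
uint32_t delta_low32(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Full 64-bit counter value rebuilt from the EDX (high) and EAX (low)
   halves returned by RDTSC / RDTSCP. */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | (uint64_t)eax;
}
```

For example, `delta_low32(0xFFFFFFF0u, 0x00000010u)` yields 0x20 even though the second value is numerically smaller than the first.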
Brian_W_1
Beginner
859 Views

Nice information.

Have you tried this experiment on v4 or v3 cpus?  In particular E5-2699 v3 and E5-2699 v4?

Is the test code available so I could do the test myself?

Thanks,

Brian

SergeyKostrov
Valued Contributor II
859 Views
Thanks for the feedback, Brian. Here are my answers:

>>Have you tried this experiment on v4 or v3 cpus? In particular E5-2699 v3 and E5-2699 v4?

No. I don't have these systems available. I wish I could do some R&D related to that subject for these Intel CPUs.

>>Is the test code available so I could do the test myself?

No. The test code is integrated with the test subsystem of the ScaLib for BDP project, and since this is Not an Open Source project, even the test code can't be provided. I expected that question, though, and created a pseudo-code example. Take a look at:

Minimal Averaged Delta of Intel RDTSC and RDTSCP instructions
https://software.intel.com/en-us/forums/watercooler-catchall/topic/698641

In essence, it is very easy to implement your own test by following the pseudo-code example in the above-mentioned thread. Also, there are examples of disassembled code in the current thread.

Another note is as follows. I've been doing high-accuracy measurements of the core parts of some algorithms during the last couple of months, and I have a firm opinion that the overhead of a single RDTSC or RDTSCP instruction should Not be taken into account. This is because only time intervals are measured, and in that case the access time, or overhead, of these two instructions is already included as soon as you calculate NumberOfClocks2 - NumberOfClocks1. However, if these two instructions are inside another "bigger" C, C++, or Assembler function, then the overhead of that function could be taken into account.

Do you see the difference between my statements? That is, ...should Not be taken into account... vs ...could be taken into account....

I also verified how a Time Interval Counter instruction needs to be used on Itanium and Itanium 2 CPUs, and there was a similar, but very fuzzy, statement from an Intel Software Engineer.
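As an illustration of the NumberOfClocks2 - NumberOfClocks1 pattern, here is a minimal C sketch. It is not taken from ScaLib; the workload `sum_to` and the function `clocks_for_sum` are hypothetical, and the `__rdtsc` intrinsic from `<x86intrin.h>` (GCC and Clang) is an assumption about the toolchain. The cost of reading the TSC is simply part of the interval and is not subtracted.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* provides __rdtsc on GCC and Clang */

/* Hypothetical workload standing in for the code being measured. */
static uint64_t sum_to(uint64_t n)
{
    volatile uint64_t acc = 0;  /* volatile keeps the loop from being removed */
    for (uint64_t i = 1; i <= n; ++i)
        acc += i;
    return acc;
}

/* Returns NumberOfClocks2 - NumberOfClocks1 around the workload and stores
   the workload's result in *result. No TSC access overhead is subtracted. */
uint64_t clocks_for_sum(uint64_t n, uint64_t *result)
{
    uint64_t clocks1 = __rdtsc();
    *result = sum_to(n);
    uint64_t clocks2 = __rdtsc();
    return clocks2 - clocks1;
}
#endif
```

If the two reads were wrapped in helper functions that the compiler does not inline, the call and return cost of those helpers would also land inside the interval, which is the "bigger function" case described above.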
SergeyKostrov
Valued Contributor II
859 Views
>>Have you tried this experiment on v4 or v3 cpus? In particular E5-2699 v3 and E5-2699 v4?

Here are results of my tests for Intel Xeon Phi Processor 7210:

http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core

Intel Xeon Phi Processor 7210 ( 16GB, 1.30 GHz, 64 core )

Processor name : Intel(R) Xeon Phi(TM) 7210
Packages (sockets) : 1
Cores : 64
Processors (CPUs) : 256
Cores per package : 64
Threads per core : 4

[ Output for RDTSC instruction ]
...
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 37.70 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 37.70 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
...
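To put these numbers in time units, a small C sketch of the conversion follows. The helper name is hypothetical, and the conversion assumes the TSC ticks at the nominal 1.30 GHz frequency listed above, which the output itself does not state.

```c
/* Converts a clock-cycle count to nanoseconds for a counter running at
   'ghz' gigahertz (ticks per nanosecond). Assumes the TSC rate equals the
   nominal processor frequency. */
double cycles_to_ns(double cycles, double ghz)
{
    return cycles / ghz;
}
```

Under that assumption, `cycles_to_ns(36.40, 1.30)` gives approximately 28 ns and `cycles_to_ns(37.70, 1.30)` approximately 29 ns per TSC access.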