*** Latency of RDTSC and RDTSCP instructions on Intel CPUs ***
25 Replies
[ An example of disassembled code for a test with the RDTSCP instruction - 64-bit ]
...
000000013F652A81 rdtscp
000000013F652A84 mov rbx, rax
000000013F652A87 rdtscp
000000013F652A8A rdtscp
000000013F652A8D rdtscp
000000013F652A90 rdtscp
000000013F652A93 rdtscp
000000013F652A96 rdtscp
000000013F652A99 rdtscp
000000013F652A9C rdtscp
000000013F652A9F rdtscp
000000013F652AA2 rdtscp
000000013F652AA5 sub rax, rbx
...
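For readers who want to produce a similar back-to-back sequence from C, here is a minimal sketch. It is not the author's test code, which is not published. It assumes an x86 or x86-64 target and the `__rdtscp` intrinsic, which GCC and Clang expose through `<x86intrin.h>` and MSVC through `<intrin.h>`. The names `average_rdtscp_delta` and `REPEAT10` are hypothetical. Unlike the disassembly above, which keeps only RAX, the intrinsic returns the full 64-bit counter.

```c
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtscp on MSVC */
#else
#include <x86intrin.h>   /* __rdtscp on GCC and Clang */
#endif

/* Hypothetical helper: expands its argument ten times so the reads are
   written back to back, with no loop between them. */
#define REPEAT10(stmt) stmt stmt stmt stmt stmt stmt stmt stmt stmt stmt

/* Returns the average number of TSC ticks per RDTSCP over ten reads. */
static double average_rdtscp_delta(void)
{
    unsigned int aux;                    /* receives IA32_TSC_AUX; unused here */
    uint64_t start = __rdtscp(&aux);     /* first read, kept as the baseline */
    uint64_t end = start;

    REPEAT10(end = __rdtscp(&aux);)      /* ten more reads; only the last is kept */

    return (double)(end - start) / 10.0;
}
```

Calling `average_rdtscp_delta()` many times and keeping the smallest result is one way to reduce noise from interrupts and context switches. Whether the compiler emits exactly the instruction sequence shown above depends on the compiler and its optimization settings.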
Note: In the two examples of disassembled code ( see Posts #21 and #22 ), the EDX or RDX register is deliberately Not saved, to improve the accuracy of the measurements, so it is possible that the value in the EAX or RAX GPR could overflow ( wrap around ).
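The wraparound mentioned in this note can be illustrated with plain integer arithmetic, without reading the TSC at all. The sketch below is only an illustration and all function names are hypothetical. It relies on two facts: RDTSC and RDTSCP return the counter split across EDX:EAX (in 64-bit mode the upper halves of RAX and RDX are cleared), and unsigned 32-bit subtraction in C wraps modulo 2^32.

```c
#include <stdint.h>

/* Combine the two halves returned in EDX:EAX into the full 64-bit counter. */
static uint64_t tsc_from_halves(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Delta computed from the low halves only, truncated to 32 bits.
   Correct as long as fewer than 2^32 ticks elapsed, even if EAX wrapped once. */
static uint32_t delta_low32(uint32_t start_eax, uint32_t end_eax)
{
    return end_eax - start_eax;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Delta computed as a 64-bit subtraction of two zero-extended low halves,
   which is what "sub rax, rbx" does. A wrap of EAX shows up as a huge value. */
static uint64_t delta_low32_as_64(uint32_t start_eax, uint32_t end_eax)
{
    return (uint64_t)end_eax - (uint64_t)start_eax;
}
```

For example, with a starting low half of 0xFFFFFFF0 and an ending low half of 0x00000010, `delta_low32` returns 0x20 (32 ticks), while `delta_low32_as_64` returns 0xFFFFFFFF00000020. Saving EDX as well and using `tsc_from_halves` avoids the problem at the cost of extra instructions in the measured sequence.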
Nice information.
Have you tried this experiment on v4 or v3 cpus? In particular E5-2699 v3 and E5-2699 v4?
Is the test code available so I could do the test myself?
Thanks,
Brian
Thanks for the feedback, Brian. Here are my answers:
>>Have you tried this experiment on v4 or v3 cpus? In particular E5-2699 v3 and E5-2699 v4?
No. I don't have access to these systems. I wish I could do some R&D on this subject for these Intel CPUs.
>>Is the test code available so I could do the test myself?
No. The test code is integrated with the test subsystem of the ScaLib for BDP project, and since this is Not an Open Source project, even the test code can't be provided. I actually expected that question and created a pseudo-code example. Take a look at:
Minimal Averaged Delta of Intel RDTSC and RDTSCP instructions
https://software.intel.com/en-us/forums/watercooler-catchall/topic/698641
In essence, it is very easy to implement your own test by following the pseudo-code example in the above-mentioned thread. There are also examples of disassembled code in the current thread.
Another note is as follows.
I've been doing high-accuracy measurements of the core parts of some algorithms over the last couple of months, and I am firmly of the opinion that the overhead of a single RDTSC or RDTSCP instruction should Not be taken into account. This is because only time intervals are measured, and in that case the access time, or overhead, of these two instructions is already included as soon as you calculate NumberOfClocks2 - NumberOfClocks1.
But if these two instructions are inside another, "bigger" C, C++, or Assembler function, then the overhead of that function could be taken into account.
Do you see the difference between my statements? That is, "...should Not be taken into account..." vs. "...could be taken into account...".
I also checked how a Time Interval Counter instruction needs to be used on Itanium and Itanium 2 CPUs, and there was a similar, though very vague, statement from an Intel Software Engineer.
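The two cases contrasted in this post can be sketched with simple interval arithmetic. This is only an illustration of the idea, not code from the ScaLib project. The function names are hypothetical, and the overhead value is assumed to have been measured separately.

```c
#include <stdint.h>

/* Interval between two counter readings: NumberOfClocks2 - NumberOfClocks1.
   The cost of the counter reads is already part of this value. */
static uint64_t elapsed_clocks(uint64_t clocks1, uint64_t clocks2)
{
    return clocks2 - clocks1;
}

/* Optional variant: subtract a separately measured overhead, clamping at 0
   so that a very short interval never yields a wrapped-around result. */
static uint64_t elapsed_minus_overhead(uint64_t clocks1, uint64_t clocks2,
                                       uint64_t overhead)
{
    uint64_t elapsed = clocks2 - clocks1;
    return elapsed > overhead ? elapsed - overhead : 0;
}
```

With readings of 1000 and 1500, `elapsed_clocks` gives 500. Subtracting an illustrative overhead of 36 clocks gives 464. Whether that subtraction is appropriate is exactly the "should Not" vs. "could" distinction made above.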
>>Have you tried this experiment on v4 or v3 cpus? In particular E5-2699 v3 and E5-2699 v4?
Here are the results of my tests on an Intel Xeon Phi Processor 7210:
http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core
Intel Xeon Phi Processor 7210 ( 16GB, 1.30 GHz, 64 core )
Processor name : Intel(R) Xeon Phi(TM) 7210
Packages (sockets) : 1
Cores : 64
Processors (CPUs) : 256
Cores per package : 64
Threads per core : 4
[ Output for RDTSC instruction ]
...
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 37.70 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 37.70 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
Access Time to TSC: 36.40 clock cycles
...
