Intel® ISA Extensions

Synchronizing Time Stamp Counter

Roman_Oderov
Hello everyone! I have to synchronize time between processors in a multicore system, i.e. I have to calculate the TSC differences of all processors relative to one of them. I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to get the TSC of a specific processor? Or maybe I can specify a processor ID somewhere and read the corresponding time stamp counter value. Thanks in advance, Roman
Roman_D_Intel
Hi Roman, there is no IA instruction that directly returns the TSC of a core that you specify as a parameter. Operating systems usually implement various tricks: they execute rdtsc on all cores and use low-latency thread synchronization (spinning on signal variables) to estimate the differences between processor TSCs. Best regards, Roman
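One common way to turn such a signal-variable exchange into a number is a midpoint estimate. The following is a minimal sketch, not something stated in the reply above. It assumes core A reads its TSC (t_send), signals core B, B reads its own TSC (t_remote) and signals back, and A reads its TSC again (t_recv). The struct and function names are hypothetical, and the estimate assumes the signalling delay is roughly the same in both directions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One ping-pong exchange between a reference core A and another core B.
 * All names are hypothetical; the values are raw TSC readings. */
typedef struct {
    uint64_t t_send;   /* TSC on core A just before signalling B */
    uint64_t t_remote; /* TSC on core B when it saw the signal   */
    uint64_t t_recv;   /* TSC on core A when B's answer arrived  */
} tsc_exchange;

/* Estimated offset of B's TSC relative to A's, assuming symmetric delay:
 * B's reading is compared with the midpoint of A's two readings. */
static int64_t estimate_offset(const tsc_exchange *x)
{
    uint64_t midpoint = x->t_send + (x->t_recv - x->t_send) / 2;
    return (int64_t)(x->t_remote - midpoint);
}

/* Among n exchanges, use the one with the shortest round trip: it left the
 * least room for interrupts or cache misses to skew the midpoint. */
static int64_t best_offset(const tsc_exchange *xs, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (xs[i].t_recv - xs[i].t_send < xs[best].t_recv - xs[best].t_send)
            best = i;
    }
    return estimate_offset(&xs[best]);
}
```

Picking the exchange with the shortest round trip, rather than averaging all of them, is a design choice of this sketch: a long round trip usually means the exchange was disturbed, so its midpoint is less trustworthy.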
SergeyKostrov
Hi everybody,

>>[ Roman Oderov ] I have to synchronize time between processors in a multicore system i.e. I have to calculate TSC differences of all
>>processors relative to one of them...
>>...
>>[ Roman Dementiev ] there is no IA instruction that directly returns TSC from the core that you can specify as a parameter...

However, if you use a Windows OS there are a couple of Win32 API functions that could help you:

- GetCurrentThread
- SetThreadPriority
- SetThreadAffinityMask
- Sleep

Here is what I would try:

- [Step00] Let's say you have 2 CPUs ( CPU1 and CPU2 )
- [Step01] Declare a static / global 'Array' of two 64-bit values
- [Step02] Initialize the array values with 0
- [Step03] Create a new thread
- [Step04] Set the thread priority to 'Normal'
- [Step05] Set the thread affinity to CPU1 with SetThreadAffinityMask
- [Step06] Call Sleep( 0 )
- [Step07] Set the thread priority to 'Time Critical'
- [Step08] Use inline assembler to call RDTSC and store the value in 'Array[0]'
- [Step09] Set the thread affinity to CPU2 with SetThreadAffinityMask
- [Step10] Call Sleep( 0 )
- [Step11] Use inline assembler to call RDTSC and store the value in 'Array[1]'
- [Step12] Calculate the difference between 'Array[0]' and 'Array[1]'

Here are some additional notes:

- the overhead of steps [Step08], [Step09], [Step10] and [Step11] has to be evaluated
- it is very important to call Sleep( 0 ) after a call to SetThreadAffinityMask
- run as many tests as possible and use the average of the differences; the result should not exceed the accuracy threshold ( in nanoseconds ) defined in your specs

Best regards,
Sergey
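The additional notes above (evaluate the overhead, repeat the test many times, compare against a threshold) could be combined roughly as follows. This is only a sketch of one possible reading of those notes: each raw difference is corrected by a previously measured overhead, the corrected values are averaged, and the run counts as acceptable when no sample deviates from the average by more than the threshold. All names are hypothetical. The threshold here is expressed in TSC ticks; converting it to nanoseconds would need the TSC frequency, which is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double mean_ticks;  /* average corrected difference, in TSC ticks */
    double max_dev;     /* largest |sample - mean| over all runs      */
    int    within_spec; /* 1 if max_dev <= threshold, else 0          */
} tsc_diff_summary;

/* raw_diff[i] is Array[1] - Array[0] from run i; overhead_ticks is the
 * separately measured cost of the steps between the two RDTSC reads. */
static tsc_diff_summary summarize_diffs(const int64_t *raw_diff, size_t n,
                                        int64_t overhead_ticks,
                                        double threshold_ticks)
{
    assert(n > 0);
    tsc_diff_summary s = {0.0, 0.0, 0};
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i)
        sum += (double)(raw_diff[i] - overhead_ticks);
    s.mean_ticks = sum / (double)n;

    for (size_t i = 0; i < n; ++i) {
        double dev = (double)(raw_diff[i] - overhead_ticks) - s.mean_ticks;
        if (dev < 0.0)
            dev = -dev;
        if (dev > s.max_dev)
            s.max_dev = dev;
    }
    s.within_spec = s.max_dev <= threshold_ticks;
    return s;
}
```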
Roman_Oderov
Sergey, thank you for your answer. I've chosen exactly this method for my project.
Bernard
>>>- an overhead for steps [Step08], [Step09], [Step10] and [Step11] has to be evaluated>>>

In an environment as unpredictable as Windows, you also have to pay attention to the time-critical kernel components that manage context switching and scheduling; these components can always postpone the currently running thread. Moreover, hardware interrupts and their ISRs and DPCs run at a higher priority than user-mode code. Wouldn't it be a better option to run the timing code in kernel mode at DPC level, as a dummy driver? You can also queue a DPC on another CPU, so you can have some kind of "concurrency".
SergeyKostrov
>>...Moreover hardware interrupts and their ISR and DPC... What is DPC?
SergeyKostrov
>>However, if you use a Windows OS there are a couple of Win32 API functions that could help you:
>>
>>- GetCurrentThread
>>- SetThreadPriority
>>- SetThreadAffinityMask
>>- Sleep

On a non-Windows OS, a similar set of functions has to be used.
Bernard
>>>What is DPC?>>>

In the Windows kernel architecture, DPC stands for "Deferred Procedure Call". DPCs are global, system-wide procedures (kernel objects) scheduled to perform some action on behalf of a driver's ISR routine at the DPC interrupt level.
SergeyKostrov
>>In Windows kernel architecture DPC stands for "Deferred procedure calls". These are global system-wide procedure (kernel objects)
>>scheduled to perform some action on behalf of driver's ISR routine at DPC interrupt level.

Thanks, Iliya.
SergeyKostrov
Hi everybody,

>>I have to synchronize time between processors in a multicore system i.e. I have to calculate TSC differences of all processors
>>relative to one of them...

I'll provide C/C++ sources for a test case. Here is a screenshot that demonstrates the output: cpusswitchdemo.jpg
Bernard
Hi Sergey! Looking at your console test picture, I can see that the RDTSC overhead is only 24.000 cpi, which is very close to the result measured by Agner Fog. How many timing tests did you perform? I can spot a small spike before the start of your testing loop; could that be the CreateThread function overhead, which also includes a context-switching penalty? If you are interested in more precise profiling and an instruction-by-instruction timing breakdown, you can use the XPERF tool; the Kernrate tool will track the instruction pointer in kernel space and report the results.
SergeyKostrov
>>I'll provide C/C++ sources for a test case... Attached.
SergeyKostrov
This is an example of the output when an error happened:

...
Test-Case 2 - Switching CPUs at runtime
Switched to CPU2
Switched to CPU1
Test-Case 3 - Retrieving RDTSC values for CPUs
RDTSC for CPU1 : 10124080002908
RDTSC for CPU2 : 10124080010328
RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 )
dwThreadAMPrev1 : 1 ( Processing Error if 0 )
dwThreadAMPrev2 : 0 ( Processing Error if 0 ) // <= It was a simple verification that error processing works
...
SergeyKostrov
>>By looking at your console test picture I can see that RDTSC overhead is only 24.000 cpi and is very close to
>>the result measured by Agner Fog.

It is good to know that my number looks right.

>>How many timing tests did you perform?

1000000 ( one million ). But I also tested with 10000000 and 100000000:

...
iNumOfIterations = 1000000;
// iNumOfIterations = 10000000;
// iNumOfIterations = 100000000;
...

and the results for RDTSC overhead were very consistent.

>>I can spot small spike before the start of your testing loop could that be a CreateThread function overhead which includes
>>also context switching penalty.

I think this is related to some network transfers.
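A read-overhead figure like the one discussed above is typically obtained by taking two back-to-back counter readings in a loop and averaging the gaps. The attached sources are not reproduced in this thread, so the following is only a hedged sketch of that idea and not the actual test case. The counter is passed in as a function pointer so the loop stays separate from the platform-specific read; on x86 with GCC or Clang one could pass a small wrapper around __rdtsc() from <x86intrin.h>. The names counter_fn, measure_read_overhead and fake_counter are hypothetical, and fake_counter is only a stand-in so the sketch can run on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Any function that returns the current counter value. */
typedef uint64_t (*counter_fn)(void);

/* Average cost, in ticks, of one counter read: take two back-to-back
 * readings per iteration and average the gaps between them. */
static double measure_read_overhead(counter_fn read_counter, uint32_t iterations)
{
    assert(read_counter != NULL && iterations > 0);
    uint64_t total = 0;
    for (uint32_t i = 0; i < iterations; ++i) {
        uint64_t start = read_counter();
        uint64_t end   = read_counter();
        total += end - start;
    }
    return (double)total / (double)iterations;
}

/* Stand-in counter that advances 24 ticks per read (an assumed value, chosen
 * to mirror the figure quoted above). Replace it with a real TSC read. */
static uint64_t fake_counter(void)
{
    static uint64_t ticks = 0;
    ticks += 24;
    return ticks;
}
```

With the stand-in, measure_read_overhead(fake_counter, 1000000) evaluates to 24.0 by construction; with a real TSC read the value depends on the processor and on whatever else runs during the loop.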
SergeyKostrov
** A question to Roman Dementiev (Intel) ** Is there an Intel document that describes TSC related solutions / issues in a multi-core system? Best regards, Sergey
Bernard
>>>It is good to know that my number looks right.>>>

Good job:)

>>>RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 )>>>

These results could be polluted by an arbitrary thread context (even your own thread) running a driver's ISR and DPC routines, and some time-critical kernel-mode components could also postpone your processing loop. To minimize this dependency, run your tests (not the RDTSC for-loop) 1e4 or 1e5 times and average the results.
SergeyKostrov
Hi Iliya,

>>>>RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 )
>>>>...
>>>>dwThreadAMPrev2 : 0 ( Processing Error if 0 ) // <= It was a simple verification that error processing works
>>>>RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 )
>>
>>These results could be polluted by arbitrary context thread(even your thread)...

'7420' is a wrong number anyway, because in that case I tried to switch processing to a CPU that doesn't exist ( CPU #8 ). I simply wanted to see that error processing works.
Bernard
>>>'7420' is a wrong number anyway because in that case I tried to switch processing to a CPU that doesn't exists ( CPU #8 ). I simply wanted to see that error processing works.>>> Misunderstood your post:)
SergeyKostrov
I have not found anything in these Intel manuals that says an RDTSC value could be different for the CPUs of a multi-core system.

** Intel(R) 64 and IA-32 Architectures Software Developer's Manual **
>> Volume 3A: System Programming Guide, Part 1 <<
...Chapter 7. MULTIPLE-PROCESSOR MANAGEMENT

** Intel(R) 64 and IA-32 Architectures Software Developer's Manual **
>> Volume 3B: System Programming Guide, Part 2 <<
...Chapter 18. DEBUGGING AND PERFORMANCE MONITORING
...
...18.17 COUNTING CLOCKS
...
Time-stamp counter - Measures clock cycles in which the physical processor is not in deep sleep. These ticks cannot be measured on a logical-processor basis.
...
...18.17.3 Incrementing the Time-Stamp Counter
Bernard
>>>Time-stamp counter - Measures clock cycles in which the physical processor is not in deep sleep. These ticks cannot be measured on a logical-processor basis.>>>

So HT logical cores cannot be sampled with the RDTSC instruction. I think the difference here is between logical HT cores, which have their own general-purpose registers and APIC state, and fully fledged cores with their own FPU and SIMD vector units.
Roman_Oderov
But, if I understand correctly, the invariant TSC in newer processors (17.13.1 in Vol. 3) guarantees that TSC values are synchronized. On older processors, I still can't rely on the TSCs of different cores without manual synchronization. Am I right?
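Whether a given processor reports an invariant TSC can be checked at run time: the Intel manual documents the flag as CPUID leaf 80000007H, EDX bit 8. The sketch below assumes GCC or Clang on x86, where <cpuid.h> provides __get_cpuid; on other compilers or architectures it simply reports "unknown". The function and macro names are hypothetical. Note that the manual describes this flag in terms of the counter running at a constant rate across P-, C- and T-states; the sketch only detects the flag and does not settle the cross-core synchronization question asked above.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* CPUID.80000007H:EDX[8] is the invariant TSC flag. */
#define INVARIANT_TSC_BIT 8u

/* Decode the flag from an EDX value returned by CPUID leaf 80000007H. */
static int edx_has_invariant_tsc(uint32_t edx)
{
    return (int)((edx >> INVARIANT_TSC_BIT) & 1u);
}

/* Returns 1 if the flag is set, 0 if it is clear, and -1 if the leaf
 * cannot be queried (unsupported leaf, compiler, or architecture). */
static int cpu_has_invariant_tsc(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_has_invariant_tsc(edx);
#else
    return -1;
#endif
}
```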