Roman_Oderov
Beginner
740 Views

Synchronizing Time Stamp Counter

Hello everyone! I have to synchronize time between processors in a multicore system, i.e. I have to calculate the TSC differences of all processors relative to one of them. I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to get the TSC of a specific processor? Or maybe I can specify a processor ID somewhere and read the corresponding time stamp counter value? Thanks in advance, Roman
76 Replies
Roman_D_Intel
Employee
468 Views

Hi Roman, there is no IA instruction that returns the TSC of a core that you specify as a parameter. Operating systems usually implement various tricks: they execute rdtsc on all cores and use low-latency thread synchronization (spinning on signal variables) to estimate the differences between the processors' TSCs. Best regards, Roman
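A minimal sketch of the "spin on a signal variable" idea described above, not what any particular operating system actually does. The names `estimate_tsc_offset`, `pin_to_cpu` and `read_tsc` are hypothetical. It assumes GCC or Clang on x86 for `__rdtsc`, pins threads only on Linux, and uses no serializing fences around the counter reads, so the result is only an estimate.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang)
#endif
#ifdef __linux__
#include <pthread.h>
#include <sched.h>
#endif

// Counter of whichever core executes the call.
static inline std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    // Assumption: non-x86 fallback so the sketch still compiles; not a TSC.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Best-effort pinning of the calling thread (Linux only in this sketch).
static bool pin_to_cpu(int cpu) {
#ifdef __linux__
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
#else
    (void)cpu;
    return false;
#endif
}

struct OffsetEstimate {
    std::int64_t offset;       // remote counter minus reference counter
    std::uint64_t round_trip;  // ticks seen by the reference side
};

// Ping-pong over one signal variable; keep the exchange with the shortest
// round trip, since it is the one least disturbed by preemption.
OffsetEstimate estimate_tsc_offset(int ref_cpu, int remote_cpu, int rounds) {
    assert(rounds > 0);
    std::atomic<int> signal{0};  // 1 = request, 2 = reply, -1 = stop
    std::atomic<std::uint64_t> remote_tsc{0};

    std::thread remote([&] {
        pin_to_cpu(remote_cpu);
        for (;;) {
            int s;
            while ((s = signal.load(std::memory_order_acquire)) != 1 && s != -1) {}
            if (s == -1) return;
            remote_tsc.store(read_tsc(), std::memory_order_relaxed);
            signal.store(2, std::memory_order_release);
        }
    });

    pin_to_cpu(ref_cpu);  // note: this changes the caller's affinity
    OffsetEstimate best{0, UINT64_MAX};
    for (int i = 0; i < rounds; ++i) {
        const std::uint64_t t0 = read_tsc();
        signal.store(1, std::memory_order_release);
        while (signal.load(std::memory_order_acquire) != 2) {}
        const std::uint64_t t1 = read_tsc();
        const std::uint64_t rtt = t1 - t0;
        if (rtt < best.round_trip) {
            // Assume the remote read happened halfway through the round trip.
            const std::uint64_t midpoint = t0 + rtt / 2;
            best.offset = static_cast<std::int64_t>(
                remote_tsc.load(std::memory_order_relaxed) - midpoint);
            best.round_trip = rtt;
        }
    }
    signal.store(-1, std::memory_order_release);
    remote.join();
    return best;
}
```

Calling `estimate_tsc_offset(0, 1, 1000)` would compare CPU 1 against CPU 0. The uncertainty of the estimate is roughly half of `round_trip`, so an offset smaller than that cannot be distinguished from zero with this method.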
SergeyKostrov
Valued Contributor II
468 Views

Hi everybody,

>>[ Roman Oderov ]I have to synchronize time between processors in a multicore system i.e. I have to calculate TSC differences of all
>>processors relative to one of them...
>>...
>>[ Roman Dementiev ] there is no IA instruction that directly returns TSC from the core that you can specify as a parameter...

However, if you use a Windows OS there are a couple of Win32 API functions that could help you:

- GetCurrentThread
- SetThreadPriority
- SetThreadAffinityMask
- Sleep

Here is what I would try:

- [Step00] Let's say you have 2 CPUs ( CPU1 and CPU2 )
- [Step01] Declare a static / global 'Array' of two 64-bit values
- [Step02] Initialize array values with 0
- [Step03] Create a new thread
- [Step04] Set the thread priority to 'Normal'
- [Step05] Set the thread affinity to CPU1 with SetThreadAffinityMask
- [Step06] Call Sleep( 0 )
- [Step07] Set the thread priority to 'Time Critical'
- [Step08] Use inline assembler to execute RDTSC and store the value in 'Array[0]'
- [Step09] Set the thread affinity to CPU2 with SetThreadAffinityMask
- [Step10] Call Sleep( 0 )
- [Step11] Use inline assembler to execute RDTSC and store the value in 'Array[1]'
- [Step12] Calculate the difference between 'Array[0]' and 'Array[1]'

Here are some additional notes:

- the overhead of steps [Step08], [Step09], [Step10] and [Step11] has to be evaluated
- it is very important to call Sleep( 0 ) after a call to SetThreadAffinityMask
- run as many tests as possible and use the averaged differences; they should not exceed the accuracy threshold ( in nanoseconds ) defined in your specs

Best regards,
Sergey
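The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the test case attached later in the thread. `sample_two_cpus`, `move_to_cpu`, `tsc_difference` and `TscPair` are hypothetical names, the `__rdtsc` intrinsic replaces inline assembler, and a Linux branch ( `sched_setaffinity` plus `sched_yield` ) stands in for the Win32 calls on non-Windows systems.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif
#if defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>
#define SKETCH_HAS_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define SKETCH_HAS_RDTSC 1
#endif

static inline std::uint64_t read_tsc() {
#ifdef SKETCH_HAS_RDTSC
    return __rdtsc();
#else
    // Assumption: non-x86 fallback so the sketch still compiles; not a TSC.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Step05/06 and Step09/10: change affinity, then give up the time slice.
static bool move_to_cpu(int cpu) {
#if defined(_WIN32)
    if (cpu < 0 || cpu >= static_cast<int>(sizeof(DWORD_PTR) * 8)) return false;
    const DWORD_PTR prev =
        SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << cpu);
    Sleep(0);
    return prev != 0;  // a previous mask of 0 signals failure
#elif defined(__linux__)
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) return false;
    sched_yield();
    return true;
#else
    (void)cpu;
    return false;
#endif
}

struct TscPair {
    std::uint64_t tsc[2];  // Step01/02: two 64-bit slots, zero-initialized
    bool ok;               // false if either affinity change failed
};

TscPair sample_two_cpus(int cpu1, int cpu2) {
    assert(cpu1 >= 0 && cpu2 >= 0);
    TscPair out{{0, 0}, false};
    std::thread worker([&] {                 // Step03 (Step04: default priority)
        const bool ok1 = move_to_cpu(cpu1);  // Step05, Step06
#if defined(_WIN32)
        SetThreadPriority(GetCurrentThread(),
                          THREAD_PRIORITY_TIME_CRITICAL);  // Step07
#endif
        out.tsc[0] = read_tsc();             // Step08
        const bool ok2 = move_to_cpu(cpu2);  // Step09, Step10
        out.tsc[1] = read_tsc();             // Step11
        out.ok = ok1 && ok2;
    });
    worker.join();
    return out;
}

// Step12: raw difference; it still contains the cost of Step09 and Step10.
std::int64_t tsc_difference(const TscPair& p) {
    return static_cast<std::int64_t>(p.tsc[1] - p.tsc[0]);
}
```

A call such as `sample_two_cpus(0, 1)` covers the two-CPU case. The `ok` flag should be checked before the difference is trusted, because a failed affinity change leaves both reads on whatever core the thread happened to run on.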
Roman_Oderov
Beginner
468 Views

Sergey, thank you for your answer. I've chosen exactly this method for my project.
Bernard
Black Belt
468 Views

>>>- an overhead for steps [Step08], [Step09], [Step10] and [Step11] has to be evaluated>>> In an environment as unpredictable as Windows you also have to pay attention to the time-critical kernel components that manage context switching and scheduling; these components can always preempt the currently running thread. Moreover, hardware interrupts and their ISRs and DPCs run at a higher priority than user-mode code. Wouldn't it be a better option to run the timing code in kernel mode at DPC level, as a dummy driver? You can also queue a DPC to another CPU, so you can have some kind of "concurrency".
SergeyKostrov
Valued Contributor II
468 Views

>>...Moreover hardware interrupts and their ISR and DPC... What is DPC?
SergeyKostrov
Valued Contributor II
468 Views

>>However, if you use a Windows OS there are a couple of Win32 API functions that could help you: >> >>- GetCurrentThread >>- SetThreadPriority >>- SetThreadAffinityMask >>- Sleep On a non-Windows OS a similar set of functions has to be used.
Bernard
Black Belt
468 Views

>>>What is DPC?>>> In the Windows kernel architecture DPC stands for "Deferred Procedure Call". DPCs are system-wide procedures ( kernel objects ) scheduled to perform some action on behalf of a driver's ISR routine at DPC interrupt level.
SergeyKostrov
Valued Contributor II
468 Views

>>In Windows kernel architecture DPC stands for "Deferred procedure calls". These are global system-wide procedure (kernel objects)
>>scheduled to perform some action on behalf of driver's ISR routine at DPC interrupt level.

Thanks, Iliya.
SergeyKostrov
Valued Contributor II
468 Views

Hi everybody, >>I have to synchronize time between processors in a multicore system i.e. I have to calculate TSC differences of all processors >>relative to one of them... I'll provide C/C++ sources for a test case. Here is a screenshot that demonstrates output: cpusswitchdemo.jpg
Bernard
Black Belt
468 Views

Hi Sergey! Looking at your console test picture I can see that the RDTSC overhead is only 24.000 cpi, which is very close to the result measured by Agner Fog. How many timing tests did you perform? I can spot a small spike before the start of your testing loop. Could that be the overhead of the CreateThread function, which also includes a context-switching penalty? If you are interested in more precise profiling and an instruction-level timing breakdown, you can use the XPERF tool; the Kernrate tool will track the instruction pointer in kernel space and report the results.
SergeyKostrov
Valued Contributor II
468 Views

>>I'll provide C/C++ sources for a test case... Attached.
SergeyKostrov
Valued Contributor II
468 Views

This is an example of the output when an error happened:

...
Test-Case 2 - Switching CPUs at runtime
Switched to CPU2
Switched to CPU1
Test-Case 3 - Retrieving RDTSC values for CPUs
RDTSC for CPU1 : 10124080002908
RDTSC for CPU2 : 10124080010328
RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 )
dwThreadAMPrev1 : 1 ( Processing Error if 0 )
dwThreadAMPrev2 : 0 ( Processing Error if 0 ) // <= It was a simple verification that error processing works
...
SergeyKostrov
Valued Contributor II
468 Views

>>By looking at your console test picture I can see that RDTSC overhead is only 24.000 cpi and is very close to
>>the result measured by Agner Fog.

It is good to know that my number looks right.

>>How many timing tests did you perform?

1000000 ( one million ). But, I also tested for 10000000 and 100000000:

...
iNumOfIterations = 1000000;
// iNumOfIterations = 10000000;
// iNumOfIterations = 100000000;
...

and results for RDTSC overhead were very consistent.

>>I can spot small spike before the start of your testing loop could that be a CreateThread function overhead which includes
>>also context switching penalty.

I think this is related to some network transfers.
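For readers who want to reproduce an overhead number of this kind, here is a minimal sketch of a back-to-back RDTSC loop. It is an assumption about how such a measurement can be done, not the attached test case; `rdtsc_overhead_ticks` and `read_tsc` are hypothetical names, and `__rdtsc` assumes GCC or Clang on x86.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang)
#endif

static inline std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    // Assumption: non-x86 fallback so the sketch still compiles; not a TSC.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Average number of ticks between consecutive counter reads.
// The figure includes the loop overhead, so it is an upper bound.
double rdtsc_overhead_ticks(std::uint64_t iterations) {
    assert(iterations > 0);
    const std::uint64_t first = read_tsc();
    std::uint64_t last = first;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        last = read_tsc();
    }
    return static_cast<double>(last - first) / static_cast<double>(iterations);
}
```

Running `rdtsc_overhead_ticks(1000000)` and then the larger iteration counts mentioned above is one way to check whether the figure stays consistent. The thread is not pinned here, so a migration in the middle of the loop would distort a single run.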
SergeyKostrov
Valued Contributor II
468 Views

** A question to Roman Dementiev (Intel) ** Is there an Intel document that describes TSC related solutions / issues in a multi-core system? Best regards, Sergey
Bernard
Black Belt
468 Views

>>>It is good to know that my number looks right.>>> Good job :) >>>RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 )>>> These results could be polluted by driver ISR and DPC routines running in an arbitrary thread context ( even your thread's ), and some time-critical kernel-mode components could also postpone your processing loop. To minimize this dependency, run your tests ( not the RDTSC for-loop ) 1e4 or 1e5 times and average the results.
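A small sketch of the averaging step, assuming the per-run differences have already been collected into a vector. `summarize`, `Summary` and `ticks_to_ns` are hypothetical helpers. Reporting the median and minimum next to the mean is an addition of this sketch, since a single interrupted run can dominate the mean. The `tsc_hz` parameter is an assumption that has to come from elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

struct Summary {
    double mean;
    std::int64_t median;  // upper middle element for even sample counts
    std::int64_t min;
    std::int64_t max;
};

// Takes the samples by value because it sorts them.
Summary summarize(std::vector<std::int64_t> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return Summary{sum / static_cast<double>(samples.size()),
                   samples[samples.size() / 2],
                   samples.front(),
                   samples.back()};
}

// Converts ticks to nanoseconds for comparison with a threshold in ns.
double ticks_to_ns(double ticks, double tsc_hz) {
    assert(tsc_hz > 0.0);
    return ticks * 1e9 / tsc_hz;
}
```

For the made-up samples 10, 12, 11, 5000 and 9, the mean is 1008.4 while the median is 11, which shows how one polluted run skews a plain average.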
SergeyKostrov
Valued Contributor II
468 Views

Hi Iliya, >>>>RDTSC Difference: 7420 ( RDTSC2 - RDTSC1 ) >>>>... >>>>dwThreadAMPrev2 : 0 ( Processing Error if 0 ) // <= It was a simple verification that error processing works >> >>These results could be polluted by arbitrary context thread(even your thread)... '7420' is a wrong number anyway, because in that case I tried to switch processing to a CPU that doesn't exist ( CPU #8 ). I simply wanted to see that error processing works.
Bernard
Black Belt
468 Views

>>>'7420' is a wrong number anyway because in that case I tried to switch processing to a CPU that doesn't exist ( CPU #8 ). I simply wanted to see that error processing works.>>> I misunderstood your post :)
SergeyKostrov
Valued Contributor II
468 Views

I have not found anything in these Intel manuals that says an RDTSC value could be different for the CPUs of a multi-core system.

** Intel(R) 64 and IA-32 Architectures Software Developer's Manual **
>> Volume 3A: System Programming Guide, Part 1 <<
...Chapter 7. MULTIPLE-PROCESSOR MANAGEMENT

** Intel(R) 64 and IA-32 Architectures Software Developer's Manual **
>> Volume 3B: System Programming Guide, Part 2 <<
...Chapter 18. DEBUGGING AND PERFORMANCE MONITORING
...
...18.17 COUNTING CLOCKS
...
Time-stamp counter - Measures clock cycles in which the physical processor is not in deep sleep. These ticks cannot be measured on a logical-processor basis.
...
...18.17.3 Incrementing the Time-Stamp Counter
Bernard
Black Belt
468 Views

>>>Time-stamp counter - Measures clock cycles in which the physical processor is not in deep sleep. These ticks cannot be measured on a logical-processor basis.>>> So HT logical cores cannot be sampled individually with the RDTSC instruction. I think the difference here is between logical HT cores, which have their own general-purpose registers and APIC state, and fully fledged cores with their own FPU and SIMD vector units.
Roman_Oderov
Beginner
158 Views

But, if I understand correctly, the invariant TSC in newer processors ( 17.13.1 in Vol. 3 ) guarantees that the TSC values are synchronized. On older processors, however, I still can't rely on the TSCs of different cores without manual synchronization. Am I right?
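Whether a given processor reports the invariant TSC can be checked at run time: the feature is flagged by CPUID.80000007H:EDX[8]. A minimal sketch, assuming GCC or Clang ( `<cpuid.h>` and `__get_cpuid` ); `has_invariant_tsc`, `invariant_tsc_bit` and `self_check` are hypothetical names. Note that the flag describes a counter running at a constant rate across power states. It does not by itself confirm that the counters of different cores or sockets hold the same value, so a measured comparison may still be worthwhile.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // __get_cpuid (GCC/Clang)
#endif

// Bit 8 of EDX from CPUID leaf 80000007H flags the invariant TSC.
bool invariant_tsc_bit(std::uint32_t edx) {
    return ((edx >> 8) & 1u) != 0;
}

// Returns false when the leaf is unavailable or the target is not x86.
bool has_invariant_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) return false;
    return invariant_tsc_bit(edx);
#else
    return false;
#endif
}

// Tiny self-check of the bit helper, independent of the host CPU.
void self_check() {
    assert(invariant_tsc_bit(1u << 8));
    assert(!invariant_tsc_bit(0u));
}
```

On older processors `has_invariant_tsc()` would be expected to return false, in which case manual synchronization of the kind discussed earlier in the thread remains the fallback.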