Hello everyone!
I need to synchronize time between processors in a multicore system, i.e. I need to calculate the TSC differences of all processors relative to one of them.
I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to read the TSC of a specific processor? Or maybe I can specify a processor ID somewhere and read the corresponding time stamp counter value?
Thanks in advance,
Roman
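RDTSC always reads the counter of the core that executes it, so one possible approach is to pin the thread to the processor of interest before reading. The following is only a minimal sketch under that assumption: the helper name read_tsc_on_cpu is hypothetical, it uses SetThreadAffinityMask on Windows and sched_setaffinity on x86 Linux, and it reports failure on any other platform.

```cpp
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#include <intrin.h>
#elif defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
#include <sched.h>
#include <x86intrin.h>
#endif

// Hypothetical helper: temporarily pin the calling thread to logical CPU `cpu`,
// read the time stamp counter into `tsc`, then restore the previous affinity.
// Returns false if pinning failed or the platform is not supported.
bool read_tsc_on_cpu(unsigned cpu, std::uint64_t& tsc) {
#if defined(_WIN32)
    if (cpu >= sizeof(DWORD_PTR) * 8) return false;  // single processor group only
    const DWORD_PTR old_mask =
        SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << cpu);
    if (old_mask == 0) return false;
    tsc = __rdtsc();  // the thread should now be running on `cpu`
    SetThreadAffinityMask(GetCurrentThread(), old_mask);
    return true;
#elif defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
    cpu_set_t old_set, new_set;
    if (sched_getaffinity(0, sizeof(old_set), &old_set) != 0) return false;
    CPU_ZERO(&new_set);
    CPU_SET(cpu, &new_set);
    if (sched_setaffinity(0, sizeof(new_set), &new_set) != 0) return false;
    tsc = __rdtsc();  // the thread has been migrated to `cpu`
    sched_setaffinity(0, sizeof(old_set), &old_set);
    return true;
#else
    (void)cpu;
    (void)tsc;
    return false;
#endif
}
```

Calling this once per processor and subtracting the results gives only a rough offset, because the reads happen one after another rather than at the same instant; the time spent migrating the thread is included in the difference.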
76 Replies
>>...In addition I can submit for consideration a .log-file, where 50 consecutive program starts are logged...
These are very interesting results; let me analyze them. Thanks again, Roman!
>>...where 50 consecutive program starts are logged...
Roman, why do you need to launch so many applications? Could you explain, please?
>>... it's interesting to know if the values change from time to time and how they differ from each other...
What is next? How are you going to use it in a real-life application?
PS: Thanks for the modified sources; I'll take a look at them sometime next week.
I don't know yet... It's necessary to find out how the function SetEvent ( Event_handle ) works at the low level, i.e. how much time is spent setting an event and how long it takes the waiting threads to receive it. That could help with more precise timing.
>>...It's necessary to find out how the function SetEvent ( Event_handle ) works at the low level, i.e. how much time is spent setting
>>an event and how long it takes the waiting threads to receive it...
You could easily do that: just add a call to RDTSC before the call to the 'SetEvent' function and save the value. Later you can calculate the difference for every thread, since you already have the TSC values in the g_iRDTSCValueThreadStart array. Don't forget that there is a call to 'WaitForSingleObject' as well:
...
::WaitForSingleObject( g_hEvent, INFINITE );
g_iRDTSCValueThreadStart[(*piThreadID)-1] = __rdtsc();
...
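To make the idea concrete, here is a minimal portable sketch of the same measurement pattern. It is not the test program discussed in this thread: std::condition_variable with a flag stands in for the manual-reset event (SetEvent / WaitForSingleObject), std::chrono::steady_clock stands in for __rdtsc(), and the function name measure_wake_latency_ns is hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Returns, for each waiting thread, the nanoseconds between the timestamp taken
// just before signaling and the timestamp taken right after that thread woke up.
std::vector<std::int64_t> measure_wake_latency_ns(unsigned waiters) {
    using Clock = std::chrono::steady_clock;
    std::mutex m;
    std::condition_variable event_cv, ready_cv;
    bool signaled = false;  // plays the role of the event state
    unsigned ready = 0;     // number of threads that reached the wait
    std::vector<Clock::time_point> woke(waiters);
    std::vector<std::thread> threads;

    for (unsigned i = 0; i < waiters; ++i) {
        threads.emplace_back([&, i] {
            std::unique_lock<std::mutex> lock(m);
            if (++ready == waiters) ready_cv.notify_one();
            event_cv.wait(lock, [&] { return signaled; });  // ~ WaitForSingleObject
            lock.unlock();
            woke[i] = Clock::now();  // ~ __rdtsc() right after the wait returns
        });
    }

    Clock::time_point before;
    {
        std::unique_lock<std::mutex> lock(m);
        ready_cv.wait(lock, [&] { return ready == waiters; });  // everyone is waiting
        before = Clock::now();  // ~ __rdtsc() just before SetEvent
        signaled = true;
    }
    event_cv.notify_all();  // ~ SetEvent on a manual-reset event

    for (auto& t : threads) t.join();

    std::vector<std::int64_t> latency_ns;
    for (const auto& w : woke) {
        latency_ns.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(w - before).count());
    }
    return latency_ns;
}
```

With RDTSC instead of a system-wide monotonic clock, each per-thread difference would also contain whatever TSC offset exists between the signaling core and the waking core, so the two effects cannot be separated by this measurement alone.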
But we have biased tick values in g_iRDTSCValueThreadStart. So it's important to account for the tick differences between processors, which we can't rely on yet (we still need to verify them).
Oh, I have another question... Why do we have such different Difference values in the log from one iteration to another? As far as I know, processors can't become desynchronized in such a short time... Maybe I'm wrong?
>>...Why do we have such different Difference values in the log from one iteration to another?...
I think this is because the Windows scheduler decides exactly when a thread starts and when it runs afterwards, and those moments cannot be controlled. I saw your numbers and they vary from 38 nanoseconds ( the best value ) to hundreds of nanoseconds. So multithreaded processing is absolutely unpredictable even if different threads are executed on different CPUs. It is clear to me that you need to do some kind of calibration at the beginning of processing, based on the differences of TSC values.
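One common way to do such a calibration (not what the test program in this thread does) is a round-trip exchange in the style of NTP: the reference CPU records its TSC, the other CPU records its own TSC when it sees the request, and the reference CPU records its TSC again when it sees the reply. The sketch below only shows the arithmetic; the struct, field and function names are hypothetical, and it assumes the remote read happens halfway through the shortest round trip.

```cpp
#include <cstdint>
#include <vector>

// One calibration exchange between a reference CPU and a remote CPU.
struct RoundTrip {
    std::uint64_t t_send;    // reference TSC when the request was posted
    std::uint64_t t_remote;  // remote TSC when the request was seen
    std::uint64_t t_recv;    // reference TSC when the reply was seen
};

// Estimates (remote TSC - reference TSC) from the sample with the shortest
// round trip, since that sample was disturbed least by scheduling noise.
// Returns 0 if there is no usable sample.
std::int64_t estimate_offset(const std::vector<RoundTrip>& samples) {
    const RoundTrip* best = nullptr;
    for (const RoundTrip& s : samples) {
        if (s.t_recv < s.t_send) continue;  // skip malformed samples
        if (!best || (s.t_recv - s.t_send) < (best->t_recv - best->t_send)) {
            best = &s;
        }
    }
    if (!best) return 0;
    const std::uint64_t midpoint = best->t_send + (best->t_recv - best->t_send) / 2;
    return static_cast<std::int64_t>(best->t_remote) -
           static_cast<std::int64_t>(midpoint);
}
```

The error of the estimate is bounded by half of the shortest round trip, so repeating the exchange many times and keeping the fastest one is what makes the result usable.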
>>>So multithreaded processing is absolutely unpredictable even if different threads are executed on different CPUs>>>
To avoid thread scheduling overhead, it is possible to run code at DPC level on one CPU and put the other CPU in a busy-wait loop.
Hi Iliya,
>>In order to prevent thread scheduling overhead it is possible to run code at DPC level on one CPU and put other CPU in some
>>busy wait loop.
Could you provide as many technical details as possible on how to do it ( that is, prevent thread scheduling overhead ) at DPC ( Deferred Procedure Calls, right? ) level on some CPU?
It would be nice to see an example in C/C++ code. However, I understand that DPCs are used in Kernel / Driver programming for Windows platforms. Is that correct?
Thanks in advance.
>>>Could you provide as many technical details as possible on how to do it ( that is, prevent thread scheduling overhead ) at DPC ( Deferred Procedure Calls, right? ) level on some CPU?>>>
I will try to write a simple device driver which will serve as an example (I do not have much experience with drivers, so bear with me :)). Thread scheduling won't be eliminated, because driver code runs in an arbitrary thread context, so DPC-level code such as the scheduler and dispatcher will continue to do its work, but passive-level scheduling will be preempted.
My idea is to run the driver at passive level, raise the IRQL to DPC level, do some work on CPU0 while CPU1 is held in a busy-wait loop, and then do the same with the roles reversed. In such a scenario passive-level code (user threads) will be preempted until the IRQL drops back to passive level.
>>>It would be nice to see an example in C/C++ codes. >>>
I think it can be done in the following way.
- Raise the IRQL on the current CPU.
- Create a DPC and raise the IRQL on the other CPU.
- Do some work on the first CPU and measure the time needed to complete it.
- At the same time the second CPU spins in a busy-wait loop (which could be implemented as a nop instruction).
- When the first CPU completes, put it into a busy-wait loop and let the second CPU start executing some code.
The following kernel-mode functions will be used:
KeRaiseIrql, KeLowerIrql, KeGetCurrentIrql, KeGetCurrentProcessorNumber, KeNumberProcessors, KeSetTargetProcessorDpc.
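The hand-off described in the steps above can be illustrated in user mode. This is only an analogue, not driver code: it does not raise the IRQL, ordinary threads can still be preempted, and it uses none of the kernel functions listed. It merely shows the alternation, where one worker is timed while the other spins and then the roles swap. The names Timings and run_alternating are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

struct Timings {
    std::int64_t first_ns;   // time the first worker spent in work()
    std::int64_t second_ns;  // time the second worker spent in work()
};

// Runs work() once on the calling thread while a second thread busy-waits,
// then once on the second thread while the calling thread busy-waits.
template <typename Work>
Timings run_alternating(Work work) {
    using Clock = std::chrono::steady_clock;
    std::atomic<int> turn{0};  // 0: first works, 1: second works, 2: both done
    Timings result{0, 0};

    auto timed = [&work](std::int64_t& out_ns) {
        const auto start = Clock::now();
        work();
        out_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                     Clock::now() - start).count();
    };

    std::thread second([&] {
        while (turn.load(std::memory_order_acquire) != 1) { /* busy-wait */ }
        timed(result.second_ns);
        turn.store(2, std::memory_order_release);
    });

    timed(result.first_ns);                     // first works while second spins
    turn.store(1, std::memory_order_release);   // hand over to the second worker
    while (turn.load(std::memory_order_acquire) != 2) { /* busy-wait */ }
    second.join();
    return result;
}
```

In a real driver the two workers would be code running at raised IRQL on two different processors, which is what keeps passive-level threads from interrupting the measurement.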
>>...Also, in about 2-3 weeks I'll be able to execute these tests on a new computer with a 3rd generation Intel CPU...
I'm already in transition to a new 64-bit system with an i7-3840QM CPU, and I hope to complete all setup and installation by the end of November.
This is an example of what I was able to get on an Ivy Bridge system:
...
Test-Case 4 - Retrieving RDTSC values for CPUs - 2
Threads 1 and 2 created
RDTSC values ( in CPU clocks ):
Iteration  Thread 1       Thread 2       Difference
00         2167050799808  2167050799729  79
01         2167050799883  2167050799813  70
02         2167050799986  2167050799916  70
03         2167050800079  2167050800000  79
04         2167050800182  2167050800093  89
05         2167050800275  2167050800196  79
06         2167050800378  2167050800298  80
07         2167050800471  2167050800392  79
08         2167050800555  2167050800485  70
09         2167050800658  2167050800569  89
10         2167050800742  2167050800662  80
11         2167050800835  2167050800756  79
12         2167050800928  2167050800849  79
13         2167050801012  2167050800933  79
14         2167050801096  2167050801017  79
15         2167050801162  2167050801110  52
Statistics ( in clock cycles ):
Thread 1 started at 2167050799342
Thread 2 started at 2167050799309
Difference 33
Thread 1 completed at 2167050801330
Thread 2 completed at 2167050801260
Difference 70
dwThreadAMPrev[0]: 255 ( Processing Error if 0 )
dwThreadAMPrev[1]: 255 ( Processing Error if 0 )
...
So, in that test case both threads were closely synchronized: as you can see, the per-iteration differences in RDTSC values never exceeded 90 clock cycles (the largest was 89), and the best value was 52 clock cycles.
Note: Tested on a system with Intel Core i7-3840QM ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 )
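For anyone who wants to summarize a log like the one above, a small helper along the following lines could be used. It is a sketch only: the names Summary, summarize and cycles_to_ns are hypothetical, and the TSC frequency passed to cycles_to_ns is something the caller has to supply, since it is not stated in the log.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct Summary {
    std::int64_t min;
    std::int64_t max;
    double mean;
    double median;
};

// Summarizes per-iteration TSC differences (in clock cycles).
// Takes the vector by value because it sorts its own copy.
Summary summarize(std::vector<std::int64_t> diffs) {
    if (diffs.empty()) return Summary{0, 0, 0.0, 0.0};
    std::sort(diffs.begin(), diffs.end());
    const std::size_t n = diffs.size();
    const double sum = std::accumulate(diffs.begin(), diffs.end(), 0.0);
    const double median = (n % 2 == 1)
        ? static_cast<double>(diffs[n / 2])
        : (diffs[n / 2 - 1] + diffs[n / 2]) / 2.0;
    return Summary{diffs.front(), diffs.back(), sum / n, median};
}

// Converts clock cycles to nanoseconds for a caller-supplied TSC frequency in Hz.
double cycles_to_ns(double cycles, double tsc_hz) {
    return cycles * 1e9 / tsc_hz;
}
```

Applied to the 16 differences in the table, this gives a minimum of 52, a maximum of 89, a mean of 77 and a median of 79 clock cycles. If the TSC ran at 2.8 GHz (an assumed figure, not one reported in this thread), the mean would correspond to about 27.5 ns.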