Roman_Oderov · ‎10-29-2012

Hello everyone! I have to synchronize time between processors in a multicore system, i.e. I have to calculate the TSC differences of all processors relative to one of them. I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to get the TSC from a specific processor? Or maybe I can specify a processor ID somewhere and use the corresponding time stamp counter value. Thanks in advance, Roman

SergeyKostrov · ‎11-13-2012

>>...In addition I can submit for consideration a .log-file, where 50 consecutive program starts are logged...
These are very interesting results; let me analyze them. Thanks again, Roman!

SergeyKostrov · ‎11-14-2012

>>...where 50 consecutive program starts are logged...
Roman, why do you need to launch so many applications? Could you explain, please?

Roman_Oderov · 11-14-2012

"50 prog starts": I meant that it was a test program with a loop of 50 iterations. I think it's interesting to know whether the values change from time to time and how they differ from each other. The code:

SergeyKostrov · ‎11-14-2012

>>... it's interesting to know if the values change from time to time and how they differ from each other...
What is next? How are you going to use it in a real-life application? PS: Thanks for the modified sources; I'll take a look at them some time next week.

Roman_Oderov · ‎11-14-2012

I don't know yet... It's necessary to find out how the SetEvent( Event_handle ) function works at the low level, i.e. how much time is spent setting an event and how much passes before the waiting threads receive it. That could help make the timing more precise.

SergeyKostrov · ‎11-15-2012

>>...It's necessary to find out how function SetEvent ( Event_handle ) works at the low level, i.e. how much time is spent for setting an event and for receiving the event by waiting threads...
You could easily do that: add a call to RDTSC just before the call to the 'SetEvent' function and save the value. Later you can calculate the difference for every thread, since you have the TSC values in the g_iRDTSCValueThreadStart array. Don't forget that there is a call to 'WaitForSingleObject' as well:
    ...
    ::WaitForSingleObject( g_hEvent, INFINITE );
    g_iRDTSCValueThreadStart[(*piThreadID)-1] = __rdtsc();
    ...
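The bookkeeping suggested here can be sketched as follows. This is a minimal illustration, not the test program from this thread: `WakeLatencies` and its parameter names are hypothetical, and the TSC values are assumed to have been captured already (one `__rdtsc()` right before `SetEvent`, and one per thread right after `WaitForSingleObject` returns). The result still contains any TSC offset between the CPUs involved, so it is not a pure wake-up latency.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: given the TSC value saved just before SetEvent()
// and the per-thread values saved right after WaitForSingleObject()
// returned, compute how many clock cycles passed before each thread ran.
// A negative result means the waiting thread's CPU reported a smaller TSC
// than the signalling CPU, i.e. the counters are offset from each other.
std::vector<std::int64_t> WakeLatencies(
    std::uint64_t tscBeforeSetEvent,
    const std::vector<std::uint64_t>& tscThreadStart)
{
    std::vector<std::int64_t> latencies;
    latencies.reserve(tscThreadStart.size());
    for (std::uint64_t start : tscThreadStart) {
        // Unsigned subtraction wraps around; the cast recovers a signed difference.
        latencies.push_back(static_cast<std::int64_t>(start - tscBeforeSetEvent));
    }
    return latencies;
}
```

In the thread functions, the second argument would simply be the contents of g_iRDTSCValueThreadStart once all threads have stored their values.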

Roman_Oderov · ‎11-15-2012

But the tick values in g_iRDTSCValueThreadStart are biased. So it's important to take into account the tick differences between processors, which we can't rely on yet (we first need to verify them). Oh, I have another question... Why are the Difference values in the log so different from one iteration to another? As far as I know, processors can't drift out of sync in such a short time... Maybe I'm wrong?

SergeyKostrov · ‎11-15-2012

>>...Why do we have such different Difference values in the log from one iteration to another?...
I think this is because the Windows scheduler decides exactly when a thread is started and later keeps running it at times that cannot be controlled. I saw your numbers, and they vary from 38 nanoseconds ( the best value ) to hundreds of nanoseconds. So multithreaded processing is absolutely unpredictable, even if different threads are executed on different CPUs. It is clear to me that you need to do some kind of calibration at the beginning of processing, based on the differences of TSC values.
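One simple form such a calibration could take is sketched below. It is only an illustration under stated assumptions: `EstimateTscOffset` and `ToReferenceTimeline` are hypothetical names, the input is a list of per-iteration differences (reference thread minus other thread) like the Difference column in the logs, and the median is used as the offset estimate only because it is less sensitive to an occasional scheduling outlier than the mean. The measured differences mix any real TSC offset with wake-up jitter, so the result is an estimate at best.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical calibration step: estimate a fixed offset between two CPUs'
// TSC readings as the median of the per-iteration differences
// (reference CPU minus other CPU).
std::int64_t EstimateTscOffset(std::vector<std::int64_t> diffs)
{
    assert(!diffs.empty());
    std::sort(diffs.begin(), diffs.end());
    const std::size_t mid = diffs.size() / 2;
    if (diffs.size() % 2 == 1) {
        return diffs[mid];
    }
    // Even number of samples: average the two middle values.
    return (diffs[mid - 1] + diffs[mid]) / 2;
}

// Map a TSC value read on the other CPU onto the reference CPU's timeline,
// assuming offset = reference - other.
std::uint64_t ToReferenceTimeline(std::uint64_t otherTsc, std::int64_t offset)
{
    // Adding the offset as unsigned handles negative offsets via wrap-around.
    return otherTsc + static_cast<std::uint64_t>(offset);
}
```

The calibration loop would run once at start-up, and every later timestamp from the second thread would be passed through ToReferenceTimeline before being compared with the first thread's values.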

Bernard · ‎11-16-2012

>>>So, that multithreading processing is absolutely unpredictable even if different threads are executed on different CPUs>>>
To avoid thread scheduling overhead, it is possible to run code at DPC level on one CPU and put the other CPU into a busy-wait loop.

SergeyKostrov · ‎11-16-2012

Hi Iliya,
>>In order to prevent thread scheduling overhead it is possible to run code at DPC level on one CPU and put other CPU in some busy wait loop.
Could you provide as many technical details as possible on how to do that ( that is, prevent thread scheduling overhead ) at DPC ( Deferred Procedure Call, right? ) level on some CPU? It would be nice to see an example in C/C++ code. However, I understand that DPCs are used in kernel / driver programming for Windows platforms. Is that correct? Thanks in advance.

Bernard · ‎11-16-2012

>>>Could you provide as more as possible technical details on how to do it ( that is, prevent thread scheduling overhead ) at DPC ( Deferred Procedure Calls, right? ) on some CPU?>>>
I will try to write a simple device driver to serve as an example (I don't have much experience with drivers, so bear with me :)). Thread scheduling won't be eliminated, because driver code runs in an arbitrary thread context, so DPC-level code such as the scheduler and dispatcher will continue to work, but passive-level scheduling will be preempted. My idea is to run the driver at passive level, raise the IRQL to DPC level, and do some work on CPU0 while CPU1 is put into a busy-wait loop, and then vice versa. In such a scenario passive-level code (user threads) will be preempted until the IRQL drops back to passive level.

Bernard · ‎11-19-2012

>>>It would be nice to see an example in C/C++ codes.>>>
I think it can be done in the following way:
- Raise the IRQL on the current CPU.
- Create a DPC and raise the IRQL on the other CPU.
- Do some work on the first CPU and measure the time needed to complete it.
- At the same time, the second CPU spins in a busy-wait loop (which could be implemented as a nop instruction).
- When the first CPU completes, it is put into a busy-wait loop and the second CPU starts executing some code.
The following kernel-mode functions will be used: KeRaiseIrql, KeLowerIrql, KeGetCurrentIrql, KeGetCurrentProcessorNumber, KeNumberProcessors, KeSetTargetProcessorDpc.
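The handoff in these steps (one side works and is timed while the other spins, then they swap) can be tried out in user mode first. The sketch below is only an analogue, not the driver described above: it uses two ordinary threads and standard C++, `RunHandoff` is a hypothetical name, and nothing in it raises the IRQL or stops the OS scheduler from preempting either thread, so the measured times still include scheduling noise.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// User-mode analogue only: two threads take turns. While one does timed
// work, the other spins in a busy-wait loop. Returns the elapsed time of
// each thread's work phase in nanoseconds.
std::array<std::int64_t, 2> RunHandoff(unsigned workIterations)
{
    std::atomic<int> turn{0};                        // id of the thread allowed to work
    std::array<std::int64_t, 2> elapsedNs{{-1, -1}};

    auto worker = [&](int id) {
        // Busy-wait until it is this thread's turn.
        while (turn.load(std::memory_order_acquire) != id) {
        }
        const auto t0 = std::chrono::steady_clock::now();
        volatile std::uint64_t sink = 0;             // keeps the loop from being optimized away
        for (unsigned i = 0; i < workIterations; ++i) {
            sink = sink + i;
        }
        const auto t1 = std::chrono::steady_clock::now();
        elapsedNs[id] =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        // Hand over to the next thread, then spin until both have finished.
        turn.store(id + 1, std::memory_order_release);
        while (turn.load(std::memory_order_acquire) < 2) {
        }
    };

    std::thread first(worker, 0);
    std::thread second(worker, 1);
    first.join();
    second.join();
    return elapsedNs;
}
```

To get closer to the scenario above, each thread could additionally be pinned to its own CPU before the handoff starts; that part is platform-specific and is left out here.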

SergeyKostrov · ‎11-21-2012

>>...Also, in about 2-3 weeks I'll be able to execute these tests on a new computer with a 3rd generation Intel CPU...
I'm already moving to a new 64-bit system with an i7-3840QM CPU, and I hope to complete all setup and installation by the end of November.

SergeyKostrov · ‎05-21-2013

This is an example of what I was able to get on an Ivy Bridge system:

...
Test-Case 4 - Retrieving RDTSC values for CPUs - 2 Threads 1 and 2 created
RDTSC values ( in CPU clocks ):
Iteration   Thread 1        Thread 2        Difference
00          2167050799808   2167050799729   79
01          2167050799883   2167050799813   70
02          2167050799986   2167050799916   70
03          2167050800079   2167050800000   79
04          2167050800182   2167050800093   89
05          2167050800275   2167050800196   79
06          2167050800378   2167050800298   80
07          2167050800471   2167050800392   79
08          2167050800555   2167050800485   70
09          2167050800658   2167050800569   89
10          2167050800742   2167050800662   80
11          2167050800835   2167050800756   79
12          2167050800928   2167050800849   79
13          2167050801012   2167050800933   79
14          2167050801096   2167050801017   79
15          2167050801162   2167050801110   52
Statistics ( in clock cycles ):
Thread 1 started at   2167050799342
Thread 2 started at   2167050799309
Difference            33
Thread 1 completed at 2167050801330
Thread 2 completed at 2167050801260
Difference            70
dwThreadAMPrev[0]: 255 ( Processing Error if 0 )
dwThreadAMPrev[1]: 255 ( Processing Error if 0 )
...

So, in that test case both threads were closely synchronized: as you can see, the differences in RDTSC values never exceeded 90 clock cycles, with the best value being 52 clock cycles.
Note: Tested on a system with Intel Core i7-3840QM ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 )
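A log like this can be summarized with a few lines of code. The sketch below is an illustration only: `DiffSummary` and `Summarize` are hypothetical names, and the input is assumed to be the Difference column copied from the log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical summary of a Difference column, in clock cycles.
struct DiffSummary {
    std::int64_t min;
    std::int64_t max;
    double mean;
};

DiffSummary Summarize(const std::vector<std::int64_t>& diffs)
{
    assert(!diffs.empty());
    // minmax_element returns iterators to the smallest and largest values.
    const auto mm = std::minmax_element(diffs.begin(), diffs.end());
    const std::int64_t sum =
        std::accumulate(diffs.begin(), diffs.end(), std::int64_t{0});
    return { *mm.first, *mm.second,
             static_cast<double>(sum) / static_cast<double>(diffs.size()) };
}
```

For the 16 differences in the table above, this gives a minimum of 52, a maximum of 89, and a mean of 77 clock cycles.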

Synchronizing Time Stamp Counter