						Hello everyone!
I have to synchronize time between processors in a multicore system, i.e. I have to calculate the TSC differences of all processors relative to one of them.
I tried rdtsc(), but it returns the TSC of the current processor. Is there any way to read the TSC of a specific processor? Or maybe I can specify a processor ID somewhere and get the corresponding time stamp counter value?
Thanks in advance,
Roman
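One common way to approach the question above is to pin the calling thread to the processor of interest and then execute RDTSC there, since the instruction always reads the counter of the core it runs on. The sketch below is only an illustration under stated assumptions: read_tsc_on_cpu is a hypothetical helper name, the target is an x86 CPU, and on Windows the machine is assumed to have at most 64 logical processors (no processor groups).

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#include <intrin.h>
#else
#include <sched.h>
#include <x86intrin.h>
#endif

// Hypothetical helper (not from the thread): pin the calling thread to logical
// processor `cpu`, read the TSC there, then restore the previous affinity.
// Returns false if `cpu` is out of range or the OS refuses the pinning.
bool read_tsc_on_cpu(unsigned cpu, std::uint64_t* tsc_out) {
    assert(tsc_out != nullptr);
#if defined(_WIN32)
    if (cpu >= sizeof(DWORD_PTR) * 8) return false;
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    const DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), mask);
    if (previous == 0) return false;
    *tsc_out = __rdtsc();
    SetThreadAffinityMask(GetCurrentThread(), previous);
    return true;
#else
    if (cpu >= static_cast<unsigned>(CPU_SETSIZE)) return false;
    cpu_set_t previous;
    if (sched_getaffinity(0, sizeof(previous), &previous) != 0) return false;
    cpu_set_t target;
    CPU_ZERO(&target);
    CPU_SET(cpu, &target);
    // The thread is migrated to `cpu` before this call returns.
    if (sched_setaffinity(0, sizeof(target), &target) != 0) return false;
    *tsc_out = __rdtsc();
    sched_setaffinity(0, sizeof(previous), &previous);
    return true;
#endif
}
```

Note that two such reads taken one after another on different processors also include the time spent migrating the thread, so the raw difference is an upper bound on the skew rather than the skew itself.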
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
		76 Replies
	
		
		
			
			
			
					
	
>>...In addition I can submit for consideration a .log-file, where 50 consecutive program starts are logged...
These are very interesting results; let me analyze them. Thanks again, Roman!
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
>>...where 50 consecutive program starts are logged...
Roman, why do you need to launch so many applications? Could you explain, please?
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						>>... it's interesting to know if the values change from time to time and how they differ from each other...
What's next? How are you going to use it in a real-life application?
PS: Thanks for the modified sources; I'll take a look at them sometime next week.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
I don't know yet... First I need to find out how the SetEvent ( Event_handle ) function works at the low level, i.e. how much time is spent setting an event and how long it takes the waiting threads to receive it. That could help with more precise timing.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
>>...It's necessary to find out how function SetEvent ( Event_handle ) works at the low level, i.e. how much time is spent for setting
>>an event and for receiving the event by waiting threads...
You could easily do that: just add a call to RDTSC before the call to the 'SetEvent' function and save the value. Later you can calculate the difference for every thread, since you already have the TSC values in the g_iRDTSCValueThreadStart array. Don't forget that there is a call to 'WaitForSingleObject' as well:
...
	::WaitForSingleObject( g_hEvent, INFINITE );
	g_iRDTSCValueThreadStart[(*piThreadID)-1] = __rdtsc();
...
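To make the measurement idea concrete, here is a minimal, portable sketch of the same pattern: take a TSC reading right before signalling and another one right after the waiter wakes up. It is not the test program from this thread. As an assumption for portability, std::condition_variable stands in for SetEvent / WaitForSingleObject, and measure_signal_to_wake_ticks is a hypothetical name.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Hypothetical helper: returns the number of TSC ticks between the moment the
// signalling thread is about to signal and the moment the waiter has woken up.
// The result mixes wake-up latency with any TSC skew between the two cores.
std::int64_t measure_signal_to_wake_ticks() {
    std::mutex m;
    std::condition_variable cv;
    bool waiting = false;    // set by the waiter once it is ready to block
    bool signaled = false;   // plays the role of the event state
    std::uint64_t tsc_signal = 0;
    std::uint64_t tsc_wake = 0;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        waiting = true;
        cv.notify_all();                          // tell the signaller we are ready
        cv.wait(lock, [&] { return signaled; });  // analogous to WaitForSingleObject
        tsc_wake = __rdtsc();                     // first thing after waking up
    });

    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return waiting; });
        tsc_signal = __rdtsc();                   // timestamp just before signalling
        signaled = true;
    }
    cv.notify_all();                              // analogous to SetEvent
    waiter.join();
    return static_cast<std::int64_t>(tsc_wake - tsc_signal);
}
```

Calling it many times and looking at the spread of the results is more informative than any single value, because each run depends on where the scheduler happens to place the two threads.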
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
But the tick values in g_iRDTSCValueThreadStart are biased, so we have to take into account the tick differences between processors, and we can't rely on those yet because we haven't verified them.
Oh, I have another question... Why do the Difference values in the log vary so much from one iteration to another? As far as I know, processors can't desynchronize in such a short time... Maybe I'm wrong?
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						>>...Why do we have such different Difference values in the log from one iteration to another?...
I think this is because the Windows thread scheduler decides exactly when a thread is started and then resumes it at times that cannot be controlled. I saw your numbers and they vary from 38 nanoseconds ( the best value ) to hundreds of nanoseconds. So the timing of multithreaded processing is absolutely unpredictable, even if different threads are executed on different CPUs. It is clear to me that you need to do some kind of calibration at the beginning of processing, based on the differences of TSC values.
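A calibration step of that kind could look roughly like the sketch below. It assumes the measured differences consist of a constant skew plus occasional scheduling outliers, and it uses the median as a robust estimate. The median is my choice for illustration, not something proposed in this thread, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: estimate the TSC offset of another processor relative
// to the reference one as the median of the sampled differences
// (tsc_other - tsc_reference). For an even count the upper median is used.
// Takes the vector by value because nth_element reorders it.
std::int64_t estimate_tsc_offset(std::vector<std::int64_t> diffs) {
    if (diffs.empty()) return 0;
    auto mid = diffs.begin() + static_cast<std::ptrdiff_t>(diffs.size() / 2);
    std::nth_element(diffs.begin(), mid, diffs.end());
    return *mid;
}

// Hypothetical helper: translate a reading taken on the other processor into
// the reference processor's timebase. Unsigned wrap-around makes this work
// for negative offsets as well.
std::uint64_t to_reference_timebase(std::uint64_t tsc_other, std::int64_t offset) {
    return tsc_other - static_cast<std::uint64_t>(offset);
}
```

For example, with sampled differences 70, 79, 500, 80 and 79 the outlier 500 is ignored and the estimated offset is 79.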
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						>>> So, that multithreading processing is absolutely unpredictable even if different threads are executed on different CPUs>>>
To avoid thread scheduling overhead, it is possible to run the code at DPC level on one CPU and put the other CPU in a busy-wait loop.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						Hi Iliya,
>>In order to prevent thread scheduling overhead it is possible to run code at DPC level on one CPU and put other CPU in some
>>busy wait loop.
Could you provide as many technical details as possible on how to do it ( that is, prevent thread scheduling overhead ) at DPC level ( Deferred Procedure Calls, right? ) on some CPU?
It would be nice to see an example in C/C++ code. However, I understand that DPCs are used in kernel / driver programming for Windows platforms. Is that correct?
Thanks in advance.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						>>>Could you provide as more as possible technical details on how to do it ( that is, prevent thread scheduling overhead ) at DPC ( Deferred Procedure Calls, right? ) on some CPU?>>>
I will try to write a simple device driver that will serve as an example (I don't have much experience with drivers, so bear with me :)). Thread scheduling won't be eliminated, because driver code runs in an arbitrary thread context, so DPC-level code such as the scheduler and dispatcher will continue to do its work, but passive-level code will be preempted.
My idea is to start the driver code at passive level, raise the IRQL to DPC level and do some work on CPU0 while CPU1 is put into a busy-wait loop, and then do the same the other way around. In such a scenario passive-level code (user threads) will stay preempted until the IRQL drops back to passive level.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
>>>It would be nice to see an example in C/C++ code.>>>
I think it can be done in the following way:
- Raise the IRQL on the current CPU.
- Create a DPC and raise the IRQL on the other CPU.
- Do some work on the first CPU and measure the time needed to complete it.
- At the same time the second CPU spins in a busy-wait loop (which could be implemented as a nop instruction).
- When the first CPU completes, it is put into a busy-wait loop and the second CPU starts executing some code.
The following kernel-mode routines will be used:
KeRaiseIrql, KeLowerIrql, KeGetCurrentIrql, KeGetCurrentProcessorNumber, KeNumberProcessors, KeSetTargetProcessorDpc.
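The alternating scheme in the steps above can be illustrated with a user-mode analogy. To be clear, this is not driver code and does not raise the IRQL: two ordinary threads simply take turns, one doing "work" while the other waits in a loop. The name run_alternating is hypothetical, and the loop yields instead of spinning hard so that the sketch also finishes on a single-core machine, whereas real DPC-level code would spin.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical user-mode analogy of the scheme: worker 0 and worker 1 take
// turns. While one holds the turn and does its "work", the other one waits
// in a loop until the turn is handed over.
// Returns the order in which the workers ran.
std::vector<int> run_alternating(int rounds) {
    std::atomic<int> turn{0};   // id of the worker allowed to run now
    std::vector<int> order;     // touched only by the current turn holder

    auto worker = [&](int id) {
        for (int r = 0; r < rounds; ++r) {
            // Busy-wait until it is this worker's turn.
            while (turn.load(std::memory_order_acquire) != id) {
                std::this_thread::yield();
            }
            order.push_back(id);                            // the "work" being timed
            turn.store(1 - id, std::memory_order_release);  // hand over the turn
        }
    };

    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return order;
}
```

With rounds set to 3 the recorded order is 0, 1, 0, 1, 0, 1. In a real measurement the push_back would be replaced by the code under test, bracketed by two TSC reads.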
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						>>...Also, in about 2-3 weeks I'll be able to execute these tests on a new computer with a 3rd generation Intel CPU...
I'm already moving to a new 64-bit system with an i7-3840QM CPU and hope to complete all setup and installation by the end of November.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
This is an example of what I was able to get on an Ivy Bridge system:
...
Test-Case 4 - Retrieving RDTSC values for CPUs - 2
Threads 1 and 2 created
RDTSC values ( in CPU clocks ):
Iteration       Thread 1        Thread 2        Difference
        00      2167050799808   2167050799729   79
        01      2167050799883   2167050799813   70
        02      2167050799986   2167050799916   70
        03      2167050800079   2167050800000   79
        04      2167050800182   2167050800093   89
        05      2167050800275   2167050800196   79
        06      2167050800378   2167050800298   80
        07      2167050800471   2167050800392   79
        08      2167050800555   2167050800485   70
        09      2167050800658   2167050800569   89
        10      2167050800742   2167050800662   80
        11      2167050800835   2167050800756   79
        12      2167050800928   2167050800849   79
        13      2167050801012   2167050800933   79
        14      2167050801096   2167050801017   79
        15      2167050801162   2167050801110   52
Statistics ( in clock cycles ):
        Thread 1 started at   2167050799342
        Thread 2 started at   2167050799309
        Difference            33
        Thread 1 completed at 2167050801330
        Thread 2 completed at 2167050801260
        Difference            70
        dwThreadAMPrev[0]: 255 ( Processing Error if 0 )
        dwThreadAMPrev[1]: 255 ( Processing Error if 0 )
...
So in that test case both threads were closely synchronized: as you can see, the differences in RDTSC values never exceeded 90 clock cycles, and the best value was 52 clock cycles.
Note: Tested on a system with Intel Core i7-3840QM ( Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 )
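For anyone who wants to check the summary, the short sketch below recomputes the minimum, maximum and mean from the Difference column of the log above. The helper names are hypothetical. The cycles-to-nanoseconds conversion additionally assumes that the TSC ticks at the nominal 2.8 GHz base clock of the i7-3840QM, which is my assumption and was not measured in this thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Values copied from the "Difference" column of the log (iterations 00-15).
const std::vector<std::int64_t> kLoggedDiffs = {
    79, 70, 70, 79, 89, 79, 80, 79, 70, 89, 80, 79, 79, 79, 79, 52};

struct DiffStats {
    std::int64_t min;
    std::int64_t max;
    double mean;
};

// Hypothetical helper: min / max / mean of a list of TSC differences.
DiffStats summarize(const std::vector<std::int64_t>& diffs) {
    if (diffs.empty()) return DiffStats{0, 0, 0.0};
    const auto mm = std::minmax_element(diffs.begin(), diffs.end());
    const std::int64_t sum =
        std::accumulate(diffs.begin(), diffs.end(), std::int64_t{0});
    return DiffStats{*mm.first, *mm.second,
                     static_cast<double>(sum) / static_cast<double>(diffs.size())};
}

// Convert clock cycles to nanoseconds for a TSC running at `tsc_ghz` GHz.
double cycles_to_ns(double cycles, double tsc_ghz) {
    return cycles / tsc_ghz;
}
```

summarize(kLoggedDiffs) gives a minimum of 52, a maximum of 89 and a mean of 77 cycles. Under the 2.8 GHz assumption, cycles_to_ns(77, 2.8) puts the mean at about 27.5 ns.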
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
 
					
				
				
			
		
					