I have written a simple example which differentiates between the apparent precision of SYSTEM_CLOCK, given by Count_Rate, and the actual precision available from the different values returned.
I have run this example on ifort Ver 11.1, which I have installed.
It shows that both CPU_TIME and SYSTEM_CLOCK update only 64 times per second (one tick every 15.625 ms), which is very poor precision for the standard Fortran intrinsic routines.
Better precision is available (see QueryPerformanceCounter) and should be provided in these intrinsic routines.
John
Sorry for creating confusion.
I think that calling the sleep() function with an argument of 0 can simulate this behaviour, where a thread is stopped before its quantum expires. An interesting question follows: at what exact moment within the quantum is execution postponed, and how can that be controlled programmatically? Suppose a new thread is created from the main thread, has its priority raised to high, and is scheduled to run immediately after creation. How, or rather when, should sleep(0) be called inside the thread's function in order to stop execution after, for example, half of the quantum has expired?
I have again reviewed the information I have available on the accuracy of different timing routines for CPU or elapsed time. I have attached an updated set of Fortran calls for the 6 timing routines I have identified. I would recommend these as my best use of the identified API routines. Any recommendations for improvement would be appreciated.
The timing test program has been improved to test each routine for about 5 seconds.
RDTSC requires an initialising routine to estimate the returned tick frequency, which is the processor rate for my test machines.
I have identified 2 that are good for elapsed time: RDTSC and QueryPerformanceCounter.
All other routines update their time value only 64 times per second.
It would be good if there was a more accurate CPU time routine, but I have not found one. I should see what OpenMP uses!
Again, I would recommend that SYSTEM_CLOCK should be fixed in ifort so that we can reliably use the Fortran intrinsic routine.
The following table summarises the performance of the 6 routines I have identified.
[plain]
Routine                 Ticks per  CPU cycles  Notes
                           second    per call 
RDTSC                    88514093          30  ticks at processor rate, accuracy limited by call rate 
QueryPerformanceCounter   2594669          47  possibly more robust than RDTSC 
GetTickCount                   64          14  fast, but poor precision 
system_clock                   64         325   
GetProcessTimes                64         386  poor precision but best identified for CPU 
CPU_Time                       64         387 
[/plain]
Ticks per second: the number of unique time values returned in a second (the best precision that can be achieved).
CPU cycles per call: the number of processor cycles per routine call (the call overhead).
John
( I am hoping the plain text preserves the courier font layout of the table)
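For readers who want to reproduce the "ticks per second" figure, the measurement defined above can be sketched in standard Fortran roughly as follows. This is not the attached test program: the names clock_probe and probe_system_clock are made up for illustration, and passing 64-bit integers to SYSTEM_CLOCK is an assumption (some compilers report a different COUNT_RATE depending on the integer kind used).

```fortran
module clock_probe
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains

  ! Spin for about `duration` seconds (as measured by SYSTEM_CLOCK itself),
  ! counting how many calls were made and how many times the returned
  ! COUNT actually changed. Wrap-around at COUNT_MAX is ignored here.
  subroutine probe_system_clock(duration, count_rate, n_calls, n_changes)
    real, intent(in)            :: duration    ! seconds to sample for
    integer(int64), intent(out) :: count_rate  ! apparent ticks per second
    integer(int64), intent(out) :: n_calls     ! calls made while sampling
    integer(int64), intent(out) :: n_changes   ! distinct value changes seen
    integer(int64) :: start, last, now, span

    n_calls = 0
    n_changes = 0
    call system_clock(start, count_rate)
    if (count_rate <= 0) return                ! no clock available

    span = nint(real(duration, real64) * real(count_rate, real64), int64)
    last = start
    do
      call system_clock(now)
      n_calls = n_calls + 1
      if (now /= last) then
        n_changes = n_changes + 1
        last = now
      end if
      if (now - start >= span) exit
    end do
  end subroutine probe_system_clock

end module clock_probe
```

After a call such as `call probe_system_clock(5.0, rate, calls, changes)`, dividing `changes` by 5.0 gives the effective ticks per second, to be compared with `rate`. Dividing 5.0 by `calls` gives the seconds per call, which multiplied by the processor clock rate gives an estimate of the cycles per call.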
Thank you John, great job.
@Sergey, returning to your question: I have a simple multithreaded program using Win32 threads which calls the Sleep() function to suspend its currently running thread, so such an action can simulate what I wrote in one of my previous posts. So far I have been unable to relinquish the CPU at a chosen point during the quantum interval, and I do not know whether it is possible.
Sergey,
For elapsed time, RDTSC is the best for me as it takes 30 processor cycles and gives high precision (88 million ticks per second, which is the call rate). While GetTickCount is faster to run (only 14 processor cycles), it has very poor precision (64 ticks per second), so it is not useful for reporting short elapsed time tests.
I have not tested the accuracy of these timers, over a short or long duration. For the types of testing I do, this is not as significant, as there are many external distractions to the meaning of run times, such as interruptions from other processes. My aim has been to get an indication of relative elapsed times for different programming approaches.
That's elapsed time; however, when it comes to CPU time, the best has precision to only 1/64 second. I cannot find anything with better precision.
When it comes to timing processes, and OpenMP coding, the elapsed time is what matters, while the CPU time to elapsed time ratio gives an indication of how many threads are effectively running simultaneously.
Unfortunately I have not achieved very good ratios for the OpenMP programs I have been developing. While I can get multiple threads to run, I am getting clashes in other areas. I'm being told cache clashes are my latest problem, so an effective OpenMP solution, using ifort Ver 2011, is a way off.
John
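The CPU time to elapsed time ratio mentioned above could be computed with a small helper along these lines. This is only a sketch using the standard intrinsics: the names run_timer, stamp, take_stamp and report are hypothetical, and it assumes CPU_TIME reports process-wide CPU time, so both readings carry the coarse tick precision discussed in this thread.

```fortran
module run_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  type :: stamp
    real(real64)   :: cpu    ! seconds, from CPU_TIME
    integer(int64) :: ticks  ! COUNT, from SYSTEM_CLOCK
  end type stamp

contains

  ! Record CPU time and the wall-clock tick count together.
  function take_stamp() result(s)
    type(stamp) :: s
    call cpu_time(s%cpu)
    call system_clock(s%ticks)
  end function take_stamp

  ! CPU seconds, elapsed seconds and their ratio between two stamps.
  ! A ratio near N suggests about N threads were busy on average.
  subroutine report(s0, s1, cpu_s, elapsed_s, ratio)
    type(stamp), intent(in)   :: s0, s1
    real(real64), intent(out) :: cpu_s, elapsed_s, ratio
    integer(int64) :: rate

    call system_clock(count_rate=rate)
    cpu_s = s1%cpu - s0%cpu
    elapsed_s = 0.0_real64
    if (rate > 0) elapsed_s = real(s1%ticks - s0%ticks, real64) / real(rate, real64)
    ratio = 0.0_real64
    if (elapsed_s > 0.0_real64) ratio = cpu_s / elapsed_s
  end subroutine report

end module run_timer
```

Take one stamp before the parallel region and one after it, then pass both to `report`. With only 64 updates per second the ratio is only meaningful for runs lasting a second or more.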
>>>While GetTickCount is faster to run ( only 14 processor cycles)>>>
Do you mean the total time needed to execute this call, from the user-mode stub through the switch to kernel mode?
There are a number of attributes of the timing routines I have investigated, including:
- How fast it runs: The number of processor cycles a call to the timing routine takes.
- How precise it is: How frequently the returned time measure is updated. This indicates how useful this timing can be for short duration events.
- How accurate it is: The accuracy of the reported time over a longer period. I have not concentrated on this aspect of performance.
My interest in how many processor cycles the call takes has not extended to what happens inside the timing routine when it is called. Your discussion with Sergey about the kernel scheduler etc., which I understand is what takes place inside the timer routine, does not have a significant effect on the way I use these routines.
Over the last 20 years, processor rates have improved by over 1,000 times, from 1 MHz to 3 GHz. Unfortunately the precision of some timers has not matched this improvement, to the extent that they now perform poorly for what program developers require of them.
The purpose of my post has been to:
- Highlight the poor performance of the standard Fortran intrinsics available in ifort,
- Identify there are better alternatives for SYSTEM_CLOCK, which I hope could be adopted into ifort, and
- Point out that I have not been able to locate a better routine for CPU_TIME.
I was hoping that someone in this Forum might know a suitable routine and be able to provide a simple Fortran code example for ifort on how to use it. I remain hopeful someone might be able to help.
John
Hi John,
I am not questioning your findings; I only asked as a matter of interest.
Yes, I agree with you that a Fortran developer should not be concerned with the internal implementation of a timing routine. It is not their task. As for the precision of system timers, I think the low precision could in some cases be directly related to the multimedia requirements of a modern OS and to system management (thread scheduling).
Sergey Kostrov wrote:
Iliya,
I just looked at my sources and I found the following comment:
...
// Overhead of Sleep( 0 ): Debug~=1562 clocks / Release~=1525 clocks
...
So, it is clear that the CPU will do something during that period of time. Wouldn't it be better to discuss all that C/C++ stuff in another thread in a different forum?
Yes, that is true. I think that at the time of the call to the sleep function, the calling thread could be put immediately into the standby state, or it could run for some minuscule period until the scheduling decision is made. As far as I have been able to understand, on a multiprocessor system the scheduler database is locked while the next runnable thread is being found. So during the long processing time of sleep() the database is locked and no other CPU can make a scheduling decision.
If you are interested I can create a new thread for this discussion, but which IDZ forum should I choose for it?
I might be covering old ground here - but you mention your use of OMP. On ifort the implementation of OMP_GET_WTIME uses QueryPerformanceCounter.
There are differences in the requirements between SYSTEM_CLOCK and OMP_GET_WTIME in terms of their standard definitions. OMP_GET_WTIME is more relaxed in some ways (it is a thread-specific wall time), so that might be part of the reason for the different implementation. (I see mention of system bugs on the QueryPerformanceCounter MSDN page that would be problematic for SYSTEM_CLOCK.)
Further, Intel's docs ascribe a particular meaning to the zero SYSTEM_CLOCK time. I suspect if they were to change their implementation from using GetLocalTime to QueryPerformanceCounter they might have to lose that meaning. Not sure. If that was the case, that could annoy some users relying on the previously documented behaviour.
Again, this might have already been covered (or be obvious from your table) but CPU_TIME is implemented by calling GetProcessTimes and summing the user and kernel time. Given its definition I don't see how CPU_TIME could be implemented differently; then given the way the Windows scheduler works and the possibility for the program to have multiple threads on multiple processors, I think it is unrealistic to expect GetProcessTimes to have better precision than it does.
(The reason that GetTickCount is pretty snappy cycle wise is that the tick count is available in user space - no kernel mode transition there.)
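Given the point above that OMP_GET_WTIME sits on top of QueryPerformanceCounter in ifort, one way to get at the finer elapsed-time clock from Fortran might look like the sketch below. The names wall_clock and wall_seconds are hypothetical. The lines starting with `!$` are only compiled when OpenMP is enabled, and the fallback to SYSTEM_CLOCK in other builds is an assumption made so that the code compiles either way.

```fortran
module wall_clock
  !$ use omp_lib, only: omp_get_wtime
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains

  ! Wall-clock seconds from an arbitrary origin.
  ! Built with OpenMP enabled: the value comes from omp_get_wtime().
  ! Built without it: the value comes from SYSTEM_CLOCK instead.
  function wall_seconds() result(t)
    real(real64)   :: t
    integer(int64) :: count, rate

    call system_clock(count, rate)
    t = 0.0_real64
    if (rate > 0) t = real(count, real64) / real(rate, real64)
    !$ t = omp_get_wtime()
  end function wall_seconds

end module wall_clock
```

Only differences between two `wall_seconds()` values taken in the same build, and on the same thread, are meaningful. In an OpenMP build, omp_get_wtick() returns the resolution of the underlying timer, which can be checked against the table earlier in the thread.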
>>>(The reason that GetTickCount is pretty snappy cycle wise is that the tick count is available in user space - no kernel mode transition there.)>>>
Yes, that's true. I have found a possible implementation of GetTickCount: the function reads the SharedUserData structure mapped into the caller's process address space, hence the very fast execution time. I was simply confused by the existence of KeGetTickCount, which is used by drivers.
Thanks for the valuable information.
Sergey Kostrov wrote:
>>...I can create new thread for this discussion,but which IDZ forum to choose for it?..
Since this is not related to Intel software, it would be nice to create it in:
Watercooler Catchall
software.intel.com/en-us/forums/watercooler-catchall
The threading forum is not necessarily restricted to Intel software if it concerns Intel platforms.
>>>Threading forum is not necessarily restricted to Intel software if it concerns Intel platforms>>>
Tim, do you mean the Threading Building Blocks forum?