I have written a simple example which differentiates between the apparent precision of SYSTEM_CLOCK, given by COUNT_RATE, and the actual precision, shown by the distinct values it returns.
I have run this example with ifort version 11.1, which is the version I have installed.
It shows that both CPU_TIME and SYSTEM_CLOCK advance only 64 times per second, which is very poor precision for the Fortran standard intrinsic timing routines.
Better precision is available (see QueryPerformanceCounter) and should be provided by these intrinsic routines.
John
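The original test program isn't reproduced here. As a minimal sketch of the same measurement, assuming only standard Fortran 2008 (the module and routine names are hypothetical), the module below reads SYSTEM_CLOCK repeatedly and compares the COUNT_RATE it reports with the smallest jump actually seen between distinct COUNT values:

```fortran
module clock_probe
   ! Hypothetical helper: compares the COUNT_RATE that SYSTEM_CLOCK reports
   ! with how often its COUNT value actually changes.
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: probe_system_clock
contains
   subroutine probe_system_clock(nchanges, nominal_rate, observed_rate)
      integer, intent(in) :: nchanges             ! clock changes to observe
      real(real64), intent(out) :: nominal_rate   ! COUNT_RATE, ticks per second
      real(real64), intent(out) :: observed_rate  ! real updates per second
      integer(int64) :: rate, previous, current, smallest
      integer :: seen

      call system_clock(count=previous, count_rate=rate)
      nominal_rate = real(rate, real64)
      observed_rate = 0.0_real64
      if (rate <= 0) return                       ! no clock available

      smallest = huge(smallest)
      seen = 0
      do while (seen < nchanges)
         call system_clock(count=current)
         if (current /= previous) then
            ! Skip the negative jump caused by a wrap past COUNT_MAX.
            if (current > previous) then
               smallest = min(smallest, current - previous)
               seen = seen + 1
            end if
            previous = current
         end if
      end do
      ! The smallest jump between distinct readings is the real step size.
      observed_rate = nominal_rate / real(smallest, real64)
   end subroutine probe_system_clock
end module clock_probe
```

Calling `probe_system_clock(10, nominal, observed)` and printing both values gives the comparison described above; on the ifort 11.1 setup reported here one would expect `observed` near 64 while `nominal` is much larger. On a genuinely fine-grained clock the smallest jump is limited by the cost of the call itself, so `observed` then reflects call overhead rather than the clock.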
I have not understood the explanations of why the Windows SYSTEM_CLOCK could not approach the resolution of QueryPerformanceCounter or omp_get_wtime. This situation forces us to write applications that switch timers between Linux and Windows.
I was not trying to say that CPU_TIME and SYSTEM_CLOCK are the same, but that both of these Fortran standard routines use timing sources that are updated only every 1/64th of a second.
For a timing routine on modern processors, this is very poor precision. Comparing 64 Hz with 3 GHz does not look right to me.
I'm not sure which system routines underlie the Fortran standard routines, but a more precise solution should be provided.
QueryPerformanceCounter could be a better source for SYSTEM_CLOCK.
GetProcessTimes is the best source I have found for CPU_TIME, although I do not understand why it is limited to 1/64 s accuracy, or whether this 64 Hz tick rate can be changed.
The RDTSC instruction is also a possibility.
Providing a more reliable timer through the Fortran standard routines would be the preferred solution.
As for compatibility between Linux and Windows, I'm sure there are other differences between the two implementations.
I understand that a lot of the problem relates to what is available from the Microsoft API, but getting a fast and accurate timer should be simpler than it is.
John
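The same check applies to CPU_TIME. A minimal sketch, assuming standard Fortran and a hypothetical module name, spins until the reported processor time has changed a few times and keeps the smallest change; with a 1/64 s source that would come out at about 0.0156 s:

```fortran
module cpu_probe
   ! Hypothetical helper: finds the smallest nonzero step by which CPU_TIME
   ! advances, i.e. its effective resolution on this processor.
   use, intrinsic :: iso_fortran_env, only: real64
   implicit none
   private
   public :: cpu_time_step
contains
   function cpu_time_step(nchanges) result(step)
      integer, intent(in) :: nchanges   ! number of changes to observe
      real(real64) :: step              ! seconds; -1 if CPU_TIME is unavailable
      real(real64) :: previous, current
      integer :: seen

      call cpu_time(previous)
      if (previous < 0.0_real64) then   ! negative means no meaningful CPU clock
         step = -1.0_real64
         return
      end if
      step = huge(step)
      seen = 0
      ! Spin, burning CPU time, until the reported value has changed NCHANGES times.
      do while (seen < nchanges)
         call cpu_time(current)
         if (current > previous) then
            step = min(step, current - previous)
            previous = current
            seen = seen + 1
         end if
      end do
   end function cpu_time_step
end module cpu_probe
```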
Steve,
I was hoping that someone might be able to provide some more information on the available timers.
For elapsed time, there are better timers than the one apparently used, such as QueryPerformanceCounter. I don't know of any drawbacks and would recommend its use for SYSTEM_CLOCK.
However, for CPU time, GetProcessTimes is the best I know of. MSDN is a bit vague about whether its update rate can be changed from 64 Hz.
Does anyone have experience of overcoming this limitation?
What are you trying to do?
Ian,
"Best" relates to:
- precision,
- call-time overhead, and
- side effects.

From past experience, precision is the most significant of these. QueryPerformanceCounter effectively provides a precision of about 10^7 ticks per second, limited mainly by the call-time overhead.
I would consider other timers, with a precision of only 64 Hz, poor.
My reading of MSDN is that GetProcessTimes might have the side effect of slowing things down if the clock rate were changed. Unfortunately, I don't know how to change the rate, or whether it can be done. I may have misread the MSDN documentation.
I have been asking if anyone has any knowledge of this, and was hoping that the Visual Fortran developers might.
I just find it surprising that the best CPU time precision we can get is updated at 64 Hz. It must be accumulated somewhere more frequently than this.
I'm not sure where Steve finds 10 ms (100 Hz). If you don't mean this as roughly 64 Hz, please let me know where you see this figure.
Again, if anyone knows how to get CPU time to a higher precision, I'd like to know.
John
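Call-time overhead can be estimated separately from precision. The sketch below (hypothetical names, standard Fortran only) times a large number of back-to-back SYSTEM_CLOCK calls and divides by the count. It says nothing about QueryPerformanceCounter itself, but the same loop could wrap any timer routine:

```fortran
module clock_overhead
   ! Hypothetical helper: estimates the cost of one SYSTEM_CLOCK call by
   ! timing NCALLS back-to-back calls and dividing by NCALLS.
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: system_clock_overhead
contains
   function system_clock_overhead(ncalls) result(seconds_per_call)
      integer, intent(in) :: ncalls
      real(real64) :: seconds_per_call   ! -1 if there is no clock
      integer(int64) :: rate, t_start, t_end, scratch
      integer :: i

      call system_clock(count=t_start, count_rate=rate)
      if (rate <= 0) then
         seconds_per_call = -1.0_real64
         return
      end if
      do i = 1, ncalls
         call system_clock(count=scratch)
      end do
      call system_clock(count=t_end)
      ! With a coarse clock the loop must span many ticks for this to mean
      ! anything; a result of zero only says "less than one tick in total".
      seconds_per_call = real(t_end - t_start, real64) / real(rate, real64) / real(ncalls, real64)
   end function system_clock_overhead
end module clock_overhead
```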
Ian,
Thanks for your question. I use it mostly for timing programs, where precision is especially important.
I have a shifted subspace eigensolver, obtained from SAP80, in which I time two stages of the solution: the matrix reduction and the load case iterations. I use the timing to estimate the relative duration of each stage, and from these times I estimate the convergence time with or without a shifted reduction. With such crude timer precision, my convergence strategy does not work well with ifort. It does with other compilers.
While ifort's SYSTEM_CLOCK might report a high precision through COUNT_RATE, the actual tick rate is much lower.
I thought that managing the difference between elapsed time (SYSTEM_CLOCK) and processor time (CPU_TIME) was going to be interesting on a multiprocessor PC with ifort, but I haven't managed to get there yet.
John
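A rough bound shows why a 64 Hz tick undermines this strategy. If a clock only advances in steps of $\Delta t$, a duration $T$ measured as the difference of two readings can be off by almost one full step, so the relative error is bounded by $\Delta t / T$. For a hypothetical stage lasting $T = 0.1$ s, comparing the 64 Hz tick with the 2.6 MHz QueryPerformanceCounter rate reported in this thread:

$$
\left| T_{\text{measured}} - T \right| < \Delta t
\quad\Longrightarrow\quad
\frac{\left| T_{\text{measured}} - T \right|}{T} < \frac{\Delta t}{T}
$$

$$
\frac{1/64\ \text{s}}{0.1\ \text{s}} \approx 0.156,
\qquad
\frac{1/(2.6\times10^{6})\ \text{s}}{0.1\ \text{s}} \approx 3.8\times10^{-6}
$$

A stage shorter than $\Delta t$ can even be measured as zero, which makes the ratio of two stage times meaningless.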
I would appreciate your assistance.
Hopefully the improvement could be demonstrated by including the routine below in the test program elapse.f95 that I included above.
```fortran
SUBROUTINE ELAPSE_SECOND (ELAPSE)
!
!  Returns the total elapsed time in seconds
!  based on QueryPerformanceCounter
!  This is the fastest and most accurate timing routine
!
   real*8, intent (out) :: elapse
!
!  Silverfrost FTN95 syntax for declaring the two Windows API functions
   STDCALL QUERYPERFORMANCECOUNTER 'QueryPerformanceCounter' (REF):LOGICAL*4
   STDCALL QUERYPERFORMANCEFREQUENCY 'QueryPerformanceFrequency' (REF):LOGICAL*4
!
!  Variables initialised in their declarations are implicitly SAVEd,
!  so FIRST, FREQ and START keep their values between calls.
   real*8    :: freq  = 1
   logical*4 :: first = .true.
   integer*8 :: start = 0
   integer*8 :: num
   logical*4 :: ll
!  integer*4 :: lute
!
!  Calibrate this time using QueryPerformanceFrequency (ticks per second)
   if (first) then
      num   = 0
      ll    = QueryPerformanceFrequency (num)
      freq  = 1.0d0 / dble (num)
      start = 0
      ll    = QueryPerformanceCounter (start)
      first = .false.
!     call get_echo_unit (lute)
!     WRITE (lute,*) 'Elapsed time counter :',num,' ticks per second'
   end if
!
!  Seconds since the first call
   num = 0
   ll  = QueryPerformanceCounter (num)
   elapse = dble (num-start) * freq
   return
end
```
Repeat Offender,
Thanks for your changes to the code. I could not find any reference to the type T_LARGE_INTEGER in the ifort help.
I have attached the updated elapse2.f95 program, which demonstrates the precision of QueryPerformanceCounter (2.6 MHz) compared with the intrinsic SYSTEM_CLOCK (60 Hz).
I think it ticks the boxes for both precision and call-time overhead. I don't know of any side effects of taking this option.
I would recommend this as a better timing solution.
use ifwin, only: QueryPerformanceCounter, QueryPerformanceFrequency
use ifwinty, only: T_LARGE_INTEGER
works
Les
I was surprised to see that your machine/OS is one of those that doesn't use RDTSC as the basis for QueryPerformanceCounter. You can use RDTSC directly from ifort if you wish:
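```fortran
module setup
   use ifwin
   implicit none
   private
   public initialize, rdtsc, cp_rdtsc
   ! Machine code packed little-endian into default integers.
   ! 32-bit: rdtsc; ret; nop   (64-bit result returned in EDX:EAX)
   integer, parameter :: code32(1) = [-1866256113]
   ! 64-bit: rdtsc; shl rdx,32; or rax,rdx; nop; ret; nop   (result in RAX)
   integer, parameter :: code64(3) = [-1052233457,155721954,-1869560880]
   interface
      function rdtsc()
         integer(8) rdtsc
      end function rdtsc
   end interface
   ! Cray pointer: calls to rdtsc jump to the address stored in cp_rdtsc
   pointer (cp_rdtsc,rdtsc)
contains
   subroutine initialize
      integer code(*)
      pointer (ap,code)
      ! Pointer size tells a 32-bit build from a 64-bit one
      integer, parameter :: nbits = bit_size(ap)
      ! Get a small block of executable memory and copy the instructions into it
      ap = VirtualAlloc(NULL,12_HANDLE,MEM_COMMIT,PAGE_EXECUTE_READWRITE)
      if(nbits == 32) then
         code(1:size(code32)) = code32
      else
         code(1:size(code64)) = code64
      end if
      cp_rdtsc = ap
   end subroutine initialize
end module setup

program main
   use setup
   implicit none
   integer i
   call initialize
   do i = 1, 10
      write(*,*) rdtsc()
   end do
end program main
```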
Thanks for the information on T_LARGE_INTEGER.
Also, thanks for the ifort example of using RDTSC. The problem I have always had with RDTSC is that I don't have ready access to its clock rate. For another compiler, I have written a wrapper that, the first time it is used on a PC, times RDTSC for 10 seconds and stores the calculated clock rate in the file c:\prosser_speed.ini. On subsequent runs I read the rate from the file. For most (all?) recent processors, this has been the rated speed of the processor, although I do not know of a direct way to get RDTSC_RATE.
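The caching part of such a wrapper can be sketched in standard Fortran. This is not the wrapper described above: the module name is hypothetical, and the `calibrate` argument stands in for the 10-second RDTSC measurement, which is not shown. The rate is read from the file when it exists and is readable, and measured and stored otherwise:

```fortran
module rate_cache
   ! Hypothetical helper: returns a calibrated rate (e.g. counter ticks per
   ! second), measuring it only the first time and caching it in a file.
   use, intrinsic :: iso_fortran_env, only: real64
   implicit none
   private
   public :: cached_rate, calibrator

   abstract interface
      function calibrator() result(rate)
         import :: real64
         real(real64) :: rate
      end function calibrator
   end interface

contains

   function cached_rate(path, calibrate) result(rate)
      character(*), intent(in) :: path        ! cache file, e.g. an .ini file
      procedure(calibrator) :: calibrate      ! slow measurement, run once
      real(real64) :: rate
      integer :: unit, ios
      logical :: exists

      ! Reuse a previously stored rate if the file exists and is readable.
      inquire(file=path, exist=exists)
      if (exists) then
         open(newunit=unit, file=path, status='old', action='read', iostat=ios)
         if (ios == 0) then
            read(unit, *, iostat=ios) rate
            close(unit)
            if (ios == 0 .and. rate > 0.0_real64) return
         end if
      end if

      ! Otherwise measure it now and try to store it for the next run.
      rate = calibrate()
      open(newunit=unit, file=path, status='replace', action='write', iostat=ios)
      if (ios == 0) then
         write(unit, '(es24.16)') rate
         close(unit)
      end if
   end function cached_rate
end module rate_cache
```

Deleting the file forces a fresh calibration, which would be needed after moving the program to a different processor.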
While the routines you have discussed are elapsed-time counters, are you aware of any more precise way of retrieving the accumulated CPU time of a process at better than 64 Hz?
As I indicated before, I have been using the elapsed time to select alternative solution approaches while the program is running. I have been trying to understand how this approach might be applied to parallel applications. I wanted to monitor both elapsed and CPU time and see if I could make sense of a strategy based on both. At 64 Hz, the CPU time precision is not there for the test examples I am using.
John
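Monitoring both times for the same piece of work only needs the two intrinsics. A minimal sketch, with hypothetical names, reading SYSTEM_CLOCK and CPU_TIME on either side of a caller-supplied routine:

```fortran
module region_timer
   ! Hypothetical helper: runs a piece of work and reports both the elapsed
   ! (wall-clock) time from SYSTEM_CLOCK and the processor time from CPU_TIME.
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: time_region, work_routine

   abstract interface
      subroutine work_routine()
      end subroutine work_routine
   end interface

contains

   subroutine time_region(work, elapsed, cpu)
      procedure(work_routine) :: work
      real(real64), intent(out) :: elapsed   ! seconds of wall-clock time
      real(real64), intent(out) :: cpu       ! seconds of processor time
      integer(int64) :: rate, c0, c1
      real(real64) :: cpu0, cpu1

      call system_clock(count=c0, count_rate=rate)
      call cpu_time(cpu0)
      call work()
      call cpu_time(cpu1)
      call system_clock(count=c1)

      if (rate > 0) then
         elapsed = real(c1 - c0, real64) / real(rate, real64)
      else
         elapsed = -1.0_real64               ! no clock available
      end if
      cpu = cpu1 - cpu0
   end subroutine time_region
end module region_timer
```

For serial work the two numbers should be similar. For a region run by several busy threads, `cpu/elapsed` would approach the number of threads if CPU_TIME sums over all threads of the process, but the Fortran standard leaves that processor dependent, so it needs checking with ifort before a strategy relies on it. Either way, at a 64 Hz update rate both values are only good to about 0.016 s.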
Your elapsed-time approach has the issue that, on a desktop system, it will be influenced by things such as the user moving the mouse or other background operating system activity.
When you described the problem you were trying to solve, my initial thought was that there must be an easier way of measuring the computational effort required by a particular step than timing it, such as an iteration counter.
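One common way to reduce this kind of noise, not proposed in the thread itself, is to repeat a measurement and keep the minimum, on the reasoning that interference from other activity mostly adds time. A minimal sketch with hypothetical names:

```fortran
module repeat_timer
   ! Hypothetical helper: times WORK several times and keeps the shortest
   ! elapsed time, i.e. the run least disturbed by other activity.
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: best_of, work_routine

   abstract interface
      subroutine work_routine()
      end subroutine work_routine
   end interface

contains

   function best_of(work, nrepeat) result(best)
      procedure(work_routine) :: work
      integer, intent(in) :: nrepeat       ! number of runs, at least 1
      real(real64) :: best                 ! seconds; -1 if there is no clock
      integer(int64) :: rate, c0, c1
      integer :: k

      call system_clock(count_rate=rate)
      if (rate <= 0) then
         best = -1.0_real64
         return
      end if
      best = huge(best)
      do k = 1, nrepeat
         call system_clock(count=c0)
         call work()
         call system_clock(count=c1)
         best = min(best, real(c1 - c0, real64) / real(rate, real64))
      end do
   end function best_of
end module repeat_timer
```

This only helps for work that can be repeated; it does not help with one-off stage timings used to steer a running solution, and it still cannot resolve anything shorter than one clock tick.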
I have only a minimal amount of experience timing multithreaded code and what I did in that case was RDTSC before starting the threads and again after all threads were done. Timing the progress of individual threads seems like it could get noisy as other processes and threads kick each other out of cache as they move from core to core.
The system elapsed-time clock does keep the definition of the measure simple. Ian, as you have noted, there can always be problems with other processes running. Virus checkers have long been a problem, as has svchost.exe.
I too have a minimal amount of experience in writing multithreaded or parallel code. I've been reading about it for many years, and ifort is my first chance to see how it works in practice. I have been trying to understand how effective it is and what the side effects are. For a long time my Fortran code has taken a sequential approach. The vector instruction set has been much easier to use and to understand.
We'll keep trying to learn! Thanks for your assistance.
John
ifwin.mod is a large file which I presume defines the calling interface for many API routines.
Is there documentation of the Fortran calling protocols to use with these routines? I am interested in the routines:
- GetTickCount
- GetProcessTimes
- GetCurrentProcess

I would like to know whether they are used as subroutines or functions, and the type and kind of each argument.
My apologies if this is a trivial question, but I could not find this information in "C:\Program Files (x86)\Intel\Compiler\11.1\054\Documentation\en_US\compiler_f\main_for.chm" (which I notice is the documentation for the older version of ifort that I am using; I should install the VS update!).
John
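Whatever interface is used, MSDN documents that GetProcessTimes returns its kernel and user times as FILETIME structures: a 64-bit count of 100-nanosecond units split into two 32-bit halves. The 100 ns figure is only the unit; it does not mean the value advances that often. In ifort these are assumed to arrive as a T_FILETIME with dwLowDateTime and dwHighDateTime components (check kernel32.f90 and ifwinty for the exact declarations). A standalone sketch of the conversion to seconds, with hypothetical names:

```fortran
module filetime_convert
   ! Hypothetical helper: converts a FILETIME-style value (two 32-bit halves
   ! counting 100-nanosecond units) into seconds.
   use, intrinsic :: iso_fortran_env, only: int32, int64, real64
   implicit none
   private
   public :: filetime_seconds

   integer(int64), parameter :: two32 = 4294967296_int64

contains

   pure function filetime_seconds(low, high) result(seconds)
      integer(int32), intent(in) :: low, high   ! low and high DWORD halves
      real(real64) :: seconds
      integer(int64) :: lo, hi

      ! The halves are unsigned in Windows but held in signed integers here,
      ! so map negative values back into 0 .. 2**32-1.
      lo = modulo(int(low, int64), two32)
      hi = modulo(int(high, int64), two32)
      ! Combine in double precision so a large high half cannot overflow
      ! a signed 64-bit integer.
      seconds = (real(hi, real64) * real(two32, real64) + real(lo, real64)) * 1.0e-7_real64
   end function filetime_seconds
end module filetime_convert
```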
To find out about GetTickCount, for example, I would google "GetTickCount msdn"; the first hit gives useful documentation, including the C prototype and the fact that you link to it via Kernel32.lib. If you then open %INCLUDE%\kernel32.f90 in a text editor, you can search for GetTickCount and see how ifort writes its interface. The interface is nonstandard because ifort doesn't provide a STDCALL companion processor for F2003 interoperability.
Sort of a hacker's method I suppose, but it works well for me.
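MSDN documents GetTickCount as returning the milliseconds since system start as a DWORD, so the value wraps to zero after about 49.7 days, and notes that its resolution is limited to the system timer (typically 10 to 16 ms, in line with the 64 Hz figure discussed above). Assuming the result is held in a signed 32-bit integer, a wrap-safe interval between two readings can be sketched as follows (hypothetical names):

```fortran
module tick_count_diff
   ! Hypothetical helper: milliseconds between two 32-bit tick readings,
   ! correct across one wrap provided fewer than 2**32 ms (about 49.7 days)
   ! separate them.
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   private
   public :: tick_interval_ms

   integer(int64), parameter :: two32 = 4294967296_int64

contains

   pure function tick_interval_ms(t0, t1) result(ms)
      integer(int32), intent(in) :: t0, t1   ! earlier and later readings
      integer(int64) :: ms
      ! Unsigned (modulo 2**32) subtraction of the two DWORD values
      ms = modulo(int(t1, int64) - int(t0, int64), two32)
   end function tick_interval_ms
end module tick_count_diff
```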
If you are interested in the exact machine-code implementation of the GetTickCount function, I would advise using the IDA Pro disassembler. I suppose that this function might indirectly (when acting as a wrapper) access the real-time clock (RTC).
