precision of CPU_Time and System_Clock

John_Campbell · ‎07-04-2012

There have been a number of comments about the precision of the standard timing routines available in ifort.
I have written a simple example which differentiates between the apparent precision of system_clock, given by Count_Rate, and the actual precision available from the different values returned.
I have run this example on ifort Ver 11.1, which I have installed.
It shows that both CPU_TIME and SYSTEM_CLOCK have only 64 ticks per second, which is very poor precision available via the Fortran standard intrinsic routines.
Better precision is available (see QueryPerformanceCounter) and should be provided in these intrinsic routines.
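A minimal sketch of this kind of test, using only standard Fortran (this is not the attached example; the module and procedure names are hypothetical). It spins until the returned value changes and keeps the smallest step seen, so the observed tick can be compared with the nominal 1/Count_Rate.

```fortran
module tick_probe
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: clock_tick_seconds, cpu_tick_seconds

contains

  ! Smallest step observed between successive SYSTEM_CLOCK values (seconds),
  ! together with the nominal step 1/count_rate. Expects samples >= 1.
  subroutine clock_tick_seconds(samples, observed, nominal)
    integer, intent(in) :: samples
    real(real64), intent(out) :: observed, nominal
    integer(int64) :: rate, t0, t1, smallest
    integer :: i

    call system_clock(count_rate=rate)
    if (rate <= 0) then              ! no clock available on this processor
      observed = -1.0_real64
      nominal = -1.0_real64
      return
    end if
    nominal = 1.0_real64 / real(rate, real64)
    smallest = huge(smallest)
    call system_clock(t0)
    do i = 1, samples
      do                             ! spin until the count changes
        call system_clock(t1)
        if (t1 /= t0) exit
      end do
      if (t1 > t0) smallest = min(smallest, t1 - t0)   ! skip a wrap-around
      t0 = t1
    end do
    observed = real(smallest, real64) * nominal
  end subroutine clock_tick_seconds

  ! Smallest step observed between successive CPU_TIME values (seconds).
  ! Returns -1 if CPU_TIME reports that no CPU clock is available.
  function cpu_tick_seconds(samples) result(step)
    integer, intent(in) :: samples
    real(real64) :: step, t0, t1
    integer :: i

    call cpu_time(t0)
    if (t0 < 0.0_real64) then
      step = -1.0_real64
      return
    end if
    step = huge(step)
    do i = 1, samples
      do                             ! spinning burns CPU, so the value must advance
        call cpu_time(t1)
        if (t1 > t0) exit
      end do
      step = min(step, t1 - t0)
      t0 = t1
    end do
  end function cpu_tick_seconds

end module tick_probe
```

Calling `clock_tick_seconds(10, observed, nominal)` from a small driver and printing both values shows the gap: an observed step near 1/64 s with a much smaller nominal step would match the behaviour described above.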

John

TimP · ‎07-05-2012

cpu_time can't be compared against QueryPerformance, as the latter doesn't separate process time.
I've not understood explanations on why Windows system_clock could not approach the performance of QueryPerformance or omp_get_wtime. This situation requires us to write applications to switch timers between Linux and Windows.

Steven_L_Intel1 · ‎07-05-2012

Windows updates the "time of day" clock every 10ms - this is what SYSTEM_CLOCK uses. Yes, Windows has higher precision timers but they have drawbacks for general use.

John_Campbell · ‎07-05-2012

Steve,

I was not trying to say that CPU_TIME and SYSTEM_CLOCK are the same, but that both these Fortran standard routines use timing sources that are updated every 1/64th of a second.
For use as a timing routine with the speed of modern processors, this is a very poor precision to provide. Comparing 64 Hz with 3 GHz does not look right to me.
I'm not sure which system routines support the Fortran standard routines, but a more accurate solution should be provided.
QueryPerformanceCounter could be a better source for SYSTEM_CLOCK.
GetProcessTimes is the best source I have found for CPU_TIME, although I do not understand why these are limited to 1/64 sec accuracy or if this 64 Hz tick rate can be varied.
The instruction RDTSC is also a possibility.
Providing a more reliable timer using the Fortran standard routines would be the preferred solution.
As for compatibility between Linux and Windows, I'm sure there are other differences between the two implementations.

I understand that a lot of the problem relates to what is available from Microsoft API, but having a fast and accurate timer should be simpler than what is provided.

John

John_Campbell · ‎07-15-2012

Steve,

I was hoping that someone might be able to provide some more information on the available timers.
For elapsed time, there are better timers than apparently used, such as QueryPerformanceCounter. I don't know of any drawbacks and would recommend its use for System_Clock.
However for CPU time, GetProcessTimes is the best I know of. MSDN is a bit vague about whether the clock rate can be varied from 64 Hz.
Does anyone have experience of overcoming this limitation?

IanH · ‎07-16-2012

You keep saying "best", "better" etc, but doesn't that depend on what you are trying to do?

What are you trying to do?

John_Campbell · ‎07-16-2012

Ian,

"Best" relates to:
precision,
call time overhead and
side effects.

From past experience, precision is the most significant of these. QueryPerformanceCounter effectively provides a high precision of about 10^7 cycles per second, a figure based mainly on the call time overhead.
Other timers with a precision of only 64 Hz I would consider poor.
My reading of MSDN is that GetProcessTimes might have a side effect of slowing things down if the clock rate was changed. Unfortunately I don't know how to change the rate, or if it can be done. I may have misread the MSDN documentation.
I have been asking if anyone has any knowledge of this. I was hoping that the Visual Fortran developers might have some knowledge of this.

I just find it surprising that the best CPU precision we can get is updated at 64 Hz. It must be accumulated somewhere more frequently than this.

I'm not sure where Steve finds 10 ms (100 Hz). If you don't mean this as about 64 Hz, please let me know where you find this difference.

Again, if anyone knows how to get CPU time to a higher precision, I'd like to know.

John

IanH · ‎07-16-2012

Are you timing your program, or timestamping data, or...?

John_Campbell · ‎07-16-2012

Ian,

Thanks for your question. I use it for timing programs mostly, where precision is especially important.
I have a shifted subspace eigen solver I obtained from SAP80, where I time two stages of the solution: the matrix reduction and the load case iterations. I use the timing to estimate the relative duration of each stage. Based on this time I estimate the convergence time with or without a shifted reduction. With such a crude precision on the timers, my convergence strategy does not work well with ifort. It does with other compilers.
While ifort's SYSTEM_CLOCK might report a high precision with Count_Rate, the tick reality is much different.
I thought that managing the difference between elapsed time (System_Clock) and processor time (CPU_Time) was going to be interesting on a multi-processor PC with ifort, but I haven't managed to get there yet.

John

John_Campbell · ‎07-24-2012

To provide a more accurate elapsed time timer, could someone provide a conversion of the following subroutine so that it will compile and run using ifort? It should return a much more accurate elapsed time than System_Clock, by using the API routine QueryPerformanceCounter.
I would appreciate your assistance.
The improvement provided could hopefully be demonstrated by including it in the test program elapse.f95 I included above.
[bash]
      SUBROUTINE ELAPSE_SECOND (ELAPSE)
!
!     Returns the total elapsed time in seconds
!     based on QueryPerformanceCounter
!     This is the fastest and most accurate timing routine
!
      real*8,   intent (out) :: elapse
!
      STDCALL   QUERYPERFORMANCECOUNTER 'QueryPerformanceCounter' (REF):LOGICAL*4
      STDCALL   QUERYPERFORMANCEFREQUENCY 'QueryPerformanceFrequency' (REF):LOGICAL*4
!
      real*8    :: freq  = 1
      logical*4 :: first = .true.
      integer*8 :: start = 0
      integer*8 :: num
      logical*4 :: ll
!     integer*4 :: lute
!
!     Calibrate this time using QueryPerformanceFrequency
      if (first) then
         num   = 0
         ll    = QueryPerformanceFrequency (num)
         freq  = 1.0d0 / dble (num)
         start = 0
         ll    = QueryPerformanceCounter (start)
         first = .false.
!        call get_echo_unit (lute)
!        WRITE (lute,*) 'Elapsed time counter :',num,' ticks per second'
      end if
!
      num    = 0
      ll     = QueryPerformanceCounter (num)
      elapse = dble (num-start) * freq
      return
      end
[/bash]

JVanB · ‎07-24-2012

[fortran]
      SUBROUTINE ELAPSE_SECOND (ELAPSE)
      use ifwin, only: T_LARGE_INTEGER,QueryPerformanceCounter, QueryPerformanceFrequency
!
!     Returns the total elapsed time in seconds
!     based on QueryPerformanceCounter
!     This is the fastest and most accurate timing routine
!
      real*8,   intent (out) :: elapse
!
!     STDCALL   QUERYPERFORMANCECOUNTER 'QueryPerformanceCounter' (REF):LOGICAL*4
!     STDCALL   QUERYPERFORMANCEFREQUENCY 'QueryPerformanceFrequency' (REF):LOGICAL*4
!
      real*8    :: freq  = 1
      logical*4 :: first = .true.
      integer*8 :: start = 0
      integer*8 :: num
      logical*4 :: ll
      type(T_LARGE_INTEGER) :: arg
!     integer*4 :: lute
!
!     Calibrate this time using QueryPerformanceFrequency
      if (first) then
         num   = 0
         ll    = QueryPerformanceFrequency (arg)
         num   = transfer(arg,num)
         freq  = 1.0d0 / dble (num)
         start = 0
         ll    = QueryPerformanceCounter (arg)
         start = transfer(arg,start)
         first = .false.
!        call get_echo_unit (lute)
!        WRITE (lute,*) 'Elapsed time counter :',num,' ticks per second'
      end if
!
      num    = 0
      ll     = QueryPerformanceCounter (arg)
      num    = transfer(arg,num)
      elapse = dble (num-start) * freq
      return
      end

      program MAIN__
      real*8 elapse
      call elapse_second(elapse)
      write(*,*) elapse
      end program MAIN__
[/fortran]

John_Campbell · ‎07-25-2012

Repeat Offender,

Thanks for your changes to the code. I could not find any reference to the type T_Large_Integer in the ifort help.
I have attached the updated elapse2.f95 program which demonstrates the relative precision of QueryPerformanceCounter (2.6 MHz) in comparison to the intrinsic System_Clock (60 Hz).
I think it ticks the boxes in relation to both precision and call time overhead. I don't know of any side effects when taking this option.

I would recommend this as a better timing solution.

Les_Neilson · ‎07-25-2012

T_LARGE_INTEGER is in ifwinty - in my IVFv11 version, I doubt it has been moved.

use ifwin, only: QueryPerformanceCounter, QueryPerformanceFrequency

use ifwinty, only: T_LARGE_INTEGER

works

Les

JVanB · ‎07-25-2012

T_LARGE_INTEGER is ifortspeak for the LARGE_INTEGER union that QueryPerformanceFrequency and QueryPerformanceCounter want as a reference argument. Except for big-endian systems where the companion C processor doesn't have a 64-bit integer type, it's the same as INTEGER*8.

I was surprised to see that your machine/OS is one of those that doesn't use RDTSC as the basis for QueryPerformanceCounter. You can use RDTSC directly from ifort if you wish:

[fortran]
module setup
   use ifwin
   implicit none
   private
   public initialize, rdtsc, cp_rdtsc
   integer, parameter :: code32(1) = [-1866256113]
   integer, parameter :: code64(3) = [-1052233457,155721954,-1869560880]
   interface
      function rdtsc()
         integer(8) rdtsc
      end function rdtsc
   end interface
   pointer (cp_rdtsc,rdtsc)
   contains
      subroutine initialize
         integer code(*)
         pointer (ap,code)
         integer, parameter :: nbits = bit_size(ap)
         ap = VirtualAlloc(NULL,12_HANDLE,MEM_COMMIT,PAGE_EXECUTE_READWRITE)
         if(nbits == 32) then
            code(1:size(code32)) = code32
         else
            code(1:size(code64)) = code64
         end if
         cp_rdtsc = ap
      end subroutine initialize
end module setup

program main
   use setup
   implicit none
   integer i
   call initialize
   do i = 1, 10
      write(*,*) rdtsc()
   end do
end program main
[/fortran]

John_Campbell · ‎07-25-2012

Repeat Offender,

Thanks for the info on Large_Integer.

Also, thanks for the ifort example of using RDTSC. The problem I have always had with RDTSC is that I don't have ready access to the clock rate. For another compiler, I have written a wrapper: the first time it is used on the PC, I time RDTSC for 10 seconds and then store the calculated clock rate in a file c:\prosser_speed.ini. I then read from the file on subsequent runs. For most (all?) recent processors, this has been the rated speed of the processor, although I do not know of a direct way to get RDTSC_RATE.
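The calibration step can be sketched roughly as follows (not the wrapper mentioned above; all names are hypothetical). It counts ticks from an arbitrary counter over an interval measured with SYSTEM_CLOCK and returns ticks per second. The `clock_ticks` function is only a stand-in so the module compiles anywhere; in practice a function returning the RDTSC value would be passed instead. Caching the result in a file is left out.

```fortran
module counter_calibration
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: estimate_rate, clock_ticks, tick_source

  abstract interface
    function tick_source() result(ticks)
      import :: int64
      integer(int64) :: ticks
    end function tick_source
  end interface

contains

  ! Estimate the rate (ticks per second) of "counter" by comparing it
  ! with SYSTEM_CLOCK over roughly "seconds" of elapsed time.
  function estimate_rate(counter, seconds) result(rate)
    procedure(tick_source) :: counter
    real(real64), intent(in) :: seconds
    real(real64) :: rate
    integer(int64) :: clock_rate, c0, c1, k0, k1, wanted

    call system_clock(count_rate=clock_rate)
    wanted = max(1_int64, int(seconds * real(clock_rate, real64), int64))

    ! Start on a clock transition so a coarse tick does not bias the start.
    call system_clock(c0)
    do
      call system_clock(c1)
      if (c1 /= c0) exit
    end do
    k0 = counter()
    c0 = c1

    ! Spin until the requested interval has passed, then read the counter again.
    do
      call system_clock(c1)
      if (c1 - c0 >= wanted) exit
    end do
    k1 = counter()

    rate = real(k1 - k0, real64) * real(clock_rate, real64) / real(c1 - c0, real64)
  end function estimate_rate

  ! Stand-in counter for illustration only: it simply returns the SYSTEM_CLOCK count.
  function clock_ticks() result(ticks)
    integer(int64) :: ticks
    call system_clock(ticks)
  end function clock_ticks

end module counter_calibration
```

With a coarse SYSTEM_CLOCK tick the interval needs to be long (the 10 seconds mentioned above is reasonable), because the error in the estimate is about one clock tick divided by the interval.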

While the routines you have discussed are elapsed time counters, are you aware of any more precise ways of retrieving accumulated CPU time of a process at better than 64Hz?
As I indicated before, I have been using the elapsed time for selecting alternative solution approaches while the program is running. I have been contemplating understanding how this approach might be applied to parallel applications. I was wanting to monitor both elapsed and CPU time and see if I could make sense of a strategy based on both times. At 64 Hz, the CPU precision is not there for the test examples I am using.

John

IanH · ‎07-25-2012

My understanding is that the per thread CPU time counters are only updated at the tick rate that the scheduler uses (which is what's behind the ~60 Hz frequency that you are seeing for anything that has some sort of dependence on the scheduler). So as far as I know - no.

Your elapsed time approach has the issue that on a desktop system it will be influenced by things such as the user moving the mouse (etc.) or other background operating system activities.

When you described the problem you were trying to solve, my initial thought was that there must be some easier way of getting a measure of the computational effort required by a particular step than timing it: an iteration counter or similar.

JVanB · ‎07-25-2012

When I am tuning code to see what is the fastest, the units of time have little meaning to me because I just compare the number of clock cycles and from that determine whether the changes I have made were actually an improvement and if so, whether the improvement is worth the risk and effort of incorporating the changes into the working code. Always do a loop with a few measurements so that you can see the signal and the noise. Even so, sometimes the OS throttles the processor and messes up your measurement.
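A minimal sketch of the "few measurements in a loop" idea, assuming SYSTEM_CLOCK as the timer rather than raw clock cycles (all names are hypothetical, and `sample_work` is just a placeholder workload). It reports the fastest and slowest repeat so the spread, i.e. the noise, is visible next to the signal.

```fortran
module repeat_timing
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: time_repeats, sample_work, workload

  abstract interface
    subroutine workload()
    end subroutine workload
  end interface

contains

  ! Time "work" several times; return the fastest and slowest elapsed times
  ! in seconds. Expects repeats >= 1.
  subroutine time_repeats(work, repeats, fastest, slowest)
    procedure(workload) :: work
    integer, intent(in) :: repeats
    real(real64), intent(out) :: fastest, slowest
    integer(int64) :: rate, t0, t1
    real(real64) :: dt
    integer :: i

    call system_clock(count_rate=rate)
    fastest = huge(fastest)
    slowest = 0.0_real64
    do i = 1, repeats
      call system_clock(t0)
      call work()
      call system_clock(t1)
      dt = real(t1 - t0, real64) / real(rate, real64)
      fastest = min(fastest, dt)
      slowest = max(slowest, dt)
    end do
  end subroutine time_repeats

  ! Placeholder workload: sums square roots. "volatile" keeps the compiler
  ! from optimizing the loop away.
  subroutine sample_work()
    real(real64), volatile :: total
    integer :: i
    total = 0.0_real64
    do i = 1, 1000000
      total = total + sqrt(real(i, real64))
    end do
  end subroutine sample_work

end module repeat_timing
```

The fastest repeat is usually the better estimate of the code's own cost, since interference from the OS can only add time. With a 64 Hz timer the workload has to run much longer than one tick for either number to mean anything.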

I have only a minimal amount of experience timing multithreaded code and what I did in that case was RDTSC before starting the threads and again after all threads were done. Timing the progress of individual threads seems like it could get noisy as other processes and threads kick each other out of cache as they move from core to core.

John_Campbell · ‎07-25-2012

I think I agree with you both, that it is easy to get confused about what you are trying to do and what the timed measure is saying. When the code becomes multi-threaded, you can't be sure which thread is being timed and what else might be happening in other threads.
The system elapsed time clock does offer some simplicity to the definition of the measure. Ian, as you have noted, there can always be problems with other processes running. Virus checkers have long been a problem, as is svchost.exe.

I too have a minimal amount of experience in writing multi-threaded or parallel code. I've been reading about it for many years and ifort is my first chance to see how it can work. I have been trying to understand how effective it is and what the side effects are. For a long time my Fortran code has been a sequential approach. The vector instruction set has been a much easier implementation and easier to understand.

We'll keep trying to learn! Thanks for your assistance.

John

John_Campbell · ‎02-06-2013

ifwin.mod is a large file which I presume defines the calling interface for many API routines.

Is there documentation of the fortran calling protocols to use with these routines ? I am interested in the routines:
GetTickCount
GetProcessTimes
GetCurrentProcess

I would like to know if they are used as subroutines or functions and the type and kind of each argument.
My apologies if this is a trivial question, but I could not find this information in "C:\Program Files (x86)\Intel\Compiler\11.1\054\Documentation\en_US\compiler_f\main_for.chm" (which I notice is the documentation for the previous version of ifort that I am using. I should install the VS update!)

John

JVanB · ‎02-06-2013

To find out about GetTickCount, for example, I would google "GetTickCount msdn" and the first hit gives me useful documentation including the C prototype and the fact that you link to it via Kernel32.lib. If you then open up %INCLUDE%\kernel32.f90 with a text editor you can search for GetTickCount and see how ifort writes its interface, nonstandard because ifort doesn't provide a STDCALL companion processor for f2003 interoperability.

Sort of a hacker's method I suppose, but it works well for me.
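As a rough sketch of what a call might look like for GetProcessTimes and GetCurrentProcess: this assumes the interfaces in kernel32.f90 declare both as functions, with GetProcessTimes returning integer(BOOL) and taking the process handle followed by four type(T_FILETIME) arguments, which mirrors the C prototype on MSDN. Check this against your own copy of kernel32.f90 and ifwinty.f90 before relying on it. The module and function names are hypothetical, and it only builds with ifort on Windows.

```fortran
module process_cpu_time
  ! Assumed ifort/Windows interfaces; verify against kernel32.f90 and ifwinty.f90.
  use ifwin, only: GetCurrentProcess, GetProcessTimes
  use ifwinty, only: T_FILETIME, BOOL
  implicit none
  private
  public :: process_cpu_seconds

contains

  ! Kernel + user CPU time of the current process in seconds,
  ! or -1 if GetProcessTimes reports failure.
  function process_cpu_seconds() result(seconds)
    real(8) :: seconds
    type(T_FILETIME) :: created, exited, kernel, user
    integer(BOOL) :: ok
    integer(8) :: kernel_ticks, user_ticks

    ok = GetProcessTimes(GetCurrentProcess(), created, exited, kernel, user)
    if (ok == 0) then
      seconds = -1.0d0
      return
    end if

    ! FILETIME is two 32-bit halves counting 100 ns units; on little-endian
    ! x86 the pair has the same layout as one 64-bit integer.
    kernel_ticks = transfer(kernel, kernel_ticks)
    user_ticks = transfer(user, user_ticks)
    seconds = 1.0d-7 * dble(kernel_ticks + user_ticks)
  end function process_cpu_seconds

end module process_cpu_time
```

The values come back in 100 ns units, but as noted earlier in the thread they appear to be updated only at the scheduler tick, so this does not by itself give better than roughly 64 Hz resolution.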

Bernard · ‎02-06-2013

Repeat Offender wrote:

To find out about GetTickCount, for example, I would google "GetTickCount msdn" and the first hit gives me useful documentation including the C prototype and the fact that you link to it via Kernel32.lib. If you then open up %INCLUDE%\kernel32.f90 with a text editor you can search for GetTickCount and see how ifort writes its interface, nonstandard because ifort doesn't provide a STDCALL companion processor for f2003 interoperability.

Sort of a hacker's method I suppose, but it works well for me.

If you are interested in the exact machine code implementation of the GetTickCount function, I would advise using the IDA Pro disassembler. I suppose that this function might indirectly (when acting as a wrapper) access the RTC clock.