Intel® Fortran Compiler

How to pace the speed of execution? SLEEPQQ is inaccurate

dboggs
New Contributor I
2,936 Views

My program is a process simulation in which I need to post the result of each time step. The execution time of each step is negligible, so I insert a delay between steps using SLEEPQQ. The required sleep time is typically 0.02 to 0.10 seconds. The results are unsatisfactory. Here's a short "do nothing" program that illustrates the problem:

   PROGRAM PacingDemo
      USE IFPORT
      IMPLICIT NONE
      INTEGER  :: isec, isubcycle
      
      PRINT *, 'Program PacingDemo'
      PRINT *, 'Attempt to control the speed of execution...'
      PRINT *, 'Press <Enter> to start...'
      READ (*, *)
      CALL SLEEPQQ (3000)
      CALL BEEPQQ (1000, 500)
      ! Try to create 10 cycles paced at one per second
      DO isec = 1, 10
         DO isubcycle = 1, 100
            CALL SLEEPQQ (10)
            !PRINT *, '  Subcycle', isubcycle
         END DO
         PRINT *, 'Cycle', isec
      END DO
      CALL BEEPQQ (1000, 500)
END PROGRAM PacingDemo      

I divide the time step into subcycles to get some variability in the sleep periods in combination with the argument to SLEEPQQ, which is in milliseconds. To get a sleep time of 1 sec, for example, I can use 100 subcycles with a wait time of 10 milliseconds each (as shown), or 1 subcycle with a wait time of 1000 ms, etc. Here are the results of various combinations:

No. subcycles   SLEEPQQ arg (ms)   Measured elapsed time (10 sec expected)
      1              1000          10.5 sec   (5% error)
      2               500          10.7 sec
      5               200          10.9 sec   (10% error)
     10               100          11.4 sec
     20                50          13.1 sec
     50                20          16 sec     (60% error)
    100                10          17 sec
    200                 5          32 sec     (220% error)
    500                 2          79 sec     (690% error)
   1000                 1          156 sec    (1460% error)

So I would say that SLEEPQQ (n) only gives reasonable results for n >= 1000 (one second); below that, the accuracy seriously deteriorates, to the point of being ridiculous.

As an alternative I tried calling CPU_TIME (start_time) at the beginning of each step, then placing CPU_TIME (current_time) in an endless loop that exits only when current_time - start_time reaches the desired cycle time. The results were similar.

Surely there is a way to do this? 

0 Kudos
39 Replies
jimdempseyatthecove
Honored Contributor III
1,043 Views

Some versions of the Intel module libraries omitted the definition of TYPE T_LARGE_INTEGER. This definition should have been included when USE KERNEL32 is used alone; inserting USE IFWINTY in front of USE KERNEL32 should fix the problem, or else define the type yourself.

Jim Dempsey

0 Kudos
andrew_4619
Honored Contributor II
1,043 Views

KERNEL32 already includes IFWINTY. The definition I showed in #19 is cut/pasted from IFWINTY. The problem is that the statements

DoneCounter = (PerformanceFrequency * N) / 1000 ! perform * before /

Wait_ms = int((DoneCounter - PerformanceCounter0) / 1000, selected_int_kind(Wait_ms))

are invalid, because PerformanceFrequency and PerformanceCounter0 are not integers; they are derived types. The definition in IFWINTY is one created by Microsoft for systems that did not support 64-bit integers. The easiest fix is to do some bit fiddling with var%HighPart and var%LowPart to create an INTEGER(8) variable.
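
For example, a minimal (untested) sketch of that bit fiddling, using a hypothetical INTEGER(8) variable freq64 for the frequency:

   INTEGER(8) :: freq64
   ! PerformanceFrequency is the TYPE(T_LARGE_INTEGER) filled in by QueryPerformanceFrequency.
   ! Mask LowPart so it is treated as unsigned before combining it with HighPart.
   freq64 = IOR(ISHFT(INT(PerformanceFrequency%HighPart, 8), 32), &
                IAND(INT(PerformanceFrequency%LowPart, 8), INT(Z'FFFFFFFF', 8)))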

 

0 Kudos
andrew_4619
Honored Contributor II
1,043 Views

Additional information from MSDN (I guess Intel needs to update this type in IFWINTY):

"Remarks

The LARGE_INTEGER structure is actually a union. If your compiler has built-in support for 64-bit integers, use the QuadPart member to store the 64-bit integer. Otherwise, use the LowPart and HighPart members to store the 64-bit integer."

0 Kudos
Steve_Lionel
Honored Contributor III
1,043 Views

The problem with updating the definition is that it would break existing code. It's easy enough to work around using TRANSFER, as I showed in post #8.
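
For reference, a minimal sketch of that TRANSFER approach (the variable names here are just illustrative):

   USE IFWINTY, ONLY: T_LARGE_INTEGER
   TYPE(T_LARGE_INTEGER) :: liCount
   INTEGER(8) :: count64
   ! ... call QueryPerformanceCounter(liCount) here ...
   ! Reinterpret the 64 bits of the union as a plain INTEGER(8)
   count64 = TRANSFER(liCount, count64)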

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,043 Views

I tend to use UNION with an integer(8) :: asInt64
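
Something along these lines, as a sketch only (using the nonstandard STRUCTURE/UNION/MAP extension; the names are arbitrary):

STRUCTURE /LI_UNION/
   UNION
      MAP
         INTEGER(4) LowPart
         INTEGER(4) HighPart
      END MAP
      MAP
         INTEGER(8) asInt64
      END MAP
   END UNION
END STRUCTURE
RECORD /LI_UNION/ li
! Fill li%LowPart and li%HighPart from the T_LARGE_INTEGER;
! li%asInt64 then holds the same 64 bits as a single integer.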

Jim Dempsey

0 Kudos
JVanB
Valued Contributor II
1,043 Views

I would simply use 64-bit integers and force the compiler to like it:

!DEC$ IF DEFINED (__INTEL_COMPILER)
!DEC$ ELSE
module ifwin
   use ISO_C_BINDING, only: BOOL => C_INT32_T, C_INT64_T
   implicit none
   private
   public BOOL

   type, bind(C), public:: T_LARGE_INTEGER
      integer(C_INT64_T) QuadPart
   end type T_LARGE_INTEGER

   public QueryPerformanceCounter
   interface
      function QueryPerformanceCounter(lpPerformanceCount) &
         bind(C,name='QueryPerformanceCounter')
         import
         implicit none
         integer(BOOL) QueryPerformanceCounter
!DEC$ ATTRIBUTES STDCALL :: QueryPerformanceCounter
!GCC$ ATTRIBUTES STDCALL :: QueryPerformanceCounter
         type(T_LARGE_INTEGER) lpPerformanceCount
      end function QueryPerformanceCounter
   end interface

   public QueryPerformanceFrequency
   interface
      function QueryPerformanceFrequency(lpFrequency) &
         bind(C,name='QueryPerformanceFrequency')
         import
         implicit none
!DEC$ ATTRIBUTES STDCALL :: QueryPerformanceFrequency
!GCC$ ATTRIBUTES STDCALL :: QueryPerformanceFrequency
         integer(BOOL) QueryPerformanceFrequency
         type(T_LARGE_INTEGER) lpFrequency
      end function QueryPerformanceFrequency
   end interface
end module ifwin
!DEC$ ENDIF

module M
   use ifwin, only: QueryPerformanceCounter, &
      QueryPerformanceFrequency, BOOL
   use ISO_C_BINDING, only: C_INT64_T
   implicit none
   private

   public QueryPerformanceCounter
   interface QueryPerformanceCounter
      procedure QueryPerformanceCounter
      function QueryPerformanceCounter64(lpPerformanceCount) &
         bind(C,name='QueryPerformanceCounter')
         import
         implicit none
!DEC$ ATTRIBUTES STDCALL :: QueryPerformanceCounter64
!GCC$ ATTRIBUTES STDCALL :: QueryPerformanceCounter64
         integer(BOOL) QueryPerformanceCounter64
         integer(C_INT64_T) lpPerformanceCount
      end function QueryPerformanceCounter64
   end interface QueryPerformanceCounter

   public QueryPerformanceFrequency
   interface QueryPerformanceFrequency
      procedure QueryPerformanceFrequency
      function QueryPerformanceFrequency64(lpFrequency) &
         bind(C,name='QueryPerformanceFrequency')
         import
         implicit none
!DEC$ ATTRIBUTES STDCALL :: QueryPerformanceFrequency64
!GCC$ ATTRIBUTES STDCALL :: QueryPerformanceFrequency64
         integer(BOOL) QueryPerformanceFrequency64
         integer(C_INT64_T) lpFrequency
      end function QueryPerformanceFrequency64
   end interface QueryPerformanceFrequency
end module M

program P
   use M
   use ISO_C_BINDING, only: C_INT64_T, C_DOUBLE
   use ifwin
   implicit none
   integer, parameter :: N = 10
   integer(C_INT64_T) Frequency
   integer(C_INT64_T) Time(0:N)
   real(C_DOUBLE) harvest(N)
   real(C_DOUBLE) total
   integer i
   integer(BOOL) result

   call random_seed
   call random_number(harvest)
   result = QueryPerformanceFrequency(Frequency)
   total = 0
   result = QueryPerformanceCounter(Time(0))
   do i = 1, N
      total = total+sqrt(harvest(i))
      result = QueryPerformanceCounter(Time(i))
   end do
   write(*,'(*(g0))') 'Total = ',total
   do i = 1, N
      write(*,'(*(g0))') 'Time(', i, ') = ', &
         (Time(i)-Time(i-1))/real(Frequency,C_DOUBLE)
   end do
end program P

Notes: This works in gfortran, but fails in ifort. Is this just because I have a really old version?

I thought the performance counter matched the CPU time stamp counter, but it's much slower, about 3.3 MHz on my system.

I still can't seem to change my avatar to a more current image. How can I do this?

0 Kudos
Steve_Lionel
Honored Contributor III
1,043 Views

You need (for example):

!DEC$ ATTRIBUTES STDCALL,DECORATE,ALIAS:"QueryPerformanceCounter64" :: QueryPerformanceCounter64

Without the other two attributes, you get a downcased name. Of course, you didn't say HOW it failed, so I'm just latching onto the first thing I saw.

0 Kudos
JVanB
Valued Contributor II
1,043 Views

Steve, I think you missed the BIND attribute for QueryPerformanceCounter64 because that part was wrapped around to the next line. That takes care of the DECORATE and ALIAS attributes already because in this case the compiler acts as if it were the companion C compiler with STDCALL enabled.

The bug seems to be due to the PROCEDURE statement in the interface body in question. Reproducer:

module M0
   implicit none
   type, bind(C) :: T1
      double precision x
   end type T1
   type, bind(C) :: T2
      double precision x
   end type T2
end module M0

function F(x) bind(C,name='F')
   use M0
   implicit none
   double precision F
   type(T1) x
   F = sqrt(x%x)
end function F

module M1
   use M0
   implicit none
   interface
      function F(x) bind(C,name='F')
         import
         implicit none
         double precision F
         type(T1) x
      end function F
   end interface
end module M1

module M2
   use M1
   implicit none
   interface F
      procedure F
      function G(x) bind(C,name='F')
         import
         implicit none
         double precision G
         type(T2) x
      end function G
   end interface F
end module M2

program P
   use M0
   use M1
   use M2
   implicit none
   write(*,*) F(T2(2.0d0))
end program P

First error message with ifort:

bug1.f90(37): error #6404: This name does not have a type, and must have an explicit type.
      function G(x) bind(C,name='F')
---------------^

 

0 Kudos
Steve_Lionel
Honored Contributor III
1,043 Views

RO, you are correct - I missed that.

0 Kudos
dboggs
New Contributor I
1,044 Views

Many thanks to all who have responded. Some suggestions are beyond my understanding, so I have to resist implementing them. Among the simpler ones, I have been studying the WaitableTimer API as suggested by Steve. This was quite easily done, and it appeared to offer very high resolution on the specified wait time. Here is the code I have been running:

PROGRAM TimerDemo
   
      IMPLICIT NONE
     ! Locals
      INTEGER  :: wait_time_msec
      INTEGER  :: clock_start, clock_end, clock_rate
      INTEGER  :: icycle, isub, nsub, ncalls
      REAL     :: wait_time_sec
      
      WRITE (*, *) 'Testing APIWaitableTimer, 10 seconds...'
      
      CALL SYSTEM_CLOCK (clock_start)
      CALL APIWaitableTimer (10000)
      CALL SYSTEM_CLOCK (Clock_end, clock_rate)
      wait_time_sec = REAL (clock_end - clock_start) / REAL (clock_rate)
      WRITE (*, *) 'Timer was signaled after ', wait_time_sec, ' seconds'
      PAUSE
      
!---- Set up a loop of 10 1-second cycles.
!     Subdivide each cycle into nsub events of length 1/nsub seconds to study
!     the timer accuracy for various intervals.
      nsub = 100
      wait_time_sec = 1. / nsub
      wait_time_msec = 1000 * wait_time_sec
      WRITE (*, '(A, I4, A, I4, A)') &
         ' Timing 10 cycles, each of ', nsub, ' events x ', &
         wait_time_msec, ' msec each'
      CALL SYSTEM_CLOCK (Clock_start, clock_rate)
      ncalls = 0
      DO icycle = 1, 10
         DO isub = 1, nsub
            ncalls = ncalls + 1
            CALL APIWaitableTimer (wait_time_msec)
         END DO ! next isub
         WRITE (*, *) 'Cycle ', icycle
      END DO ! next event
      
!---- Get the total wait time.
      CALL SYSTEM_CLOCK (Clock_end, clock_rate)
      wait_time_sec = REAL (clock_end - clock_start) / REAL (clock_rate)
      WRITE (*, *) 'Total time:', wait_time_sec, ' seconds'
      print *, 'The timer was called', ncalls, ' times'
      PAUSE
   END PROGRAM TimerDemo
   SUBROUTINE APIWaitableTimer (milliseconds)

!  Courtesy of Steve Lionel, IVF Fortran forum, 11/17/2018, and recast as a
!  subroutine. In response to my request for help:
!  "How to pace the speed of execution? SLEEPQQ is inaccurate."

      USE KERNEL32, only: HANDLE, T_LARGE_INTEGER, WAIT_OBJECT_0, TRUE, NULL, &
         CreateWaitableTimer, SetWaitableTimer, WaitForSingleObject, &
         CloseHandle, INFINITE, GETLASTERROR
      IMPLICIT NONE
     ! Args
      INTEGER  :: milliseconds
     ! Locals
      INTEGER (HANDLE)  :: hTimer
      !!INTEGER  :: clock_start, clock_end, clock_rate
      REAL     :: wait_time
      TYPE (T_LARGE_INTEGER)  :: liDueTime ! 64-bit integer

!---- Body of Timer
      htimer = NULL
      !  Wait time in 100 ns units (0.1 us = 1e-7 sec); negative means relative time
      liDueTime = TRANSFER (-INT8(10000*milliseconds), T_LARGE_INTEGER(0,0))
      
!---- Create an unnamed waitable timer.
      htimer = CreateWaitableTimer (NULL, TRUE, NULL)
      IF (NULL == hTimer) THEN
         PRINT *, "CreateWaitableTimer failed with error=", GetLastError ()
         ERROR STOP
      END IF
      
      ! Set a timer to wait for designated milliseconds.
      ! Success results in nonzero return value.
      IF (SetWaitableTimer (hTimer, liDueTime, 0, NULL, NULL, 0) == 0) THEN
         print *, "SetWaitableTimer failed with error=", GetLastError ()
         ERROR STOP
      END IF
      
      ! Wait for the timer.
      IF (WaitForSingleObject (htimer, INFINITE) /= WAIT_OBJECT_0) THEN
         print *, 'WaitForSingleObject failed with error=', GetLastError ()
         ERROR STOP
      END IF
      
      ! Close the handle so repeated calls do not leak timer objects.
      IF (CloseHandle (htimer) == 0) THEN
         print *, 'CloseHandle failed with error=', GetLastError ()
         ERROR STOP
      END IF
      
   END SUBROUTINE APIWaitableTimer

A sample output follows:
 

 Testing APIWaitableTimer, 10 seconds...
 Timer was signaled after    10.00300      seconds
Fortran Pause - Enter command<CR> or <CR> to continue.

 Timing 10 cycles, each of  100 events x   10 msec each
 Cycle            1
 Cycle            2
 Cycle            3
 Cycle            4
 Cycle            5
 Cycle            6
 Cycle            7
 Cycle            8
 Cycle            9
 Cycle           10
 Total time:   16.09300      seconds
 The timer was called        1000  times
Fortran Pause - Enter command<CR> or <CR> to continue.

Unfortunately, as you can see, high resolution does not equate to accuracy. Calling the waitable timer with a 10 sec delay produced an actual wait time of 10.003 seconds. But calling it 1000 times in a loop, with a requested wait time of 10 milliseconds each (more like what I really need to do), should result in a total time of 10 seconds; the result was 16.1 seconds.

Having now run this test program a few times with requested wait times from 1000 msec down to 1 msec, I find the performance is almost identical to that produced using SLEEPQQ, as described in my OP. Is this a clue to the problem? Are we getting anywhere? It's hard to believe that it can't be done.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,044 Views

Please consider what was stated in post #3 and what was sketched in #16. If you want periodic intervals without time drift, then you must determine the amount of time spent between the entry points of your "sleep" method. IOW, this is typically the compute time between sleeps, but it may additionally include preemption by other processes and/or paging by your application. You experienced a 10.003 second interval; the .003 was your "compute" overhead, but bear in mind that the runtime between some sleeps may exceed this. On Windows, QueryPerformanceCounter essentially performs a _rdtsc (Read Time Stamp Counter), and QueryPerformanceFrequency tells you the number of ticks per second. For portable code, you might want to provide your own wrapper functions, as the target runtime platform may change (Windows, Linux, Intel, AMD, ARM, GKW). GKW == God Knows What
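
A minimal sketch of that bookkeeping, using SYSTEM_CLOCK to see how much of each interval has already been used and sleeping only for the remainder (the 100 ms step and the names are only illustrative):

PROGRAM PacedLoop
   USE IFPORT
   IMPLICIT NONE
   INTEGER(8) :: count0, count_now, rate
   INTEGER    :: istep, msec_target, msec_elapsed, msec_sleep
   CALL SYSTEM_CLOCK (count0, rate)
   DO istep = 1, 100
      ! ... do the work for this time step here ...
      ! Absolute target (ms from the start) at which this step should finish
      msec_target = istep * 100                  ! e.g. one step every 100 ms
      CALL SYSTEM_CLOCK (count_now)
      msec_elapsed = INT(1000_8 * (count_now - count0) / rate)
      ! Sleep only for what is left of this interval, so compute time and any
      ! oversleep are absorbed and do not accumulate as drift
      msec_sleep = msec_target - msec_elapsed
      IF (msec_sleep > 0) CALL SLEEPQQ (msec_sleep)
   END DO
END PROGRAM PacedLoop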

Jim Dempsey

0 Kudos
JVanB
Valued Contributor II
1,044 Views

@jimdempseyatthecove QueryPerformanceCounter no longer returns the value of the Time Stamp Counter, because the performance frequency must stay constant even as the CPU changes speed. As I said in Quote #27, it works out to about 3.3 MHz on my PC.

@dboggs Try surrounding your timing code with

use ifwin, only: timeBeginPeriod, timeEndPeriod
!...
res32 = timeBeginPeriod(uPeriod=1)
! All your timing code goes here
res32 = timeEndPeriod(uPeriod=1)

That's supposed to set the granularity of the timer to 1 ms.
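
For example, the OP's 10 ms loop could be wrapped like this (a sketch only, assuming timeBeginPeriod/timeEndPeriod really are available from IFWIN as above and take a default-integer period):

PROGRAM PacingDemo1ms
   USE IFPORT
   USE IFWIN, ONLY: timeBeginPeriod, timeEndPeriod
   IMPLICIT NONE
   INTEGER :: isec, isubcycle, res32
   res32 = timeBeginPeriod (1)       ! request 1 ms timer granularity
   DO isec = 1, 10
      DO isubcycle = 1, 100
         CALL SLEEPQQ (10)
      END DO
      PRINT *, 'Cycle', isec
   END DO
   res32 = timeEndPeriod (1)         ! restore the previous granularity
END PROGRAM PacingDemo1ms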

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,044 Views

RO,

From my understanding, earlier CPUs returned the clock tick for the core issuing the call. This was fine up until the Turbo Boost CPUs came out. Due to the issues with inconsistent timings, the source of the timestamp counter was changed to the Front Side Bus (FSB). I believe that unless your system is hibernating, the FSB remains at a constant frequency.

Here is a reference: https://stackoverflow.com/questions/3835111/whats-the-most-accurate-way-of-measuring-elapsed-time-in-a-modern-pc

Jim Dempsey

0 Kudos
JVanB
Valued Contributor II
1,044 Views

@jimdempseyatthecove, that sure was an old reference: Jiin & Jisun were still a thing back then. Things have changed and there is no need to cite obsolete self-appointed authorities when one can just check:

module M
   use ISO_C_BINDING
   implicit none
   private
   integer, parameter, public :: BOOL = C_INT
   public QueryPerformanceFrequency
   interface
      function QueryPerformanceFrequency(lpFrequency) &
         bind(C,name='QueryPerformanceFrequency')
         import
         implicit none
         integer(BOOL) QueryPerformanceFrequency
!GCC$ ATTRIBUTES STDCALL:: QueryPerformanceFrequency
!DEC$ ATTRIBUTES STDCALL:: QueryPerformanceFrequency
         integer(C_INT64_T) lpFrequency
      end function QueryPerformanceFrequency
   end interface
end module M

program P
   use M
   use ISO_C_BINDING
   implicit none
   integer(BOOL) R
   integer(C_INT64_T) F
   R = QueryPerformanceFrequency(F)
   write(*,'(*(g0))') 'Frequency = ',F
end program P

Runs fine on both gfortran and ifort, with result of 3312643 Hz on my system. What does your system produce? You can invoke RDTSC from Fortran if you want; I posted code in the Linux forum at some point.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,044 Views

RO,

Expanding on your code to compare QueryPerformanceCounter with _rdtsc:

module M
   use ISO_C_BINDING
   implicit none
   private
   integer, parameter, public :: BOOL = C_INT
   public QueryPerformanceFrequency
   public QueryPerformanceCounter
   public MyRDTSC
   
   interface
      function QueryPerformanceFrequency(lpFrequency) &
         bind(C,name='QueryPerformanceFrequency')
         import
         implicit none
         integer(BOOL) QueryPerformanceFrequency
!GCC$ ATTRIBUTES STDCALL:: QueryPerformanceFrequency
!DEC$ ATTRIBUTES STDCALL:: QueryPerformanceFrequency
         integer(C_INT64_T) lpFrequency
      end function QueryPerformanceFrequency
   end interface

      interface
      function QueryPerformanceCounter(lpCounter) &
         bind(C,name='QueryPerformanceCounter')
         import
         implicit none
         integer(BOOL) QueryPerformanceCounter
!GCC$ ATTRIBUTES STDCALL:: QueryPerformanceCounter
!DEC$ ATTRIBUTES STDCALL:: QueryPerformanceCounter
         integer(C_INT64_T) lpCounter
      end function QueryPerformanceCounter
      end interface

      interface
         function MyRDTSC() &
            bind(C,name='MyRDTSC')
            import
            implicit none
!**!GCC$ ATTRIBUTES STDCALL:: MyRDTSC
!**!DEC$ ATTRIBUTES STDCALL:: MyRDTSC
            integer(C_INT64_T) MyRDTSC
         end function MyRDTSC
   end interface
   
end module M

program P
   use M
   use ISO_C_BINDING
   implicit none
   integer(BOOL) R
   integer(C_INT64_T) F, tickQPC0,tickRDTSC0
   integer(C_INT64_T)    tickQPC1,tickRDTSC1
   R = QueryPerformanceFrequency(F)
   write(*,'(*(g0))') 'Frequency = ',F
   R = QueryPerformanceCounter(tickQPC0)
   tickRDTSC0 = MyRDTSC()
   call sleepqq(10000)
   R = QueryPerformanceCounter(tickQPC1)
   tickRDTSC1 = MyRDTSC()
   write(*,'(*(g0))') "tickQPC0 = ", tickQPC0
   write(*,'(*(g0))') "tickRDTSC0 = ",tickRDTSC0
   write(*,'(*(g0))') "tickQPC1 = ",tickQPC1
   write(*,'(*(g0))') "tickRDTSC1 = ",tickRDTSC1
   write(*,'(*(g0))') "ticks QPC = ",tickQPC1 - tickQPC0
   write(*,'(*(g0))') "ticks RDTSC = ",tickRDTSC1 - tickRDTSC0
   write(*,'(*(g0))') "ratio: ticks RDTSC / ticks QPC = ",(tickRDTSC1 - tickRDTSC0) / (tickQPC1 - tickQPC0)
   
end program P

// MyRDTSC (separate C++ source file)
#include "stdafx.h"
extern "C" {
	__int64 MyRDTSC(void) { return __rdtsc(); }
}

Output:

Frequency = 3331250
tickQPC0 = 1721967783669
tickRDTSC0 = 1763294632544247
tickQPC1 = 1722001096041
tickRDTSC1 = 1763328744413420
ticks QPC = 33312372
ticks RDTSC = 34111869173
ratio: ticks RDTSC / ticks QPC = 1024

It seems like the performance counter ticks once every 1024 ticks of the TSC.

Jim Dempsey

0 Kudos
dboggs
New Contributor I
1,044 Views

Wow. So much great information is being exchanged here, it's difficult to keep up.

Jim Dempsey, response to #32:

In post #3, the "amount of time the code is not waiting" is negligible, at least in the demo program(s) I have posted. They don't really do anything, and removing the SLEEPQQ statements results in program completion in the blink of an eye (~0.01 seconds)--way too fast to measure, so forget about subtracting this time from the sleep time.

"...as sketched in #16"--yes I tried this, but as reported in #18 I have never been able to compile it. Despite the many suiggestions (e.g. adding IFWINTY but it didn't help), I don't believe it will lead to solution of my problem so I have abandoned it. 

Paul Curtis #12: the "actual delay time" was originally measured using a simple stopwatch while watching the screen. Since the process is designed to take 10 seconds and the observed time was much longer, this simple technique is accurate enough. Since then I have taken to measuring it using SYSTEM_CLOCK (see #31), but the results are nearly identical.

The good news is that, after considering some of the many suggestions, I have come up with a very simple solution to my problem. Details in the next post...

0 Kudos
dboggs
New Contributor I
1,044 Views

As demonstrated in the OP, I have come to realize that the "error" in SLEEPQQ is set by the very limited resolution available from Windows, say 0.01 seconds or more per call. The overall error gets worse and worse as the number of calls increases. If the "error" had a zero mean (some +0.01, some -0.01, some 0.00), i.e. were "unbiased", the individual errors would tend to cancel and the overall result would not be far off. But that does not appear to be the case--the error seems to always be about +0.01 sec or more. As I increase the number of sub-events, so that SLEEPQQ is called up to 1,000 times (as in the example in the OP), the accumulated error becomes ~10 seconds or more--approximately what was observed. In the extreme reported case (1,000 subcycles, 10,000 calls to SLEEPQQ), the estimated error becomes well over 100 seconds (compared to the observed value of 156 seconds--not really too bad as an estimate).
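
As a cross-check, the numbers in the OP fit the assumption that each SLEEPQQ call sleeps at least one full Windows timer tick, which by default is about 15.6 ms (1/64 second):

   10,000 calls of SLEEPQQ(1)  ->  10,000 x 15.6 ms  =  156 s    (measured: 156 s)
    5,000 calls of SLEEPQQ(2)  ->   5,000 x 15.6 ms  =   78 s    (measured:  79 s)
    1,000 calls of SLEEPQQ(10) ->   1,000 x 15.6 ms  =  15.6 s   (measured:  17 s)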

The solution in my case should have been obvious: schedule the total elapsed time at which each event should post, and wait until the overall elapsed time reaches that scheduled time before posting. This way, the (biased) "error" applies only to the posting of each event, instead of accumulating over all of the event errors up to that point. Here is my revised demo program:

PROGRAM TimerDemo3
   
      IMPLICIT NONE
     ! Locals
      INTEGER  :: wait_time_msec
      INTEGER  :: clock_start, clock_end, clock_rate, clock_now
      INTEGER  :: msec_schedule, msec_elapsed
      REAL     :: sec_elapsed   ! must be REAL, or the elapsed time truncates to whole seconds
      INTEGER  :: icycle, isub, nsub, nClockCalls
      REAL     :: wait_time_sec
      
      WRITE (*, *) 'Pacing execution according to elapsed time from start'
      WRITE (*, *) 'based on SYSTEM_CLOCK'
      
!---- Set up a loop of 10 1-second cycles.
!     Subdivide each cycle into nsub events of length 1/nsub seconds to study
!     the timer accuracy for various intervals.
      nsub = 100
      
      ! Duration of sub events:
      ! will be used to schedule all future events,
      ! NOT used to set a wait time for individual events. 
      wait_time_sec = 1. / nsub
      wait_time_msec = 1000 * wait_time_sec
      WRITE (*, '(A, I4, A, I4, A)') &
         ' Timing 10 cycles, each of ', nsub, ' events x ', &
         wait_time_msec, ' msec each'
      CALL SYSTEM_CLOCK (Clock_start, clock_rate)
      nClockCalls = 0
      DO icycle = 1, 10
         DO isub = 1, nsub
            !  :
            !  Do event calculations here
            !  :
            ! Scheduled elapsed time (from the start) for completion of this sub event
            msec_schedule = ((icycle - 1) * nsub + isub) * wait_time_msec
            ! Poll system clock until scheduled time arrives.
            DO
               !CALL SLEEPQQ (1) ! optional minimal delay to minimize busy work
                                 ! NOT RECOMMENDED
               CALL SYSTEM_CLOCK (clock_now)
               nClockCalls = nClockCalls + 1
               sec_elapsed = REAL (clock_now - clock_start) / REAL (clock_rate)
               msec_elapsed = 1000 * sec_elapsed
               IF (msec_elapsed >= msec_schedule) EXIT
            END DO
            ! Event was completed on schedule
            ! Post sub event results (not required in this demo)
         END DO ! next isub
         WRITE (*, *) 'Cycle ', icycle
      END DO ! next cycle
      
!---- Get the total wait time.
      CALL SYSTEM_CLOCK (Clock_end, clock_rate)
      wait_time_sec = REAL (clock_end - clock_start) / REAL (clock_rate)
      WRITE (*, *) 'Total time:', wait_time_sec, ' seconds'
      print *, 'The system clock was polled', nClockCalls, ' times'
      PAUSE
   END PROGRAM TimerDemo3

Here is an example run output:

 Pacing execution according to elapsed time from start
 based on SYSTEM_CLOCK
 Timing 10 cycles, each of  100 events x   10 msec each
 Cycle            1
 Cycle            2
 Cycle            3
 Cycle            4
 Cycle            5
 Cycle            6
 Cycle            7
 Cycle            8
 Cycle            9
 Cycle           10
 Total time:   10.00700      seconds
 The system clock was polled    24071282  times
Fortran Pause - Enter command<CR> or <CR> to continue.

In a 10-second process the error (reported by SYSTEM_CLOCK) was 0.007 seconds--the best that I can expect. Compare this with a time of 17 seconds in the OP! With additional runs, I have found that even with 1,000 sub-cycles (10,000 polled sub-events) the reported error is still within 0.01 seconds.

Note that this solution was achieved in a very short program, using only simple calls to the intrinsic subroutine SYSTEM_CLOCK.

Of course, the accuracy of SYSTEM_CLOCK is still limited by the resolution of the underlying Windows timer, as Steve has pointed out, so the error in my first one or two cycles is still there. The bad part is that the sleep error appears to be biased, so it accumulates when the same wait is repeated over and over, thousands of times. The key is to implement the method in an algorithm that avoids that accumulation.
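
For anyone bothered by the busy-waiting (24 million clock polls in 10 seconds), one possible refinement--a sketch only, with an arbitrary 20 ms safety margin--is to let SLEEPQQ burn most of each interval and poll SYSTEM_CLOCK only near the deadline. For intervals shorter than the margin it degenerates into the same polling loop:

   SUBROUTINE WaitUntil (msec_schedule, clock_start)
   !  Wait until msec_schedule milliseconds have elapsed since clock_start.
      USE IFPORT
      IMPLICIT NONE
      INTEGER, INTENT(IN)  :: msec_schedule, clock_start
      INTEGER  :: clock_now, clock_rate, msec_elapsed
      DO
         CALL SYSTEM_CLOCK (clock_now, clock_rate)
         msec_elapsed = 1000.0 * REAL (clock_now - clock_start) / REAL (clock_rate)
         IF (msec_elapsed >= msec_schedule) EXIT
         IF (msec_schedule - msec_elapsed > 20) THEN
            ! Far from the deadline: sleep, but leave a margin for oversleep
            CALL SLEEPQQ (msec_schedule - msec_elapsed - 20)
         END IF
         ! Within the margin: just keep polling the clock
      END DO
   END SUBROUTINE WaitUntil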

 

0 Kudos
dboggs
New Contributor I
1,044 Views

The solution I have found is to SCHEDULE the posting time of each cycle according to elapsed time from the get-go, and to wait until that time arrives before posting. In this way the "error" previously experienced during each cycle--which accumulates because it acts like a biased error, not a zero-mean one--applies only to the posting of each cycle instead of accumulating. Here is the new version of my program:

PROGRAM TimerDemo3
   
      IMPLICIT NONE
     ! Locals
      INTEGER  :: wait_time_msec
      INTEGER  :: clock_start, clock_end, clock_rate, clock_now
      INTEGER  :: msec_schedule, msec_elapsed
      REAL     :: sec_elapsed   ! must be REAL, or the elapsed time truncates to whole seconds
      INTEGER  :: icycle, isub, nsub, nClockCalls
      REAL     :: wait_time_sec
      
      WRITE (*, *) 'Pacing execution according to elapsed time from start'
      WRITE (*, *) 'based on SYSTEM_CLOCK'
      
!---- Set up a loop of 10 1-second cycles.
!     Subdivide each cycle into nsub events of length 1/nsub seconds to study
!     the timer accuracy for various intervals.
      nsub = 100
      
      ! Duration of sub events:
      ! will be used to schedule all future events,
      ! NOT used to set a wait time for individual events. 
      wait_time_sec = 1. / nsub
      wait_time_msec = 1000 * wait_time_sec
      WRITE (*, '(A, I4, A, I4, A)') &
         ' Timing 10 cycles, each of ', nsub, ' events x ', &
         wait_time_msec, ' msec each'
      CALL SYSTEM_CLOCK (Clock_start, clock_rate)
      nClockCalls = 0
      DO icycle = 1, 10
         DO isub = 1, nsub
            !  :
            !  Do event calculations here
            !  :
            ! Scheduled elapsed time (from the start) for completion of this sub event
            msec_schedule = ((icycle - 1) * nsub + isub) * wait_time_msec
            ! Poll system clock until scheduled time arrives.
            DO
               !CALL SLEEPQQ (1) ! optional minimal delay to minimize busy work
                                 ! NOT RECOMMENDED
               CALL SYSTEM_CLOCK (clock_now)
               nClockCalls = nClockCalls + 1
               sec_elapsed = REAL (clock_now - clock_start) / REAL (clock_rate)
               msec_elapsed = 1000 * sec_elapsed
               IF (msec_elapsed >= msec_schedule) EXIT
            END DO
            ! Event was completed on schedule
            ! Post sub event results (not required in this demo)
         END DO ! next isub
         WRITE (*, *) 'Cycle ', icycle
      END DO ! next cycle
      
!---- Get the total wait time.
      CALL SYSTEM_CLOCK (Clock_end, clock_rate)
      wait_time_sec = REAL (clock_end - clock_start) / REAL (clock_rate)
      WRITE (*, *) 'Total time:', wait_time_sec, ' seconds'
      print *, 'The system clock was polled', nClockCalls, ' times'
      PAUSE
   END PROGRAM TimerDemo3

Here is an example run, again using 100 sub-cycles per cycle (1,000 postings of 10 msec each):

 Pacing execution according to elapsed time from start
 based on SYSTEM_CLOCK
 Timing 10 cycles, each of  100 events x   10 msec each
 Cycle            1
 Cycle            2
 Cycle            3
 Cycle            4
 Cycle            5
 Cycle            6
 Cycle            7
 Cycle            8
 Cycle            9
 Cycle           10
 Total time:   10.00300      seconds
 The system clock was polled    23972815  times
Fortran Pause - Enter command<CR> or <CR> to continue.

The "error" for this 10-second process is only 0.003 seconds. Compare this with an elapsed time of 17 seconds in the OP!

0 Kudos
dboggs
New Contributor I
1,044 Views

Sorry about the duplicate posting. I thought I had lost the first one.

0 Kudos