Intel® Fortran Compiler

How to pace the speed of execution? SLEEPQQ is inaccurate

dboggs
New Contributor I

My program is a process simulation in which I need to post the result of each time step. The execution time of each step is negligible, so I insert a delay between steps using SLEEPQQ. The required sleep time is typically 0.02 to 0.10 seconds. The results are unsatisfactory. Here's a short "do nothing" program that illustrates the problem:

   PROGRAM PacingDemo
      USE IFPORT
      IMPLICIT NONE
      INTEGER  :: isec, isubcycle
      
      PRINT *, 'Program PacingDemo'
      PRINT *, 'Attempt to control the speed of execution...'
      PRINT *, 'Press <Enter> to start...'
      READ (*, *)
      CALL SLEEPQQ (3000)
      CALL BEEPQQ (1000, 500)
      ! Try to create 10 cycles paced at one per second
      DO isec = 1, 10
         DO isubcycle = 1, 100
            CALL SLEEPQQ (10)
            !PRINT *, '  Subcycle', isubcycle
         END DO
         PRINT *, 'Cycle', isec
      END DO
      CALL BEEPQQ (1000, 500)
END PROGRAM PacingDemo      

I divide each time step into subcycles to get some variability in the sleep periods in combination with the argument to SLEEPQQ, which takes milliseconds. To get a sleep time of 1 second per cycle, for example, I can use 100 subcycles with a wait of 10 ms each (as shown), or 1 subcycle with a wait of 1000 ms, etc. Here are the results of various combinations (the target is 10 seconds total):

No. subcycles   SLEEPQQ arg (ms)   Measured elapsed time (10 s target)
      1              1000            10.5 s     (5% error)
      2               500            10.7 s     (7% error)
      5               200            10.9 s     (9% error)
     10               100            11.4 s    (14% error)
     20                50            13.1 s    (31% error)
     50                20            16 s      (60% error)
    100                10            17 s      (70% error)
    200                 5            32 s     (220% error)
    500                 2            79 s     (690% error)
   1000                 1           156 s    (1460% error)

So I would say that SLEEPQQ(n) only gives reasonable results for n of 1000 (one second) or more; below that the accuracy deteriorates seriously, to the point of being ridiculous.

As an alternative I tried calling CPU_TIME (start_time) at the beginning of each step, then placing CPU_TIME (current_time) in an endless loop that exits only when current_time - start_time reaches the desired cycle time. The results were similar.

Surely there is a way to do this? 

Paul_Curtis
Valued Contributor I

If you're running on Windows, try this:

RECURSIVE SUBROUTINE delay_ms (howmany)
    USE kernel32, ONLY: WaitForSingleObject, CloseHandle, CreateEvent !sleep
    IMPLICIT NONE
    INTEGER, INTENT(IN)    :: howmany
    INTEGER(HANDLE)     :: hEvent
    INTEGER             :: rval

    !    Win32 suspend function, allows the timeslice to be
    !    used by other threads until the timeout elapses
    IF (howmany <= 0) RETURN
    
    !   original method, seems to block all threads
    !   not just the calling thread
    !IF (howmany > 0) CALL sleep (howmany)

    !   new method, supposed to block only the calling thread
    hEvent = CreateEvent       (NULL_SECURITY_ATTRIBUTES,  &
                                TRUE,                      & ! manual reset
                                FALSE,                     & ! initial status
                                NULL)                        ! unnamed event object
    rval = WaitForSingleObject (hEvent, howmany)
    rval = CloseHandle         (hEvent)

END SUBROUTINE delay_ms

jimdempseyatthecove
Honored Contributor III

An additional issue: if you want periodic intervals, you must take into account the time your code spends not waiting (and subtract that from the wait time). On Windows, consider using the QueryPerformanceCounter family of functions.
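
A minimal sketch of that bookkeeping, assuming a pace of roughly 100 ms per step and with the real work elided (the program name and step count here are illustrative, not from the original code):

program paced_step_demo
    use ifport, only: sleepqq
    implicit none
    integer(8) :: t0, t1, rate
    integer    :: istep, leftover_ms
    call system_clock (count_rate=rate)
    do istep = 1, 10
        call system_clock (t0)
        ! ... the real work of one simulation step would go here ...
        call system_clock (t1)
        ! subtract the time already spent computing from the 100 ms budget
        leftover_ms = 100 - int((t1 - t0) * 1000_8 / rate)
        if (leftover_ms > 0) call sleepqq (leftover_ms)
    end do
end program paced_step_demo

Because each step re-reads the clock, the compute time is absorbed into the delay rather than added on top of it; the residual SLEEPQQ call is still quantized to the system timer tick.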

Jim Dempsey 

Steve_Lionel
Honored Contributor III

SLEEPQQ only promises that you'll sleep for at least the duration specified - it may be longer. The resolution of such timers on Windows isn't all that great (the default timer tick is on the order of 10-16 ms).

I don't understand Paul's code in post #2 - I think it is missing something important. https://docs.microsoft.com/en-us/windows/desktop/Sync/using-waitable-timer-objects shows how to do this sort of thing in C, and it could be translated into Fortran easily enough.
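
To put a rough number on that resolution, a quick measurement loop along these lines (untested here, and independent of the code above) shows what a 10 ms request actually delivers:

program sleepqq_granularity
    use ifport, only: sleepqq
    implicit none
    integer(8) :: c0, c1, rate
    integer    :: i
    call system_clock (count_rate=rate)
    do i = 1, 5
        call system_clock (c0)
        call sleepqq (10)                 ! ask for 10 ms
        call system_clock (c1)
        print '(a,f7.2,a)', ' asked 10 ms, got ', &
              real(c1 - c0) * 1000.0 / real(rate), ' ms'
    end do
end program sleepqq_granularity

With the default Windows timer tick of roughly 15.6 ms, each request typically comes back at 15-16 ms, which is consistent with the ~17 seconds measured for 1000 nominal 10 ms sleeps in the original post.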

Paul_Curtis
Valued Contributor I

@Steve, I can assure you that my code works perfectly; I use it mostly to regulate serial communications, and I use it a lot. It is also threadsafe (which is why my routine is recursive). The concept is pretty simple: create a dummy event that is not linked to anything and hence will never be signaled, then wait on that event until it times out at the pre-selected timeout period.

Steve_Lionel
Honored Contributor III

Ah, now I understand. Thanks.

dboggs
New Contributor I

Thanks, Paul, for your very easy-to-implement routine delay_ms (except, FYI, I had to remove the ONLY: clause because too many names used in the CreateEvent call were otherwise undefined). Unfortunately, the performance I get is virtually the same as SLEEPQQ. For example, when I ask for a delay of 20 ms I get a delay of 33 ms; asking for 10 ms results in 17 ms. It's only accurate for delays of more than 1000 ms.

Steve: I thought this was an accuracy issue, not resolution, but perhaps you are right. The "accuracy" is terrible when I ask for a delay of 0.01 seconds, but it is still pretty bad (10% too long) when I ask for 0.5 sec. Thanks for the reference, though. If I knew anything about C and Windows I would convert it to Fortran as you suggest, but I would probably spend hours just working on the conversion, and I have become skeptical of how well it would work.

If the conversion is "easy," why doesn't Intel do it and offer the result as a replacement for the crippled SLEEPQQ? BTW, it is good to know that it "only promises sleep for at least the duration specified--it may be longer." That would have been good to know from the start: the documentation only states "Delays execution of the program for a specified duration" with no further comment. Perhaps an upgrade is in order, including a warning of poor performance when the argument is less than around 2000? This appears to be a classic case of implied resolution well in excess of actual accuracy.

Sorry to sound like a chronic complainer about this, but it does seem like Fortran should provide a pacing function that works worth a hoot.

Steve_Lionel
Honored Contributor III

I think that SLEEPQQ isn't really the right tool for what you want. Jim Dempsey's suggestion of the performance counters is what one usually turns to when precision and resolution matter.

Do I understand correctly that your simulation needs to simulate the wait intervals in real time? Maybe what you need to do here is determine at what time you need to proceed, then loop with a short-interval wait until you see that time is reached.
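
As a rough illustration of that "loop until the target time" idea (untested, using SYSTEM_CLOCK as the wall clock; wait_until is a hypothetical helper, not an intrinsic):

subroutine wait_until (target_count)
    ! Loop until SYSTEM_CLOCK reaches target_count, yielding the
    ! timeslice with short sleeps in between.
    use ifport, only: sleepqq
    implicit none
    integer(8), intent(in) :: target_count
    integer(8) :: now
    do
        call system_clock (now)
        if (now >= target_count) exit
        call sleepqq (1)          ! actual sleep may be a full timer tick (~15 ms)
    end do
end subroutine wait_until

The caller would compute target_count once per step from a fixed start count (for example, start_count + step_number*count_rate for one-second steps), so the overshoot of any individual wait does not accumulate from step to step.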

Here's a translation of the C program into Fortran, with a sample execution:

program Timer
    
    use kernel32

    implicit none

    ! Variables
    integer(HANDLE) :: hTimer
    type(T_LARGE_INTEGER) :: liDueTime
    integer clock_start, clock_end, clock_rate

    ! Body of Timer
    htimer = NULL
    ! 10 seconds in 100ns units, negative for relative
    liDueTime = TRANSFER(-100000000_8,T_LARGE_INTEGER(0,0))
    
    ! Create an unnamed waitable timer
    htimer = CreateWaitableTimer (NULL, TRUE, NULL)
    if (NULL == hTimer) then
        print *, "CreateWaitableTimer failed with error=",GetLastError()
        error stop
    end if
    
    print *, "Waiting for 10 seconds..."
    
    ! Check how long we actually wait
    call SYSTEM_CLOCK (clock_start)
    ! Set a timer to wait for 10 seconds
    if (SetWaitableTimer(hTimer, liDueTime, 0, NULL, NULL, 0) == 0) then
        print *, "SetWaitableTimer failed with error=",GetLastError()
        error stop
    end if
    
    ! Wait for the timer
    if (WaitForSingleObject(hTimer, INFINITE) /= WAIT_OBJECT_0) then
        print *, "WaitForSingleObject failed with error=",GetLastError()
        error stop
    end if
    
    call SYSTEM_CLOCK (clock_end, clock_rate)
    print '(a,F6.3,a)', "Timer was signaled after ",real(clock_end-clock_start)/real(clock_rate), &
        " seconds"

    end program Timer

 Waiting for 10 seconds...
 Timer was signaled after 10.000 seconds

 

Steve_Lionel
Honored Contributor III

You could do the creation of the timer once. Then set the value of liDueTime and call SetWaitableTimer then WaitForSingleObject when you want to wait.
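
An untested sketch of that reuse pattern, mirroring the program above (the module and routine names are illustrative only):

module waitable_timer_mod
    use kernel32
    implicit none
    integer(HANDLE), save :: hTimer = NULL
contains
    subroutine timer_create ()
        ! Create the unnamed waitable timer once, up front
        hTimer = CreateWaitableTimer (NULL, TRUE, NULL)
    end subroutine timer_create

    subroutine timer_wait_ms (ms)
        ! Program the timer for a relative wait of ms milliseconds, then block
        integer, intent(in) :: ms
        type(T_LARGE_INTEGER) :: liDueTime
        integer :: rval
        ! relative due time in 100 ns units (negative means relative)
        liDueTime = TRANSFER (-int(ms, 8) * 10000_8, T_LARGE_INTEGER(0,0))
        rval = SetWaitableTimer (hTimer, liDueTime, 0, NULL, NULL, 0)
        rval = WaitForSingleObject (hTimer, INFINITE)
    end subroutine timer_wait_ms
end module waitable_timer_mod

Error checking (as in the program above) is omitted here for brevity.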

JVanB
Valued Contributor II

Are you posting the results to the screen or to a file? If the former, you can use animation primitives to update the display as necessary.

dboggs
New Contributor I

Thanks Steve, I will give this a try.  "Determine at what time you need to proceed, then loop with a short-interval wait until you see that time is reached"--Yes, that's what I tried doing using CPU_TIME as described in my OP. But maybe SYSTEM_CLOCK will work better.

Repeat Offender: I am using both, posting some brief numerical results to a file and updating the screen graphics using drawing primitives at the end of each calculation step. But these go much faster on screen than I want them to--for example, 40 steps/sec when I want them to go, say, 10 to 20 steps per second--hence the sleep period inserted at each step. But tell me: what are these "animation primitives" you speak of?

 

Paul_Curtis
Valued Contributor I

@dboggs, how do you measure the "actual" delay time in your tests, and what supports the assumption that your calibration technique is more accurate than the timing routine being tested?  Windows is a message-pump opsys and does not provide deterministic timings for message transmission (there is an ongoing debate as to whether Windows can be considered a real-time opsys; it can, if your definition of real-time is fuzzy enough and synchronizations are not critical beyond a few ms).

I have used my delay_ms() routine, as written above, for decades, mostly with delays in the 10-200 ms range, to insert wait time between sending a communications query and listening for the device response, to accommodate the varying performance of actual serial devices, which are much slower than any PC. I have not bothered (or needed) to make an independent calibration, but use with actual devices shows immediate differences as the delay times are adjusted to match the device capabilities.

JVanB
Valued Contributor II

In OpenGL for example there is wglSwapIntervalEXT that sets the minimum number of vertical retraces between calls to SwapBuffers. This can cause your code to wait a pretty much uniform time between screen updates. For example, with a vertical frequency of 60 Hz and CALL wglSwapIntervalEXT(interval=1) you would get updates every 16.67 ms assuming your code can run fast enough to draw the back buffer in time.

Of course your updates are always going to be posted at some multiple of the vertical frequency so in your original post you seem to be running at 60 Hz. What is the resolution and frequency you run at? I usually run at 3840X2160@30 because I don't get 4:4:4 chroma at 3840X2160@60, and that's distracting for things like Fortran programming. For a 4K@60 video like https://www.youtube.com/watch?v=Pg8DbWNTEyI I have to switch to the latter mode; YouTube videos are low chroma anyhow, so it doesn't matter in that context.

LRaim
New Contributor I

In a dynamic simulation the user can use the SYNCRO option to make execution time equal to clock time. This option becomes active if the actual execution time intervals become shorter than the clock intervals. The piece of code is:
!
!... perform synchronization when required ...
!
      IF(KSYNCR .LE. 0)  GOTO 2100
!
      WTIME = SIMTIME2 - ETIME2
      IF(WTIME .LE. 0.100D00)  GOTO 2100
!... do not wait for less than 100 msec ...
!
!... waiting time in millisecs ....
      KWTIME = WTIME * 1.0D03
!
      IF(IAUX .GE. LEVA) WRITE(NOUT,7480) KWTIME
 7480 FORMAT(5X,'WAITING ',I6,' MSECS ')
!
      CALL TWAIT(KWTIME)
      IGTIME2 = 1
!
 2100 CONTINUE
!
The TWAIT subroutine is the following:
 


      SUBROUTINE TWAIT(IWTIME)
!+... ******************************************************************
!                                                                      I
!... * Function * ....                                                 I
!     to wait for a given amounts of time                              I
!                                                                      I
!... * Parameters * ...                                                I
!                                                                      I
! -- Name  ---- -Type--In/Out ---   Description                        I
! -------------------------------------------------------------------  I
!    IWTIME      i*4    I      waiting time in millisecs               I
!                                                                      I
!+---------------------------------------------------------------------*
!                                                                      I
      use kernel32

      IMPLICIT NONE
      integer iwtime
!
!.... * LOCAL WORK AREAS * ....
!
      integer (DWORD) iwait
!
!.... * COMMON AREAS POOL * ....
!
!
!.... * LITERAL & NUMERICAL POOL * ....
!
!
!.... * PROCESSING * ....
!
      iwait = IWTIME
!
      Call Sleep(iwait)
!
      RETURN
      END
Regards

 

Steve_Lionel
Honored Contributor III

CPU_TIME is absolutely inappropriate for wall-clock timing - use SYSTEM_CLOCK.
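
A quick way to see the difference (untested sketch): CPU_TIME barely advances while the program is asleep, whereas SYSTEM_CLOCK tracks the wall clock.

program wall_vs_cpu
    use ifport, only: sleepqq
    implicit none
    integer(8) :: c0, c1, rate
    real       :: t0, t1
    call system_clock (count_rate=rate)
    call system_clock (c0)
    call cpu_time (t0)
    call sleepqq (500)                ! sleep roughly half a second
    call cpu_time (t1)
    call system_clock (c1)
    print '(a,f6.3,a)', ' SYSTEM_CLOCK elapsed: ', real(c1 - c0) / real(rate), ' s'
    print '(a,f6.3,a)', ' CPU_TIME elapsed:     ', t1 - t0, ' s'
end program wall_vs_cpu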

jimdempseyatthecove
Honored Contributor III

Try something like this:

module mod_Timer
  use kernel32

  type(T_LARGE_INTEGER) :: PerformanceFrequency
  type(T_LARGE_INTEGER) :: PerformanceCounter0

contains

  subroutine InitPerformanceCounter
    implicit none
    if(QueryPerformanceFrequency(PerformanceFrequency) /= 0) return
    print *, "Unable to run timer"
    stop
  end subroutine InitPerformanceCounter

  subroutine InitPerformanceCounter0
    implicit none
    if(QueryPerformanceCounter(PerformanceCounter0) /= 0) return
    print *, "Unable to run timer"
    stop
  end subroutine InitPerformanceCounter0

  subroutine WaitUntilIntervalN_ms(N)
    implicit none
    integer, intent(in) :: N
    type(T_LARGE_INTEGER) :: DoneCounter
    integer :: Wait_ms
    DoneCounter = (PerformanceFrequency * N) / 1000 ! perform * before /
    Wait_ms = int((DoneCounter - PerformanceCounter0) / 1000, kind(Wait_ms))
    if(Wait_ms > 0) call sleepqq(Wait_ms)
  end subroutine WaitUntilIntervalN_ms

end module mod_Timer

program YourProgram
  use mod_Timer
  ...
  call InitPerformanceCounter
  ...
  call InitPerformanceCounter0	! Init T0
  iFramesPerSecond = 24
  do iFrame = 1, nFrames
    call BuildFrame
    ms = (iFrame * 1000) / iFramesPerSecond
    call WaitUntilIntervalN_ms(ms)
    call PaintFrame
  end do
  ...
end program YourProgram

The above is untested.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Notes for the above:

Due to Windows peculiarities, your PaintFrame will not necessarily occur at the exact time interval. The performance counter, however, advances properly regardless of what Windows inflicts upon your program. Therefore the ms anniversary is maintained (IOW, no skew), and any frame jitter will not affect the time interval of the next frame (assuming the next interval has not already passed since the prior iteration).

Jim Dempsey

dboggs
New Contributor I

Jim: I am trying to use your module mod_Timer, but I get a few compile errors. Nearly all seem to be caused by the TYPE (T_LARGE_INTEGER) declarations of the variables PerformanceFrequency, PerformanceCounter0, and DoneCounter: several lines involving these produce the error "This binary operation is invalid for this data type" (10 errors). Perhaps something is wrong with my USE KERNEL32?

I have tried my program as both a console app and a standard graphics app; the former gave a compile error "error opening the compiled module file ... check INCLUDE paths" (which I find almost never has anything to do with INCLUDE paths). That error does not occur when I build as a standard graphics app.

andrew_4619
Honored Contributor II

T_LARGE_INTEGER is a derived type. 

     TYPE T_LARGE_INTEGER
       SEQUENCE
       integer(DWORD) LowPart
       integer(LONG)  HighPart
     END TYPE

So mathematical operations on variables of this type are not valid; you can only do operations on the components of the type.

dboggs
New Contributor I

OK, how does this relate to Jim's code? Are you saying something is wrong or missing with it?

mecej4
Honored Contributor III

T_LARGE_INTEGER is an Intel-defined type; you can see its definition in IFWINTY.F90. You can bring that definition into your code by adding USE IFWINTY (or some other USE statement that pulls IFWINTY in).
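
Pulling the thread's pieces together, here is an untested sketch of Jim's interval scheme with the T_LARGE_INTEGER values copied into plain INTEGER(8) variables via TRANSFER, so the arithmetic andrew_4619 flagged becomes legal (module and routine names are illustrative only):

module perf_pacer
    use ifwinty,  only: T_LARGE_INTEGER
    use kernel32, only: QueryPerformanceFrequency, QueryPerformanceCounter
    use ifport,   only: sleepqq
    implicit none
    integer(8), save :: freq = 0, count0 = 0
contains
    integer(8) function qpc ()
        ! Current performance-counter value as a plain 64-bit integer
        type(T_LARGE_INTEGER) :: li
        integer :: rval
        rval = QueryPerformanceCounter (li)
        qpc = transfer (li, 0_8)
    end function qpc

    subroutine pacer_init ()
        ! Capture the counter frequency and the T0 reference count
        type(T_LARGE_INTEGER) :: li
        integer :: rval
        rval = QueryPerformanceFrequency (li)
        freq = transfer (li, 0_8)
        count0 = qpc ()
    end subroutine pacer_init

    subroutine wait_until_ms (n)
        ! Wait until n milliseconds have elapsed since pacer_init was called
        integer, intent(in) :: n
        integer(8) :: target, remaining_ms
        target = count0 + freq * int(n, 8) / 1000_8
        do
            remaining_ms = (target - qpc ()) * 1000_8 / freq
            if (remaining_ms <= 0) exit
            call sleepqq (int(remaining_ms))
        end do
    end subroutine wait_until_ms
end module perf_pacer

Each interval's target is computed from the absolute T0 reference, so, as Jim notes above, jitter in one frame does not skew the following ones; SLEEPQQ is only used to give up the timeslice while waiting.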
