My program is a process simulation in which I need to post the result of each time step. The execution time of each step is negligible so I insert a delay, using SLEEPQQ, between each step. The required sleep time is typically .02 to .10 seconds. The results are unsatisfactory. Here's a short "do nothing" code that illustrates the problem:
INTEGER :: isec, isubcycle
PRINT *, 'Program PacingDemo'
PRINT *, 'Attempt to control the speed of execution...'
PRINT *, 'Press <Enter> to start...'
READ (*, *)
CALL SLEEPQQ (3000)
CALL BEEPQQ (1000, 500)
! Try to create 10 cycles paced at one per second
DO isec = 1, 10
DO isubcycle = 1, 100
CALL SLEEPQQ (10)
!PRINT *, ' Subcycle', isubcycle
PRINT *, 'Cycle', isec
CALL BEEPQQ (1000, 500)
END PROGRAM PacingDemo
I divide the time step into subcycles to get some variability in sleep periods in combination with the argument to SLEEPQQ, which needs milliseconds. To get a sleep time of 1 sec, for example, I can use 100 subcycles with a wait time of 10 milliseconds (as shown), or 1 subcycle with a wait time of 1000 ms, etc. Here's the results of various combinations:
No. subcycles, SLEEPQQ arg, measured elapsed time
1, 1000, 10.5 sec (5% error)
2, 500, 10.7 sec
5, 200, 10.9 sec (10% error)
10, 100, 11.4 sec
20, 50, 13.1 sec
50, 20, 16 sec (60% error)
100, 10, 17 sec
200, 5, 32 sec (320% error)
500, 2, 79 sec (790% error)
1000, 1, 156 sec (1500% error)
So, I would say that SLEEPQQ (n) only gives reasonable results for n > 1000 (one second), below which the accuracy seriously deteriorates, to the point of being ridiculous.
As an alternative I tried calling CPU_TIME (start_time) at the beginning of each step, then placing CPU_TIME (current_time) in an endless loop that exits only when current_time - start_time reaches the desired cycle time. The results were similar.
Surely there is a way to do this?
If you're running on Windows, try this:
RECURSIVE SUBROUTINE delay_ms (howmany)
USE kernel32, ONLY: WaitForSingleObject, CloseHandle, CreateEvent !sleep
INTEGER, INTENT(IN) :: howmany
INTEGER(HANDLE) :: hEvent
INTEGER :: rval
! Win32 suspend function, allows the timeslice to be
! used by other threads until the timeout elapses
IF (howmany <= 0) RETURN
! original method, seems to block all threads
! not just the calling thread
!IF (howmany > 0) CALL sleep (howmany)
! new method, supposed to block only the calling thread
hEvent = CreateEvent (NULL_SECURITY_ATTRIBUTES, &
TRUE, & ! manual reset
FALSE, & ! initial status
NULL) ! unnamed event object
rval = WaitForSingleObject (hEvent, howmany)
rval = CloseHandle (hEvent)
END SUBROUTINE delay_ms
An additional issue you have is if you want periodic intervals, you must take into account the time your code is not waiting (and remove that from the wait time). On Windows consider using the ReadPerformanceCounter family of functions.
SLEEPQQ only promises that you'll sleep for at least the duration specified - it may be longer. The resolution of such timers on Windows isn't all that great (.01 seconds?)
I don't understand Paul's code in post #2 - I think it is missing something important. https://docs.microsoft.com/en-us/windows/desktop/Sync/using-waitable-timer-objects shows how to do this sort of thing in C, and it could be translated into Fortran easily enough.
@Steve, I can assure you that my code works perfectly, I use it mostly to regulate serial communications, and I use it a lot; it is also threadsafe (which is why my routine is recursive). The concept is pretty simple: start up a dummy event unlinked to anything and hence will never be signaled, and wait for the event to exit at its pre-selected time out period.
Thanks Paul, for your very-easy-to-implement routine delay_ms (except, fyi, I had to remove the ONLY: clause because there were too many undefined variables in the CreateEvent function). Unfortunately, the performance I get is virtually the same as SLEEPQQ. For example, when I ask for a delay of 20 ms I get a delay of 33 ms; asking for 10 ms results in 17 ms. It's only accurate for delays of more than 1000 ms.
Steve: I thought this was an accuracy issue, not resolution, but perhaps you are right. The "accuracy" is terrible when I ask for a delay of 0.01 seconds, but it is still pretty bad (10% too long) when I ask for 0.5 sec. Thanks for the reference, though. If I knew anything about C and Windows I would convert it to Fortran as you suggest, but I would probably spend hours just working on the conversion, and I have become skeptical of how well it would work.
If the conversion is "easy," why doesn't Intel do it and offer up the result as a replacement for the crippled SLEEPQQ? BTW it is good to know that it "only promises sleep for at least the duration specified--it may be longer." Would have been good to know that. The documentation only states "Delays execution of the program for a specified duration" with no further comments. Perhaps an upgrade is in order, including a warning of poor performance when the argument is less than around 2000? This appears to be a classic case of implied excessive resolution, well in excess of accuracy.
Sorry to sound like a chronic complainer about this, but it does seem like Fortran should provide a pacing function that works worth a hoot.
I think that SLEEPQQ isn't really the right tool for what you want. Jim Dempsey's suggestion of the "performance counters" is what one usually turns to when precision and resolution matters.
Do I understand correctly that your simulation needs to simulate the wait intervals in real time? Maybe what you need to do here is determine at what time you need to proceed, then loop with a short-interval wait until you see that time is reached.
Here's a translation of the C program into Fortran, with a sample execution:
program Timer use kernel32 implicit none ! Variables integer(HANDLE) :: hTimer type(T_LARGE_INTEGER) :: liDueTime integer clock_start, clock_end, clock_rate ! Body of Timer htimer = NULL ! 10 seconds in 100ns units, negative for relative liDueTime = TRANSFER(-100000000_8,T_LARGE_INTEGER(0,0)) ! Create an unnamed waitable timer htimer = CreateWaitableTimer (NULL, TRUE, NULL) if (NULL == hTimer) then print *, "CreateWaitableTimer failed with error=",GetLastError() error stop end if print *, "Waiting for 10 seconds..." ! Check how long we actually wait call SYSTEM_CLOCK (clock_start) ! Set a timer to wait for 10 seconds if (SetWaitableTimer(hTimer, liDueTime, 0, NULL, NULL, 0) == 0) then print *, "SetWaitableTimer failed with error=",GetLastError() error stop end if ! Wait for the timer if (WaitForSingleObject(hTimer, INFINITE) /= WAIT_OBJECT_0) then print *, "WaitForSingleObject failed with error=",GetLastError() error stop end if call SYSTEM_CLOCK (clock_end, clock_rate) print '(a,F6.3,a)', "Timer was signaled after ",real(clock_end-clock_start)/real(clock_rate), & " seconds" end program Timer
Waiting for 10 seconds... Timer was signaled after 10.000 seconds
You could do the creation of the timer once. Then set the value of liDueTime and call SetWaitableTimer then WaitForSingleObject when you want to wait.
Thanks Steve, I will give this a try. "Determine at what time you need to proceed, then loop with a short-interval wait until you see that time is reached"--Yes, that's what I tried doing using CPU_TIME as described in my OP. But maybe SYSTEM_CLOCK will work better.
Repeat Offender: I am using both, posting some brief numerical results to a file and updating the screen graphics using drawing primitives, at the end of each calculation step. But these go much faster on screen than I want them to--for example, 40 steps/sec when I want them to go, say, 10 to 20 steps per second--hence the SLEEP period inserted each step. But tell me: what are these "animation primitives" you speak of?
@dboggs, how do you measure the "actual" delay time in your tests, and what supports the assumption that your calibration technique is more accurate than the timing routine being tested? Windows is a message-pump opsys and does not provide deterministic timings for message transmission (there is an ongoing debate as to whether Windows can be considered a real-time opsys; it can, if your definition of real-time is fuzzy enough and synchronizations are not critical beyond a few ms).
I have used my delay_ms() routine, as written above, for decades, mostly with delays in the 10-200 ms range, to insert wait time following the sending of a communications query before listening for a device response, to accomodate the varying performances of actual serial devices which are much slower than any PC. I have not bothered (or needed) to make an independent calibration, but use with actual devices shows immediate differences as the delay times are adjusted to match the device capabilities.
In OpenGL for example there is wglSwapIntervalEXT that sets the minimum number of vertical retraces between calls to SwapBuffers. This can cause your code to wait a pretty much uniform time between screen updates. For example, with a vertical frequency of 60 Hz and CALL wglSwapIntervalEXT(interval=1) you would get updates every 16.67 ms assuming your code can run fast enough to draw the back buffer in time.
Of course your updates are always going to be posted at some multiple of the vertical frequency so in your original post you seem to be running at 60 Hz. What is the resolution and frequency you run at? I usually run at 3840X2160@30 because I don't get 4:4:4 chroma at 3840X2160@60, and that's distracting for things like Fortran programming. For a 4K@60 video like https://www.youtube.com/watch?v=Pg8DbWNTEyI I have to switch to the latter mode; YouTube videos are low chroma anyhow, so it doesn't matter in that context.
In a dynamic simulation the user can use the SYNCRO option to have execution time equal to clock time. This option becomes active if the actual execution time intervals become lower than clock intervals. The piece of code is :
!... perform syncronization when required ...
IF(KSYNCR .LE. 0) GOTO 2100
WTIME = SIMTIME2 - ETIME2
IF(WTIME .LE. 0.100D00) GOTO 2100
!... do not wait for less than 100 msec ...
!... waiting time in millisecs ....
KWTIME = WTIME * 1.0D03
IF(IAUX .GE. LEVA) WRITE(NOUT,7480) KWTIME
7480 FORMAT(5X,'WAITING ',I6,' MSECS ')
IGTIME2 = 1
the TWAIT subroutine is the following:
!... * Function * .... I
! to wait for a given amounts of time I
!... * Parameters * ... I
! -- Name ---- -Type--In/Out --- Description I
! ------------------------------------------------------------------- I
! IWTIMW i*4 I waiting time in millisecs I
!.... * LOCAL WORK AREAS * ....
integer (DWORD) iwait
!.... * COMMON AREAS POOL * ....
!.... * LITERAL & NUMERICAL POOL * ....
!.... * PROCESSING * ....
iwait = IWTIME
CPU_TIME is absolutely inappropriate for wall-clock timing - use SYSTEM_CLOCK.
Try something like this:
module mod_Timer use kernel32 type(T_LARGE_INTEGER) :: PerformanceFrequency type(T_LARGE_INTEGER) :: PerformanceCounter0 subroutine InitPerformanceCounter implicit none if(QueryPerformanceFrequency(PerformanceFrequency)) return print *, "Unable to run timer" stop end subroutine InitPerformanceCounter subroutine InitPerformanceCounter0 implicit none if(QueryPerformanceCounter(PerformanceCounter0)) return print *, "Unable to run timer" stop end subroutine InitPerformanceCounter0 subroutine WaitUntilIntervalN_ms(N) implicit none type(T_LARGE_INTEGER) :: DoneCounter integer :: Wait_ms DoneCounter = (PerformanceFrequency * N) / 1000 ! perform * before / Wait_ms = int((DoneCounter - PerformanceCounter0) / 1000, selected_int_kind(Wait_ms)) if(Wait_ms > 0) call sleepqq(Wait_ms) end subroutine WaitUntilIntervalN_ms(N) end module mod_Timer program YourProgram use mod_Timer ... call InitPerformanceCounter ... call InitPerformanceCounter0 ! Init T0 iFramesPerSecond = 24 do iFrame = 1, nFrames call BuildFrame ms = (iFrame * 1000) / iFramesPerSecond call WaitUntilIntervalN_ms(ms) call PaintFrame end do ... end program YourProgram
The above is untested.
Notes for the above:
Due to Windows peculiarities, your PaintFrame will not necessarily occur at the exact time interval. The PerformanceCounter will advance properly regardless of what Windows inflicts upon your program. Therefore the ms anniversary will be maintained (IOW no skew), and any frame jitter will not affect the time interval of the next frame (assuming the next interval has not passed since the prior iteration.
Jim: I am trying to use your routine mod_timer, but I get a few compile errors. Nearly all seem to be caused by the declarations TYPE (T_LARGE_INTEGER) on variables PerformanceFrequency, PerformanceCounter0, and DoneCounter: there are several lines involving all of these that produce error "This binary operation is invalid for this data type (10 errors). Perhaps something is wrong with my USE KERNEL32?
I have tried my program as both a console app and a standard graphics app; the former gave a compile "error opening the compiled module file...check INCLUDE paths (which I find almost never have anything to do with include paths). That error does not occur when I build as a standard graphics app.
T_LARGE_INTEGER is a derived type.
TYPE T_LARGE_INTEGER SEQUENCE integer(DWORD) LowPart ! knowns DWORD integer(LONG) HighPart ! knowns LONG END TYPE
So the mathematical operations with variables of this type are not valid. You could only do operations with the elements of the type.