Timing Out Called External Program

pidaparthy__aditya · ‎07-01-2016

Hello,

Off the bat, I would like to mention that I am not using the Intel Visual Fortran compiler but gfortran. I haven't had much luck finding active forums for gfortran, so I thought I would check here.

I have a program which calls an external .exe. This software is off the shelf, and I am aware that it was also written in Fortran. Let's call it xyz.exe. My program writes the input files for xyz.exe and then calls it from within. I use the intrinsic procedure execute_command_line to run xyz.exe. Once it finishes running the simulation, my program performs the analysis, writes a new input file depending on the previous time-step results and calls xyz.exe again. This way I am able to run a dynamic simulation using software which was actually written for steady-state cases.

I usually set up a cumulative run overnight and go home. Come back in the morning to look at the results.

Now for the most part, xyz.exe is fairly reliable, but sometimes, maybe once a week, it gets stuck in an iterative loop. The makers of xyz.exe have put in a limit on the maximum number of iterations, but they did not account for the solver getting stuck within a single iteration. That is what happens to me about once a week: I come back in the morning to find that the simulation has not completed. It is a numerical issue caused by finite precision and an extremely stiff system. It goes away if I change the initial input by about 0.001, which makes no difference to my simulation results but lets xyz.exe converge.

Now I want a way to time the run of xyz.exe and, if it reaches a threshold, say 300 seconds, terminate it. An average single time-step run of xyz.exe takes about 3-4 seconds, so if it has reached 300 seconds it effectively means it is stuck in an iterative loop. How can I do this?

I want to begin a time counter before calling xyz.exe, call xyz.exe using execute_command_line, terminate xyz.exe if it reaches 300 seconds, check if my time counter has reached 300, if it has modify the input by the trivial amount and re-run xyz.exe, hoping it would converge this time.

Any advice on how to do this would be quite helpful.

Thanks

AKP

andrew_4619 · ‎07-01-2016

Do the execute_command_line with wait=.false. so it will not wait for the command to finish.

You need a means of knowing that xyz.exe is still running, or alternatively has NOT finished, that you can put into your wait loop. Do you have some plan? For example, is there a file that only appears after xyz is finished?

I am not sure what sleep function you have in gfortran. Ifort has SLEEP and SLEEPQQ, but there is also a Windows SDK Sleep function that can be called.

Steven_L_Intel1 · ‎07-01-2016

Most people use the comp.lang.fortran newsgroup to ask gfortran questions.

EXECUTE_COMMAND_LINE won't do what you want - it doesn't have that level of control. As Andrew says, you can tell it not to wait but then you have no way to ask if the program is finished.

Since you are on Windows (I assume), you could use the Windows API CreateProcess to run the program and then do timed waits for the process to exit. If it doesn't exit within your limit you can tell Windows to kill it. Intel Visual Fortran provides all the declarations you need to do this - I don't know about gfortran. You can see an example that shows most of the steps needed here.

pidaparthy__aditya · ‎07-04-2016

Hey Andrew,

Thanks for the response. When xyz.exe finishes executing, it returns an output file. This output is a .csv file. If xyz.exe does not complete its simulation, the output file does not get created. So I understand from your response that I should use execute_command_line with wait=.false. This way my program goes to the next line without waiting for xyz.exe to finish executing. It then waits up to a certain amount of time for the output file to be created, checking repeatedly during this time, and goes to the next step "as soon as" it gets access to the output file. If xyz.exe has not returned the output file after the stipulated maximum wait time (say 300 seconds), it re-runs the loop.

This sounds correct in principle, but I am not sure of the syntax to use to execute this sequence of commands. Gfortran does have the sleep function. I just need to give it an integer argument and the program pauses for that many seconds.

call sleep(seconds)

I am not sure how the sleep function would be helpful in this regard, as it would pause the program. Usually xyz.exe takes about 3-4 seconds to complete, but on some time-steps it could reasonably take 10-15 seconds to finish the simulation. So making the program pause for, say, 20 seconds when most of the time I need only 3-4 seconds would waste time.

I would be thankful if you could please help me with the required syntax to execute the sequence of commands described in the first paragraph.

Basically how can I get my program to be aware that the output file abc.csv exists as soon as it gets created.

Thanks

AKP

pidaparthy__aditya · ‎07-04-2016

Hey Andrew,

I think I figured it out after I hit the submit button to my previous response. Please correct me if I am wrong.

run xyz.exe using the execute_command_line procedure with wait = .false.

start a time-counter using cpu_time in the next line of code.

begin a do while loop which runs as long as the counter is less than 300 seconds, all the while inquiring whether the output file abc.csv exists using the INQUIRE statement.

As soon as the file exist status becomes true, exit the do-while loop, if the file exist status does not become true even after 300 seconds, modify my input by the trivial amount as I had mentioned in my original post and rerun xyz.exe.

If xyz.exe finishes its run, go to the next line of my code as it does normally.

Does this sound correct to you?

Thanks

AKP

andrew_4619 · ‎07-04-2016

That sounds like a plan. My suggestion on sleep is to have a checking loop:

start timer
do
  if timer > 300, exit loop with fail status
  check if the output file has been created or is available for exclusive read/write; if so, exit loop with success status
  sleep for 0.5 seconds to stop burning CPU doing zillions of pointless checks
enddo

TimP · ‎07-04-2016

If you call a short sleep function in the loop (so your CPU will be free for other tasks, like checking the forum, as well as completing your .csv file), little or no cpu time will be registered. Checking elapsed time (e.g. by system_clock or date_and_time) seems more appropriate.
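
[Editor's sketch] The checking loop with a wall-clock timer, as suggested above, might look roughly like this in gfortran. It is only a sketch under assumptions: the module and function names are made up for illustration, abc.csv is the output file name mentioned in this thread, and call sleep(1) relies on the gfortran SLEEP extension (whole seconds), so the polling interval is 1 second rather than 0.5.

```fortran
module poll_output
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Returns .true. if the file `path` appears within `timeout_s`
  ! seconds of elapsed (wall-clock) time, .false. otherwise.
  logical function wait_for_file(path, timeout_s) result(found)
    character(len=*), intent(in) :: path
    real, intent(in) :: timeout_s
    integer(int64) :: t0, t1, rate
    real :: elapsed

    call system_clock(t0, rate)          ! start count and ticks per second
    found = .false.
    do
      inquire(file=path, exist=found)
      if (found) exit
      call system_clock(t1)
      elapsed = real(t1 - t0) / real(rate)
      if (elapsed > timeout_s) exit      ! give up: found is still .false.
      call sleep(1)                      ! gfortran extension; frees the CPU
    end do
  end function wait_for_file
end module poll_output

program demo_poll
  use poll_output
  implicit none
  ! xyz.exe is assumed to have been launched asynchronously before this point.
  if (wait_for_file('abc.csv', 300.0)) then
    print *, 'output file found'
  else
    print *, 'timed out waiting for output file'
  end if
end program demo_poll
```

system_clock measures elapsed time, so the time spent sleeping counts towards the 300 seconds; cpu_time would register almost nothing while the program sleeps.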

pidaparthy__aditya · ‎07-04-2016

Andrew, Tim,

Thanks for the suggestions. You are correct, using the sleep function would make sure the program doesn't pointlessly keep inquiring the file status while xyz.exe is running. Putting in a short sleep function would improve the functioning of the algorithm without a major loss of computation time.

Tim,

Checking the elapsed time does seem more appropriate than using cpu_time.

AKP

jimdempseyatthecove · ‎07-04-2016

Or....

Since you know how many ms your SLEEPQQ is sleeping for (and the time accuracy for this is coarse), set your do loop to the known number of iterations:

secondsUntilError = 300
msSpinWait = 500
...
DO i = 1, secondsUntilError * 1000 / msSpinWait
  INQUIRE(FILE=YourFileName, EXIST=success)
  IF (success) EXIT
  CALL SLEEPQQ(msSpinWait)   ! Intel Fortran routine (USE IFPORT); argument in ms
ENDDO
IF (.NOT. success) GOTO 9999 ! your error routine
...

*** Note, the program that creates the file should write to a temporary file name (e.g. xyz.tmp), close the file, then rename the file to the appropriate file name (e.g. xyz.csv). If xyz.exe cannot do this directly, then you can run a BATCH script that runs xyz.exe and then renames the file.

*** Note 2, the spawning program (above) must assure that any old file name of the same name does not exist prior to the spawn.
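
[Editor's sketch] Note 2 can be handled in standard Fortran by opening the old file and closing it with status='delete'. The subroutine name below is hypothetical, and abc.csv is the output file name used elsewhere in this thread.

```fortran
program clear_stale_output
  implicit none
  call delete_if_exists('abc.csv')   ! call this before every spawn of xyz.exe
contains
  ! Deletes `path` if it exists; does nothing otherwise.
  subroutine delete_if_exists(path)
    character(len=*), intent(in) :: path
    integer :: u, ios
    logical :: exists

    inquire(file=path, exist=exists)
    if (.not. exists) return
    open(newunit=u, file=path, status='old', iostat=ios)
    if (ios == 0) close(u, status='delete')   ! closing with 'delete' removes the file
  end subroutine delete_if_exists
end program clear_stale_output
```

If the open fails (ios /= 0), the file is left in place, so a caller that must be certain could repeat the INQUIRE afterwards.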

Although the above will work correctly should xyz.exe take an abnormal exit without creating the output file, it will not correctly handle the case where xyz.exe gets hung in an infinite loop. Here "correctly" means that the spawning program is intended to recover and re-run xyz.exe. IOW, the above does not know the process handle and would be unable to kill the spawned (hung) program. Using CreateProcess will permit you to obtain the process handle, and thus permit you to kill the runaway process.

*!* While you can find the process handle by getting the handle of a window by the name in its title, you will not be assured that the window found is the desired window (e.g. if you are running multiple simulations). CreateProcess is the safest way (on Windows).

Jim Dempsey

Steven_L_Intel1 · ‎07-04-2016

Just checking if the file exists may not be sufficient if the program is still writing to it. You could try opening it and see if that succeeds.
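
[Editor's sketch] A test open could be written as below. The function name is hypothetical. Whether the OPEN actually fails while xyz.exe still has the file open depends on the sharing mode used by the writer and by the Fortran runtime, so treat a successful open as a hint rather than a guarantee.

```fortran
program check_ready
  implicit none
  if (file_ready('abc.csv')) then
    print *, 'abc.csv can be opened'
  else
    print *, 'abc.csv is missing or cannot be opened yet'
  end if
contains
  ! .true. if `path` exists and can be opened for reading and writing.
  logical function file_ready(path)
    character(len=*), intent(in) :: path
    integer :: u, ios

    open(newunit=u, file=path, status='old', action='readwrite', iostat=ios)
    file_ready = (ios == 0)
    if (file_ready) close(u)   ! release it again straight away
  end function file_ready
end program check_ready
```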

pidaparthy__aditya · ‎07-05-2016

Steve,

Thanks for the tip. I will put in a short sleep to account for the time between abc.csv being created and it being released by xyz.exe. Normally xyz.exe writes fairly quickly into the output file, so half a second should suffice.

Everyone,

I am however facing a different issue. I am on the Windows platform. Do I need to do anything to turn on asynchronous execution? I assumed that specifying wait=.false. would automatically activate asynchronous execution. However, I notice that this is not the case: my code waits for xyz.exe to finish execution before moving on to the next line.

Am I missing something here?

Thanks

AKP

TimP · ‎07-05-2016

You'll have little luck asking about gfortran details anywhere if you can't be specific about your version, and perhaps post a simple working example of your question. Did you try gcc-help mail list? Note that some popular versions of gfortran for Windows are totally unsupported, or even violate the terms of the license. It seems untopical to speculate further here.

pidaparthy__aditya · ‎07-05-2016

Tim,

Thank you for your input. I actually wasn't sure whether this was a fortran issue or a compiler issue. I am also checking on the comp.lang.fortran group.

I am using the MinGW-w64 compiler collection, version 5.3.0.

I will check with the GCC-help mail lists too.

Thanks again.

AKP

andrew_4619 · ‎07-05-2016

As was suggested earlier, if you use the Win API CreateProcess you will have control over such issues. You will also get the Windows process ID of your spawned exe and will be able to use the Win API to monitor that process.

jimdempseyatthecove · ‎07-05-2016

>>I am however facing a different issue. I am on the Windows platform. Do I need to do anything to turn on asynchronous execution? I assumed that specifying wait=.false. would automatically activate asynchronous execution. However, I notice that this is not the case: my code waits for xyz.exe to finish execution before moving on to the next line.

You can optionally:

CALL EXECUTE_COMMAND_LINE('START xyz.exe your options here')

or

CALL EXECUTE_COMMAND_LINE('START /MIN xyz.exe your options here')

From a cmd window issue START /? to get a list of options for the command. Note the /MIN option to START starts the application with a minimized window.
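
[Editor's sketch] When launching through START it may be worth passing the optional cmdstat, exitstat and cmdmsg arguments so that a failed launch is noticed. The command string is the one from this post; everything else is illustrative only.

```fortran
program launch_async
  implicit none
  integer :: cstat, estat
  character(len=256) :: msg

  estat = 0
  msg = ''   ! cmdmsg is only filled in when an error occurs
  ! START returns as soon as xyz.exe has been launched, so this call does
  ! not block for the duration of the simulation.
  call execute_command_line('START /MIN xyz.exe', &
                            exitstat=estat, cmdstat=cstat, cmdmsg=msg)
  if (cstat /= 0) then
    print *, 'could not run command: ', trim(msg)
  else if (estat /= 0) then
    print *, 'START returned exit status ', estat
  end if
end program launch_async
```

The exit status seen here is that of the START command, not of xyz.exe, so it says nothing about whether the simulation converged.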

Jim Dempsey

pidaparthy__aditya · ‎07-05-2016

Andrew,

I think I will have to do that now. I just checked on the comp.lang.fortran forum and got a concrete response from Arjen Markus. He tried it with both gfortran and ifort and noticed that it does not wait with ifort but does wait with gfortran.

I had been avoiding going the createprocess route as I would have to spend some time figuring it out for a problem (xyz.exe getting stuck in a loop) which is rare but one which brings my simulation to a dead stop when it does arise. I had not actually foreseen this issue with xyz.exe while writing the original wrapper program and took the easy way out with execute_command_line. I suppose now I have to do it.

Andrew, Steve, Tim, Jim, thank you for the help. I will go through the old threads and responses on CreateProcess or start a different one if I face any issues using CreateProcess.

Have a good one!!

AKP

pidaparthy__aditya · ‎07-05-2016

Jim,

I think that is a good suggestion. It might result in a new window; right now xyz.exe runs within my original window, which was really convenient for me. But I suppose using /MIN would let me do other work while the simulation is running.

Thanks

AKP

jimdempseyatthecove · ‎07-05-2016

Note, with START you can also specify the window title. For example:

CALL EXECUTE_COMMAND_LINE('START "XYZ.EXE Started yyyymmhhssms" /MIN xyz.exe your options here')

Where your Fortran program generates the start time yyyymmhhssms, and remembers the window title. Then later, when xyz.exe appears hung, you can use the window title to obtain the process handle, which you can then use to perform the kill process.

This will permit you to have multiple simulations running concurrently, yet permit killing of hung process (provided no two simulations started at the same time).
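
[Editor's sketch] One way to act on a remembered window title without calling the Win API directly is the Windows taskkill command with its WINDOWTITLE filter. This is a swapped-in technique, not something proposed in this thread. The title string is hypothetical, and whether the filter matches a console program started via START should be verified on your system.

```fortran
program kill_by_title
  implicit none
  ! Hypothetical title; it must match the one given to START at launch time.
  character(len=*), parameter :: title = 'XYZ.EXE run 1'
  call kill_window(title)
contains
  ! Asks Windows to forcibly end the process whose window title is `title`.
  subroutine kill_window(wtitle)
    character(len=*), intent(in) :: wtitle
    integer :: cstat, estat

    estat = 0
    call execute_command_line( &
         'taskkill /F /FI "WINDOWTITLE eq ' // wtitle // '"', &
         exitstat=estat, cmdstat=cstat)
    if (cstat /= 0 .or. estat /= 0) print *, 'taskkill did not report success'
  end subroutine kill_window
end program kill_by_title
```

A blunter alternative is taskkill /F /IM xyz.exe, which ends every running xyz.exe and is therefore only suitable when a single simulation is running.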

CreateProcess would be the safest way.

Jim Dempsey

Steven_L_Intel1 · ‎07-05-2016

When I implemented EXECUTE_COMMAND_LINE for Intel Fortran, I found that I could not use the C "system" routine as it did not provide the needed "no wait" functionality, so I redid the Windows side with CreateProcess. On Linux and OS X it still uses system.