Saving, Restoring State

smedvedoff · ‎10-31-2007

We run simulations that might take 30 minutes to complete. We often want to look at activity near the end of the simulation. We'd like to be able to save the state of the simulation at a particular time, say 28 seconds, and then restart the simulation from that state without having to run the first 28 seconds. I'm wondering if there's a way in IVF to snapshot and save memory, and to restore memory from the snapshot. It would be great if this could be done outside the IVF developer as well (e.g. have logic embedded in the executable to do the save/restore).

Thanks,

Steve

Steven_L_Intel1 · ‎11-01-2007

There is no automatic method to do this. You'll have to develop your own snapshot facility, saving the relevant information in a file that is periodically written and closed (or at least FLUSHed). Depending on the kind and amount of data, you may find NAMELIST I/O to be helpful here.
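As a minimal sketch of the NAMELIST idea, assuming the state worth saving fits in a few module variables: the module name `snapshot_nml`, the group name `sim_state`, the variables, and the unit number are all hypothetical, chosen only for illustration.

```fortran
module snapshot_nml
  implicit none
  ! Hypothetical simulation state; replace with the real variables.
  integer      :: step        = 0
  real(kind=8) :: sim_time    = 0.0d0
  real(kind=8) :: position(3) = 0.0d0
  ! Everything listed in the group is written and read by name.
  namelist /sim_state/ step, sim_time, position
  integer, parameter, private :: snap_unit = 27   ! arbitrary free unit (assumption)
contains

  ! Write the whole group, then flush and close so the file is
  ! complete on disk even if the run is killed later.
  subroutine write_snapshot(filename)
    character(len=*), intent(in) :: filename
    open(unit=snap_unit, file=filename, status='replace', action='write')
    write(snap_unit, nml=sim_state)
    flush(snap_unit)
    close(snap_unit)
  end subroutine write_snapshot

  ! Read the group back; variables absent from the file keep their
  ! current values.
  subroutine read_snapshot(filename)
    character(len=*), intent(in) :: filename
    open(unit=snap_unit, file=filename, status='old', action='read')
    read(snap_unit, nml=sim_state)
    close(snap_unit)
  end subroutine read_snapshot

end module snapshot_nml
```

The resulting file is plain text, so a snapshot can be inspected or edited by hand before a restart. For large arrays, unformatted I/O would likely be faster and more compact.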

jimdempseyatthecove · ‎11-01-2007

Steve,

The simulations I run take 30 hours, sometimes much more. As the other Steve mentioned, there is no reliable automatic way to make periodic snapshots.

Most simulations have an outermost loop, each pass of which is typically called an integration interval. At an appropriate place in that loop you can call a routine to take a snapshot, so that snapshots are taken at specified integration time intervals. To make use of these snapshot files, the code before entry into the outermost integration loop checks for an option switch that requests a resume from a snapshot file (and says which one).
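The two pieces described here, the periodic check inside the loop and the option switch read before it, might look roughly like the sketch below. The switch name `--resume`, the module name, and both routine names are assumptions made for this example, not anything IVF provides.

```fortran
module snapshot_schedule
  implicit none
contains

  ! True when a snapshot is due: every "every" completed steps.
  ! A non-positive "every" disables periodic snapshots.
  logical function snapshot_due(step, every)
    integer, intent(in) :: step, every
    snapshot_due = .false.
    if (every > 0) snapshot_due = (mod(step, every) == 0)
  end function snapshot_due

  ! Scan the command line for a hypothetical "--resume <file>" switch.
  subroutine get_resume_option(resume, filename)
    logical,          intent(out) :: resume
    character(len=*), intent(out) :: filename
    character(len=256) :: arg
    integer :: i
    resume   = .false.
    filename = ''
    ! Stop one short of the last argument so that i + 1 always exists.
    do i = 1, command_argument_count() - 1
      call get_command_argument(i, arg)
      if (trim(arg) == '--resume') then
        call get_command_argument(i + 1, filename)
        resume = .true.
        return
      end if
    end do
  end subroutine get_resume_option

end module snapshot_schedule
```

Counting steps rather than comparing floating-point times avoids missing a snapshot through rounding. The main program would call `get_resume_option` once before the loop and `snapshot_due` at the end of each pass.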

In my simulations I found that periodic snapshots are not always suitable for getting the snapshot at the right time in the simulation. Too high a snapshot frequency slows down the simulation and eats up hard drive space (about 10MB per snapshot in my simulations).

For my purposes I start up a separate thread that launches a dialog box. The dialog box displays data as the simulation progresses. Additionally, I have a separate thread driving graphical output via the Array Visualizer. The dialog box contains additional controls such as a throttle for the integration step size and a button to specify "Take snapshot now". The throttle is used near a critical section, and then a snapshot or a series of snapshots is taken as the problem area is approached. (There are a bunch of other buttons too.)

The other nice thing with this arrangement is that the dialog box can contain a Pause/Resume button. When paused, and with the Array Visualizer, I can examine not only the displayed graphical information but the underlying data tables as well. If something looks interesting or flawed I can take a snapshot or abort a run.

Writing the snapshot and resume-from-snapshot functions is a bit of work but well worth the time. This feature will save you much time later on.

Jim Dempsey

smedvedoff · ‎11-05-2007

Jim,

Thanks for your response. The approach you described sounds good. I can see how the snapshot routine could save global variables, but how do you handle local variables? Do you have a technique for saving/restoring local variables in all of your routines? The only approach I can come up with is to have each routine write its local variable values to some global block that can be written/read in the snapshot/resume routines. On a resume, each routine would have to retrieve its stored values from the block before processing. Do you have a cleaner approach?

Thanks,

Steve

jimdempseyatthecove · ‎11-05-2007

Steve,

Consider

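program foo
   ! foofoo presumably holds the shared state and the flags used
   ! below (ResumeFromSnapshot, TakeSnapshot, done)
   use foofoo
   call GetCommandLineArgs()
   if (ResumeFromSnapshot) then
      call RestoreFromSnapshot()   ! reload the saved state
   else
      call LoadSpecifiedFiles()    ! normal start from input files
   end if
   done = .false.
   do while (.not. done)
      call Iteration()             ! expected to set done when finished
      ! snapshots are only taken here, between iterations
      if (TakeSnapshot) call doSnapshot()
   end do
end program foo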

Jim

smedvedoff · ‎11-05-2007

Jim,

In your doSnapshot() function, how do you store the values of local variables from all routines? That's the part I'm not seeing.

- Steve

Steven_L_Intel1 · ‎11-05-2007

You wouldn't do anything with local variables.

Typically programs with snapshot functions do the snapshots at well-defined locations, such as between iterations of some program-level loop. On a restore, the global state would be reloaded, including progress information (iteration count, etc.) and the loop restarted.
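A rough sketch of that idea, assuming everything that must survive a restart lives in one module: the module name `sim_state_mod`, its variables, the routine names, and the unit number are all invented for illustration. The local variable `decay` in `iterate` is recomputed on every call, so it never needs to be saved.

```fortran
module sim_state_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Global state, including progress information (hypothetical names).
  integer  :: step     = 0
  real(dp) :: sim_time = 0.0_dp
  real(dp), allocatable :: field(:)
  integer, parameter, private :: snap_unit = 31   ! arbitrary free unit (assumption)
contains

  ! Save the global state between iterations. "field" must be allocated.
  subroutine save_state(filename)
    character(len=*), intent(in) :: filename
    open(unit=snap_unit, file=filename, form='unformatted', &
         status='replace', action='write')
    write(snap_unit) step, sim_time, size(field)
    write(snap_unit) field
    close(snap_unit)
  end subroutine save_state

  ! Reload the global state; the caller then re-enters the main loop.
  subroutine load_state(filename)
    character(len=*), intent(in) :: filename
    integer :: n
    open(unit=snap_unit, file=filename, form='unformatted', &
         status='old', action='read')
    read(snap_unit) step, sim_time, n
    if (allocated(field)) deallocate(field)
    allocate(field(n))
    read(snap_unit) field
    close(snap_unit)
  end subroutine load_state

  ! One pass of the program-level loop (a stand-in for real physics).
  subroutine iterate(dt)
    real(dp), intent(in) :: dt
    real(dp) :: decay          ! local temporary, never saved
    decay    = exp(-dt)
    field    = field * decay
    step     = step + 1
    sim_time = sim_time + dt
  end subroutine iterate

end module sim_state_mod
```

Because a snapshot is only taken between calls to `iterate`, no routine is partway through its work when the state is written, and a run resumed with `load_state` continues from the saved `step` as if it had never stopped.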

What you seem to be asking for is an operating-system level application snapshot, and that doesn't exist on Windows as far as I know.

smedvedoff · ‎11-05-2007

Thanks Steve. I understand what you and Jim are suggesting. There probably is a point in our processing where a snapshot of the global state would be sufficient for a restart. I'll try that approach.

- Steve