This is my first post to this forum; I hope I'm posting in the proper place.
I've got two FORTRAN standalone programs that work together in a Linux environment. The first is a serial main program, and the second is an MPI parallel program. I'm using MPI 1, not MPI 2. Currently I run them together like this:
================
PROGRAM MAIN
statement 1
statement 2
.
.
.
do i=1,5
call sys_system('mpirun -np 4 program2.exe', io)
enddo
.
.
statement 3
statement 4
stop
end
END OF MAIN PROGRAM
================
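For clarity, the pattern in compilable form looks roughly like the sketch below (a sketch only: sys_system above is a local wrapper, and the standard Fortran 2008 EXECUTE_COMMAND_LINE intrinsic shown here does the same job of shelling out and waiting; the mpirun command line is just an illustration):
program main
   implicit none
   integer :: i, io

   ! ... serial pre-processing (statements 1 and 2) ...

   do i = 1, 5
      ! launch the parallel program as a separate job and wait for it to finish
      call execute_command_line('mpirun -np 4 program2.exe', exitstat=io)
      if (io /= 0) stop 'program2.exe returned a nonzero status'
   end do

   ! ... serial post-processing (statements 3 and 4) ...
end program main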
I would like to have program 2 embedded in the main program. I have an idea of how to do it, but I don't think it is a good solution. I hope that you can help.
My idea is to make program 2 a subroutine within the main program. If I do this, then I must execute the entire program with an mpirun -np XXX command. This is a problem because the main program will tie up too many processors.
Can I somehow make program 2 a module that can be called (using mpirun each time) repeatedly?
3 Replies
I see exactly what you want. You have a serial section of code, a parallel section, followed by another serial section. And you don't want to waste the N nodes during the serial sections.
Well, one way is to split this into 3 programs:
PROGRAM MAIN
statement 1
statement 2
end program MAIN
Program two
...parallel code...
end program two
program three
statement 3
statement 4
stop
end
Then you'd run:
mpirun -n 1 main
mpirun -n N two
mpirun -n 1 three
If you have PBS or LSF batch queuing available on your cluster, you should investigate CHAINING, that is, making dependent jobs. It's ideal for situations like this where you have a series of interdependent jobs: you don't want to run TWO if MAIN fails, for example, nor THREE if TWO fails.
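As a rough illustration with PBS (the script names here are placeholders; LSF has an equivalent mechanism), the chain might be submitted like:
JOB1=$(qsub main.pbs)
JOB2=$(qsub -W depend=afterok:$JOB1 two.pbs)
qsub -W depend=afterok:$JOB2 three.pbs
so TWO only starts if MAIN finishes cleanly, and THREE only starts if TWO does.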
Now the downside is that you've chopped THREE from MAIN, so any setup and init in MAIN will need to be duplicated in THREE. So there is some duplication of code.
But you are right, if the work done in MAIN and THREE is substantial, you will not want to tie up N-1 nodes while all this serial nonsense is going on.
This is one possible solution I've seen for these cases.
ron
Quoting - Ronald Green (Intel)
Thanks for your reply, Ron.
I forgot to mention one important issue: I really want to end up with a single executable program. I give my executables out to others, so in my present state I must give them two executables. It would be much neater if I could give them only one.
I also should have mentioned that there are many, many, many variables that the main program holds in memory while it is waiting for the parallel program 2 to finish, so it would be tedious to write out all of those variables before calling program2 and then read them back in.
Please let me know if you think of something more ...
Ah, I see. Well, with those restrictions your first solution is the only one: you have to make the parallel code into a subroutine that is called from MAIN, and the code will have to use N processes for the entire time from MAIN to STOP. The MPIRUN or MPIEXEC allocation is fixed; you cannot grab 1 node, add N-1 at some later point, and then relinquish the N-1 dynamically.
On the upside, including the parallel code in MAIN as a subroutine will be quite easy to code, and afterwards you will only have one code base and one executable to manage for your users. Then over time you can perhaps find ways to speed up or parallelize the pre-processing and post-processing serial code.
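Roughly, the combined program could be structured like the sketch below (placeholder names only, assuming the MPI-1 style mpif.h header; the body of what is now program2.exe would move into a subroutine such as parallel_work):
program main
   implicit none
   include 'mpif.h'
   integer :: ierr, rank, i

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   if (rank == 0) then
      ! serial pre-processing (statements 1 and 2) runs on rank 0 only;
      ! the other ranks fall through and block at the first collective
      ! call inside parallel_work
   end if

   do i = 1, 5
      ! every rank enters the parallel section; any data the workers need
      ! can be broadcast from rank 0 inside parallel_work
      call parallel_work(MPI_COMM_WORLD)
   end do

   if (rank == 0) then
      ! serial post-processing (statements 3 and 4)
   end if

   call MPI_Finalize(ierr)
end program main

subroutine parallel_work(comm)
   implicit none
   include 'mpif.h'
   integer, intent(in) :: comm
   ! the existing contents of program2 go here, minus its own
   ! MPI_Init/MPI_Finalize calls, using comm as the communicator
end subroutine parallel_work
The whole thing is then launched once with something like mpirun -np N main.exe, which is exactly why all N processes stay allocated through the serial sections.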
ron
