I have an MPI program that calls another MPI program (written by someone else) using a Fortran system call:

call system('mpirun -np 2 ./mpi_prog.x')

When I run it (e.g. mpirun -np 4 ./driver.x) it crashes inside mpi_prog.x (at a barrier). When I build everything with MPICH, however, it works fine. Any hints on what might be wrong? (I realize nested mpiruns are completely outside the MPI standard and highly implementation-dependent.)
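For reference, a minimal sketch of the kind of driver I mean (the rank-0 guard and file names are illustrative, not my exact code):

program driver
   use mpi
   implicit none
   integer :: ierr, rank

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! One rank shells out to a second, nested mpirun.
   if (rank == 0) call system('mpirun -np 2 ./mpi_prog.x')

   call MPI_Barrier(MPI_COMM_WORLD, ierr)
   call MPI_Finalize(ierr)
end program driver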
NOTE: when I do something like:
mpirun -hosts hostA,hostB -perhost 1 -np 2 mpirun -np 4 hostname
it works perfectly fine and returns the correct output.
It's good that you realize this violates the MPI standard in so many ways.
Luckily, there is a solution in MPI-2 and it's called MPI_COMM_SPAWN:
use mpi
integer :: ierr, intercomm
integer :: errcodes(2)

! Spawn two copies of mpi_prog.x; rank 0 of MPI_COMM_WORLD acts as the root.
! MPI_ARGV_NULL means the spawned program receives no command-line arguments.
call MPI_COMM_SPAWN('./mpi_prog.x', MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0, &
                    MPI_COMM_WORLD, intercomm, errcodes, ierr)

if (.not. all(errcodes == MPI_SUCCESS)) then
   print *, 'Error!'
   call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
end if
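On the spawned side, the children get their own MPI_COMM_WORLD. If you can modify mpi_prog.x (just a sketch here, since that program was written by someone else), it can retrieve the intercommunicator back to your driver with MPI_COMM_GET_PARENT:

program mpi_prog
   use mpi
   implicit none
   integer :: ierr, parentcomm

   call MPI_Init(ierr)

   ! MPI_COMM_GET_PARENT returns MPI_COMM_NULL when the program was
   ! started directly with mpirun rather than via MPI_COMM_SPAWN.
   call MPI_Comm_get_parent(parentcomm, ierr)

   if (parentcomm /= MPI_COMM_NULL) then
      ! ... communicate with the driver over the intercommunicator ...
   end if

   call MPI_Finalize(ierr)
end program mpi_prog

If mpi_prog.x never calls MPI_COMM_GET_PARENT it will still run; it just can't talk back to the parent.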
Quite a bit of magic happens inside MPI_Init, which is why I expect the command in your NOTE to work: hostname is not an actual MPI program, so the inner mpirun simply launches it and no nested MPI initialization ever takes place.
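Also note that MPI_COMM_SPAWN returns as soon as the children are launched, unlike your blocking system() call. One common pattern (a sketch, not mandated by the standard) is to have both sides call MPI_COMM_DISCONNECT on the intercommunicator; the call is collective, so it synchronizes the driver with the children at their matching call (not with their actual exit):

! In the driver, after the spawn; the children must make the matching
! call on the communicator returned by MPI_COMM_GET_PARENT.
call MPI_Comm_disconnect(intercomm, ierr)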