If I use mpiexec to run program A, which in turn runs program B in the background, when program A returns, mpiexec hangs until program B completes. For example, if program A is the following shell script:
and program B is the following shell script:
Then even though programA returns, mpiexec does not, until sleep completes or is killed. programA shows up as defunct.
On the other hand, when I run programA directly from the shell, it does not show up as defunct.
What is mpiexec doing or waiting on that causes this, and is there any way around it?
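The two scripts referred to above are not reproduced in this thread. A minimal sketch of what such a reproducer might look like follows. The file layout, the temporary directory, and the 60-second sleep are assumptions made for illustration, not the poster's actual scripts.

```shell
#!/bin/sh
# Hypothetical reconstruction of the reproducer; the original scripts
# are not shown in this thread, so every detail here is an assumption.
workdir=$(mktemp -d)

# programA: starts programB in the background and returns immediately.
cat > "$workdir/programA" <<'EOF'
#!/bin/sh
"$(dirname "$0")/programB" &
exit 0
EOF

# programB: stands in for a long-running job (the 60 s duration is assumed).
cat > "$workdir/programB" <<'EOF'
#!/bin/sh
sleep 60
EOF

chmod +x "$workdir/programA" "$workdir/programB"

# Launching it needs an MPI installation, so the command is left commented out:
#   cd "$workdir" && mpiexec -np 1 ./programA
echo "scripts written to $workdir"
```

With scripts like these, the reported symptom would be that mpiexec does not return until the sleep in programB finishes or is killed.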
Could you please specify how you were launching programB from programA? Were you using MPI_COMM_SPAWN, or were you launching it through a shell script (that is, a shell script that starts a program which calls MPI_INIT)?
If possible, please provide a sample reproducer along with your command line. That would help us answer better.
The launch command is simply:
mpiexec -np 1 programA
In this test case, there is in fact no MPI program. I am just using the launcher to spawn programA, which is just a shell script. In the real application, programA is an actual MPI program, which spawns programB in the background, intending to leave it running after programA itself completes. In the real application, all of the MPI work completes, but mpiexec keeps waiting and programA shows as defunct because of programB. I think the simple example I sent demonstrates the essence of the issue.
Sorry for the delay,
We are not sure whether this is expected behaviour or not.
We are discussing this with the internal team and will get back to you after cross-checking with the MPI 3.1 standard.
Thanks for being patient.
The behaviour for launching inside a shell script is undefined, but since you are asking for a way around it, we are transferring this query to subject matter experts for better support.
I have discussed this with the team here.
This seems like standard Linux process behavior, and it is unclear whether there is a workaround.
Is there any way you could launch the executable directly instead of the script?
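One possible explanation, which is an assumption and not something confirmed in this thread, is that the background process inherits the stdout/stderr pipes that the launcher reads from, so the launcher does not see end-of-file until that process exits. The sketch below reproduces that waiting pattern without any MPI involved: `cat` is only a stand-in for a launcher reading its child's output, and the 3-second sleep is an arbitrary choice.

```shell
#!/bin/sh
# Stand-in demo, no MPI involved: `cat` plays the role of a launcher that
# reads the child's stdout until end-of-file.

# Case 1: the background job inherits the pipe, so cat keeps waiting
# until sleep exits, even though the parent sh has already returned.
start=$(date +%s)
sh -c 'sleep 3 & exit 0' | cat
inherited=$(( $(date +%s) - start ))

# Case 2: the background job has its standard streams redirected away
# from the pipe, so cat sees end-of-file as soon as the parent sh exits.
start=$(date +%s)
sh -c 'sleep 3 </dev/null >/dev/null 2>&1 & exit 0' | cat
detached=$(( $(date +%s) - start ))

echo "inherited pipe: ${inherited}s, redirected: ${detached}s"
```

If the same cause applies to mpiexec, redirecting programB's standard streams inside programA (for example `./programB </dev/null >/dev/null 2>&1 &`) might let the launcher return early. This has not been verified against any MPI implementation here.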