Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Processes created by MPI_Comm_spawn don't terminate before parent process finalization

fgpassos
Beginner
I've created a child process with MPI_Comm_spawn, and I need it to really terminate (not exist anymore) before the parent process finalizes. I can't find any reason for a child process to still be alive after MPI_Finalize. Is this a bug in the implementation? Most other MPI implementations don't show this behavior.

Thanks,
Fernanda
Dmitry_K_Intel2
Employee
Hi Fernanda,

Could you please clarify a bit, or perhaps provide a test case?
Do you call MPI_Abort() to terminate the process? Do you terminate the parent process with 'kill -signal'? Do you have anything in the code after MPI_Finalize()?
Strictly speaking, MPI_Finalize() is a collective operation: every process has to call it, and no MPI communication is allowed after that.

Regards!
Dmitry
fgpassos
Beginner
Hi,

My code is simple:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm comm_parent, intercomm;
    int errcodes;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&comm_parent);
    if (comm_parent == MPI_COMM_NULL) {
        // Parent process
        MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, &errcodes);
        sleep(15);
        printf("parent finalizes\n");
    }
    else {
        // Child process
        sleep(5);
        printf("child finalizes\n");
    }
    MPI_Finalize();
    return 0;
}

I ran it on one machine and watched top in another terminal. I noticed that the child process still shows up in top after the 5 seconds have passed (in sleeping state). It only really terminates when the parent process finalizes.
I also noticed that this doesn't happen with another Intel MPI build (same major version). In that case the child process terminates correctly after the 5 seconds, so there is no child process left in top.

For the first run I used build/version: Intel MPI Library for Linux Version 4.0 Build 20100422, Platform Intel 64, 64-bit applications.
The second one was: Intel MPI Library for Linux Version 4.0 Update 1 Build 20100818, Platform Intel 64, 64-bit applications.

OK, so it seems to be a difference between the two Intel MPI builds.
Can anyone confirm this?

Thanks,
Fernanda
Dmitry_K_Intel2
Employee
Fernanda,

Could you try the 'ps' utility instead?
$ mpiicc -o spawn spawn_test.c
$ mpiexec -n 1 ./spawn
In another terminal window:
$ ps ux | grep spawn

You'll see that the child process exists until finalization in any implementation.

Regards!
Dmitry
fgpassos
Beginner
Hi,

I have to disagree. LAM/MPI and Open MPI, for instance, don't show this behavior: the child process no longer exists after its own finalization. Besides, I didn't have this problem with Intel MPI 4.0 Update 1 either.

Using the same code posted previously, and adding "sleep(5);" before the MPI_Comm_spawn call, I can demonstrate it with these runs:

=====================================================
Using Open MPI 1.3.3:

[fgoliveira@rio1 testes]$ mpirun -V
mpirun (Open MPI) 1.3.3

Report bugs to http://www.open-mpi.org/community/help/
[fgoliveira@rio1 testes]$ mpicc teste_spawn.c -o teste_spawn
[fgoliveira@rio1 testes]$ mpirun -n 1 ./teste_spawn & sleep 2; ps ux | grep spawn; sleep 4; echo "-----"; ps ux | grep spawn; sleep 5; echo "-----"; ps ux | grep spawn;
[1] 17201
13169 17201 0.0 0.0 52276 2468 pts/1 S 13:32 0:00 mpirun -n 1 ./teste_spawn
13169 17203 1.0 0.0 92732 3536 pts/1 S 13:32 0:00 ./teste_spawn
13169 17205 0.0 0.0 61176 724 pts/1 S+ 13:32 0:00 grep spawn
-----
13169 17201 0.0 0.0 52276 2484 pts/1 S 13:32 0:00 mpirun -n 1 ./teste_spawn
13169 17203 0.6 0.1 92732 4128 pts/1 S 13:32 0:00 ./teste_spawn
13169 17207 2.0 0.1 92736 4108 pts/1 S 13:32 0:00 ./teste_spawn
13169 17209 0.0 0.0 61172 720 pts/1 S+ 13:33 0:00 grep spawn
child finalizes
-----
13169 17201 0.0 0.0 52276 2508 pts/1 S 13:32 0:00 mpirun -n 1 ./teste_spawn
13169 17203 0.3 0.1 92732 4128 pts/1 S 13:32 0:00 ./teste_spawn
13169 17231 0.0 0.0 61176 724 pts/1 S+ 13:33 0:00 grep spawn
[fgoliveira@rio1 testes]$ parent finalizes

[1]+ Done mpirun -n 1 ./teste_spawn
[fgoliveira@rio1 testes]$

=====================================================
Using MPI Intel 4.0:

[fgoliveira@gsn08 ~]$ mpirun -V
Intel MPI Library for Linux Version 4.0
Build 20100422 Platform Intel 64 64-bit applications
Copyright (C) 2003-2010 Intel Corporation. All rights reserved
[fgoliveira@gsn08 ~]$ mpicc teste_spawn.c -o teste_spawn
[fgoliveira@gsn08 ~]$ mpirun -n 1 ./teste_spawn & sleep 2; ps ux | grep spawn; sleep 4; echo "-----"; ps ux | grep spawn; sleep 5; echo "-----"; ps ux | grep spawn;
[1] 459
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
503 459 0.0 0.0 63948 1260 pts/0 S 13:32 0:00 /bin/bash /opt/intel/impi/4.0.0.028/intel64/bin/mpirun -n 1 ./teste_spawn
503 496 4.0 0.0 138684 9700 pts/0 S 13:32 0:00 python /opt/intel/impi/4.0.0.028/intel64/bin/mpiexec -n 1 ./teste_spawn
503 499 0.0 0.0 33932 2156 ? S 13:32 0:00 ./teste_spawn
503 501 0.0 0.0 63204 772 pts/0 S+ 13:32 0:00 grep spawn
-----
503 459 0.0 0.0 63948 1260 pts/0 S 13:32 0:00 /bin/bash /opt/intel/impi/4.0.0.028/intel64/bin/mpirun -n 1 ./teste_spawn
503 496 1.3 0.0 138684 9700 pts/0 S 13:32 0:00 python /opt/intel/impi/4.0.0.028/intel64/bin/mpiexec -n 1 ./teste_spawn
503 499 0.0 0.0 33932 2428 ? S 13:32 0:00 ./teste_spawn
503 504 0.0 0.0 33932 2408 ? S 13:32 0:00 ./teste_spawn
503 506 0.0 0.0 63204 772 pts/0 S+ 13:32 0:00 grep spawn
child finalizes
-----
503 459 0.0 0.0 63948 1260 pts/0 S 13:32 0:00 /bin/bash /opt/intel/impi/4.0.0.028/intel64/bin/mpirun -n 1 ./teste_spawn
503 496 0.7 0.0 138688 9704 pts/0 S 13:32 0:00 python /opt/intel/impi/4.0.0.028/intel64/bin/mpiexec -n 1 ./teste_spawn
503 499 0.0 0.0 33932 2428 ? S 13:32 0:00 ./teste_spawn
503 504 4.8 0.0 33932 2412 ? R 13:32 0:00 ./teste_spawn
503 509 0.0 0.0 63204 772 pts/0 S+ 13:32 0:00 grep spawn
[fgoliveira@gsn08 ~]$ parent finalizes

[1]+ Done mpirun -n 1 ./teste_spawn
[fgoliveira@gsn08 ~]$

=====================================================
Using MPI Intel 4.0 Update 1:

[fgoliveira@rio1 testes]$ mpirun -V
Intel MPI Library for Linux Version 4.0 Update 1
Build 20100818 Platform Intel 64 64-bit applications
Copyright (C) 2003-2010 Intel Corporation. All rights reserved
[fgoliveira@rio1 testes]$ mpicc teste_spawn.c -o teste_spawn
[fgoliveira@rio1 testes]$ mpirun -n 1 ./teste_spawn & sleep 2; ps ux | grep spawn; sleep 4; echo "-----"; ps ux | grep spawn; sleep 5; echo "-----"; ps ux | grep spawn;
[1] 17736
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
13169 17736 0.0 0.0 63996 1308 pts/1 S 13:55 0:00 /bin/bash /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpirun -n 1 ./teste_spawn
13169 17773 4.0 0.2 138768 9796 pts/1 S 13:55 0:00 python /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpiexec -n 1 ./teste_spawn
13169 17775 0.5 0.0 92740 3520 ? S 13:55 0:00 ./teste_spawn
13169 17778 0.0 0.0 61172 720 pts/1 S+ 13:55 0:00 grep spawn
-----
13169 17736 0.0 0.0 63996 1308 pts/1 S 13:55 0:00 /bin/bash /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpirun -n 1 ./teste_spawn
13169 17773 1.3 0.2 138768 9796 pts/1 S 13:55 0:00 python /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpiexec -n 1 ./teste_spawn
13169 17775 0.6 0.1 92740 4112 ? S 13:55 0:00 ./teste_spawn
13169 17780 2.0 0.1 92736 4108 ? S 13:55 0:00 ./teste_spawn
13169 17782 0.0 0.0 61176 724 pts/1 S+ 13:55 0:00 grep spawn
child finalizes
-----
13169 17736 0.0 0.0 63996 1308 pts/1 S 13:55 0:00 /bin/bash /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpirun -n 1 ./teste_spawn
13169 17773 0.7 0.2 138772 9800 pts/1 S 13:55 0:00 python /opt/intel/compilerpro-12.0.1.107/mpirt/bin/intel64/mpiexec -n 1 ./teste_spawn
13169 17775 0.3 0.1 92740 4112 ? S 13:55 0:00 ./teste_spawn
13169 17785 0.0 0.0 61176 728 pts/1 S+ 13:55 0:00 grep spawn
[fgoliveira@rio1 testes]$ parent finalizes

[1]+ Done mpirun -n 1 ./teste_spawn
[fgoliveira@rio1 testes]$


Sorry for being insistent, but I need this functionality in Intel MPI. I see no reason why the child process should continue to exist after its own finalization.

Thanks,
Fernanda
Dmitry_K_Intel2
Employee
Hi Fernanda,

Maybe you just need to use MPI_Comm_disconnect()? Something like:

if (comm_parent == MPI_COMM_NULL) {
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, &errcodes);
    MPI_Comm_disconnect(&intercomm);
    sleep(15);
}
else {
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sleep(5);
    MPI_Comm_disconnect(&comm_parent);
}
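
For completeness, here is your whole test program with the disconnect calls folded in. This is only a sketch based on the code above and on your earlier post; I have not verified it on my side:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    int errcodes;
    MPI_Comm comm_parent, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&comm_parent);
    if (comm_parent == MPI_COMM_NULL) {
        /* Parent: spawn one child, then detach from it */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, &errcodes);
        MPI_Comm_disconnect(&intercomm);
        sleep(15);
        printf("parent finalizes\n");
    }
    else {
        /* Child: detach from the parent so it no longer depends on it */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        sleep(5);
        MPI_Comm_disconnect(&comm_parent);
        printf("child finalizes\n");
    }
    MPI_Finalize();
    return 0;
}

MPI_Comm_disconnect() is collective over the communicator and waits for all pending communication on it to complete, so both sides have to call it; after that the child is no longer connected to the parent and can finalize and exit on its own.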


Regards!
Dmitry
fgpassos
Beginner
OK, it seems to work.
It's not the same solution as in LAM/MPI and Open MPI, but it works.

I hope the next version won't have this behavior, just like Intel MPI version 4.0 Update 1. That would be great!

Thanks!
Fernanda
Dmitry_K_Intel2
Employee
Fernanda,
Please don't expect any changes related to the behavior of MPI_Finalize(). The difference in behavior between the different Intel MPI versions is very strange: I could not reproduce it with any version (even with the upcoming 4.0 Update 3), but I work on RHEL and it seems to me that you are using SuSE.

Regards!
Dmitry
fgpassos
Beginner
OK, Dmitry.
I'm using CentOS, but I don't believe the OS affects the results in this case.
In my opinion the child process should not exist after its finalization, because that is what I'm used to from LAM/MPI and Open MPI. I could not find any description of how spawned processes finalize in the MPI standard, so I don't know exactly what is correct.
Anyway, your solution can help me in my particular implementation.
However, with your solution, if I want communication between the parent and the child, I would have to implement a termination algorithm, e.g. the child sends an "end" message to the parent, and only then does the parent call MPI_Comm_disconnect. Something like the sketch below.
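
This is only a rough sketch of what I mean, replacing the if/else branches of the full program you posted above (rank, errcodes, comm_parent and intercomm declared as there); the tag value 99 and the "done" flag are placeholders I made up:

if (comm_parent == MPI_COMM_NULL) {
    /* Parent: spawn one child and exchange data with it as usual */
    int done = 0;
    MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &intercomm, &errcodes);
    /* ... normal parent/child communication over intercomm ... */
    /* Wait for the child's "end" message before disconnecting */
    MPI_Recv(&done, 1, MPI_INT, 0, 99, intercomm, MPI_STATUS_IGNORE);
    MPI_Comm_disconnect(&intercomm);
}
else {
    /* Child: do the work, tell the parent we are finished, then disconnect */
    int done = 1;
    /* ... normal parent/child communication over comm_parent ... */
    MPI_Send(&done, 1, MPI_INT, 0, 99, comm_parent);
    MPI_Comm_disconnect(&comm_parent);
}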

Fernanda