Intel® MPI Library

How to free an MPI communicator created with MPI_Comm_spawn

Florentino_S_

Hi,

I'm trying to free a communicator created with this call:

int MPI_Comm_spawn(char *command, char *argv[], int maxprocs,
    MPI_Info info, int root, MPI_Comm comm,
    MPI_Comm *intercomm, int array_of_errcodes[])

The communicator created is intercomm.

As far as I know, according to the standard, MPI_Comm_free is a collective operation, although the standard suggests implementing it locally. On Intel MPI, however, it behaves as a collective operation (in my own experience, and according to http://software.intel.com/sites/products/documentation/hpc/ics/itac/81/ITC_Reference_Guide/Freeing_Communicators.htm ).

However, I have a problem here: the parent (spawning) processes will have a communicator that contains their children, and the spawned (child) processes will have a communicator that contains the parents.

How can I free the communicator on the parent side with this layout? I know that I can create a new communicator containing both children and parents and free that one, but then it won't be the same communicator that I want to free.

Thanks in advance,

James_T_Intel

Hi Florentino,

If you want to free the spawned communicator, simply call MPI_Comm_free from all of the spawning ranks and from all of the spawned ranks.  You can call MPI_Comm_free from fewer ranks, but this will only remove the reference in that rank.  The communicator will exist until all references are freed, and is still usable as long as all necessary references remain.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Florentino_S_

Hi,

Okay, I'll test that. However, in my previous test I freed the communicator in only one of the masters and it would hang. I'll check it tomorrow, freeing from all the nodes. I understand that in the "sons" I have to free the communicator returned by MPI_Comm_get_parent (will try this tomorrow).

Thanks,

James_T_Intel

Interesting.  I was able to launch 2 ranks that spawned 2 new ranks.  I could free the communicator from parent rank 1 and still send data from parent rank 0 to both child ranks.  I was then able to free the remaining communicator from the parent rank and from both child ranks.  I did not encounter a hang in any scenario.  That behavior might differ on Linux*; I was testing on Windows*.

Florentino_S_

Hi again, thanks for testing that.

You are right, it works perfectly. I also had a few tests in which it worked, but my "real app" (which calls MPI_Comm_spawn after a few layers of my "own" middleware) was using multi-level spawns (a spawned process itself calls MPI_Comm_spawn), so I thought the problem was related to this. In the end I built a more complex test app and it was working correctly too.

After a few more tests I realised I had a "small" bug in my software: I was passing a NULL argument to MPI_Comm_free, i.e. MPI_Comm_free(NULL), in the second level of spawns (Master --> Spawned LV1 --> Spawned LV2 (this one)).

I've tested that behaviour in my test app and there is something "very" strange:

1- If I use "export I_MPI_DEBUG=3", MPI_Comm_free(0) HANGS (and leaves quite a few un-killable zombie processes), 

2- If I don't use "export I_MPI_DEBUG=3", MPI_Comm_free(0) crashes, execution fails and no zombie processes are left. 

Maybe you want to look into this behaviour for correctness. Anyway, I understand that the problem was in my (user) code.

 

I'm sorry for the inconvenience. After fixing my bug so that the user code is "correct", everything works correctly.

Apart from this, I have a minor bug with correct user code: I always receive a "*** glibc detected *** mpiexec.hydra: free(): invalid pointer: 0x00007fec54787ee8 ***" after my program finishes, while the Intel MPI library is cleaning up, when using multi-level spawns (you will see it if you test the previous case, I suppose). That's not a problem (at least for me, although it looks ugly), as the processes get killed correctly.

If you want to investigate these issues, I attach my test file.

mpiicc helloworld_x86.c -o hello.out.x86_64 -mt_mpi

(-mt_mpi is optional and not needed to reproduce the problem)

export I_MPI_DEBUG=3
export I_MPI_PIN_MODE=mpd
mpirun -n 1 -host $HOST_ADDRESS ./hello.out.x86_64

Regards. (All these tests were done on Linux, but the bugs here are minor.)
