Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Why does it take so long to complete MPI_Comm_spawn?

fgpassos

Hi all,

I'm using MPI_Comm_spawn in my code to dynamically create a single process, but the call takes a long time to complete (about 15 s on an Intel Xeon E5620 at 2.40 GHz). The program does nothing other than call MPI_Comm_spawn. My simple code is:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* for sleep() */

int main(int argc, char ** argv)
{
        int rank;
        MPI_Comm comm_parent, intercomm;
        int errcodes;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&comm_parent);

        if(comm_parent == MPI_COMM_NULL){
                // Parent process
                t0 = MPI_Wtime();
                MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, &errcodes);
                t1 = MPI_Wtime();
                printf("Spawn time: %f\n", t1-t0);
        }
        else{
                // Child process
                sleep(5);
                printf("child created\n");
        }
        MPI_Finalize();
        return(0);
}

Compiling:

$ mpiicc teste_spawn2.c -o teste_spawn

Running: 

$ mpirun -n 1 -r ssh ./teste_spawn

Output:

Spawn time: 15.221280
child created

Does anyone know why?

Fernanda

2 Replies
James_T_Intel
Moderator

Hi Fernanda,

Please send the output from:

icc -V

mpirun -V

env | grep I_MPI

mpirun -n 1 -genv I_MPI_DEBUG 5 ./teste_spawn

I'm getting a spawn time of approximately 0.21 s with icc 14.0.1.106 and IMPI 4.1 Update 2.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

James_T_Intel
Moderator

Fernanda,

On additional testing, it appears that this delay occurs on every 8th rank launched on a node. Our developers have stated that this is intentional, as a means of not overloading SSH connections. As such, it will not be fixed.
