Intel® MPI Library

Why does it take so long to complete MPI_Comm_spawn?

fgpassos
Beginner
1,595 Views

Hi all,

I'm using MPI_Comm_spawn in my code to dynamically create a single process, but the call takes a long time to complete (about 15 s on an Intel Xeon E5620 at 2.40 GHz). The program does nothing else besides calling MPI_Comm_spawn. My simple code is:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>  // for sleep()

int main(int argc, char ** argv)
{
        int rank;
        MPI_Comm comm_parent, intercomm;
        int errcodes;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&comm_parent);

        if(comm_parent == MPI_COMM_NULL){
                // Parent process
                t0 = MPI_Wtime();
                MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, &errcodes);
                t1 = MPI_Wtime();
                printf("Spawn time: %f\n", t1-t0);
        }
        else{
                // Child process
                sleep(5);
                printf("child created\n");
        }
        MPI_Finalize();
        return(0);
}

Compiling:

$ mpiicc teste_spawn2.c -o teste_spawn

Running: 

$ mpirun -n 1 -r ssh ./teste_spawn

Output:

Spawn time: 15.221280
child created

Does anyone know why?

Fernanda

2 Replies
James_T_Intel
Moderator

Hi Fernanda,

Please send the output from:

$ icc -V
$ mpirun -V
$ env | grep I_MPI
$ mpirun -n 1 -genv I_MPI_DEBUG 5 ./teste_spawn
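The `env | grep I_MPI` check above shows the settings in the launching shell. To see what a given process actually inherits (for example, the spawned child), a roughly equivalent check can be done from inside the program. The following is only a minimal sketch: the helper names `has_prefix` and `print_env_with_prefix` are hypothetical, it relies on the POSIX `environ` variable, and it matches only entries that begin with the prefix, whereas `grep` matches anywhere in the line.

```c
#include <stdio.h>
#include <string.h>

/* POSIX: the process environment as a NULL-terminated array of
 * "NAME=value" strings. */
extern char **environ;

/* Hypothetical helper: returns 1 if entry (a "NAME=value" string)
 * starts with prefix, 0 otherwise. */
int has_prefix(const char *entry, const char *prefix)
{
    return strncmp(entry, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: writes every entry of env that starts with prefix
 * to out, one per line, and returns how many entries were written.
 * env must be a NULL-terminated array, such as environ. */
int print_env_with_prefix(char **env, const char *prefix, FILE *out)
{
    int count = 0;
    for (char **e = env; e != NULL && *e != NULL; ++e) {
        if (has_prefix(*e, prefix)) {
            fprintf(out, "%s\n", *e);
            ++count;
        }
    }
    return count;
}
```

Calling `print_env_with_prefix(environ, "I_MPI", stdout);` after `MPI_Init` in both the parent and the child branch of the test program would show whether the two processes see the same I_MPI settings.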

I'm getting a spawn time of approximately 0.21 s with icc 14.0.1.106 and IMPI 4.1 Update 2.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

James_T_Intel
Moderator

Fernanda,

After additional testing, it appears that this delay occurs on every 8th rank launched on a node. Our developers have stated that the delay is intentional, as a way to avoid overloading SSH connections. As such, it will not be fixed.
