Intel® MPI Library

Dynamically start MPI processes

jackyjngwn
Beginner
955 Views

Hi,

I have a master/slave MPI program, and I'd like the master to spawn the slave processes dynamically. I tried MPI_Comm_spawn, but it seems I can only start slave processes on nodes where mpd.py is already running, i.e., the nodes listed in mpd.hosts. In my case, however, I don't know which nodes I will use when I launch the program with mpirun; the nodes where the slave processes will run are determined at run time.

I also tried the Hydra process manager in Intel MPI 4.1, and MPI_Comm_spawn failed. Does Hydra support spawning at all?

Could anyone give me some insight or advice on how to solve my problem? Thanks.

James_T_Intel
Moderator

Do you have a reproducer? I have been able to use MPI_Comm_spawn with Hydra. Let me check on the exact method for launching on a host outside the provided host list.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Consulting Engineer

jackyjngwn
Beginner

Thanks for replying.

I am attaching the test program I've been using. I compiled the two sources with:

                mpiicpc -o master master.cpp
                mpiicpc -o worker worker.cpp

and ran it with:

                mpirun -n 1 -env MPI_UNIVERSE_SIZE 3 ./master

The program completed successfully with impi 4.0.2.003, but when run with impi 4.1.1.036 on the same nodes, I got the following output:

                universe_size = 8
                node1:2d9d:  dapl_cma_connect: rdma_connect ERR -1 Function not implemented
                [0:node1] unexpected DAPL connection event 0x4006 from 7
                Assertion failed in file ../../dapl_poll_rc.c at line 1679: 0
                internal ABORT - process 0
                APPLICATION TERMINATED WITH THE EXIT STRING: Interrupt (signal 2)

I also tried mpdboot and mpiexec and got the same error, so the Hydra process manager is not at fault. Do you know what is wrong? Thanks.

 
