Dynamically start MPI processes



I have a master/slave MPI program, and I'd like the master to spawn the slave processes dynamically. I tried MPI_Comm_spawn, but it seems I can only start slave processes on nodes where mpd has already been started, i.e., the nodes listed in mpd.hosts. In my case, however, I don't know which nodes I will use when I start the program with mpirun; the nodes where the slave processes will run are determined at run time.

I tried the Hydra process manager in Intel MPI 4.1, and MPI_Comm_spawn failed. Does Hydra support spawning at all?

Could anyone give me some insight or advice on how to solve my problem? Thanks.

2 Replies

Do you have a reproducer? I have been able to use MPI_Comm_spawn with Hydra. Let me check on the exact method for specifying a host outside of the provided host list for launching.

James Tullos
Technical Consulting Engineer
Intel® Consulting Engineer


Thanks for replying.

I am attaching the test programs I've been using. I compiled them using:

                mpiicpc -o master master.cpp

                mpiicpc -o worker worker.cpp


and ran it using:

                mpirun -n 1 -env MPI_UNIVERSE_SIZE 3 ./master


The program completed successfully with impi, but when run with impi on the same nodes, I got the following output:


                universe_size = 8

                node1:2d9d:  dapl_cma_connect: rdma_connect ERR -1 Function not implemented

                [0:node1] unexpected DAPL connection event 0x4006 from 7

                Assertion failed in file ../../dapl_poll_rc.c at line 1679: 0

                internal ABORT - process 0

                APPLICATION TERMINATED WITH THE EXIT STRING: Interrupt (signal 2)


I tried with mpdboot & mpiexec, and got the same error. So it's not the hydra manager's fault. Do you know what is wrong? Thanks.

