372 Views

MPI spawn placement of processes

Hi

I am trying to spawn processes across nodes using Intel MPI with the following code:

testmanager.py:

 

from mpi4py import MPI
import mpi4py
import sys
import argparse
import os
import shutil  # distutils was removed in Python 3.12; shutil.which replaces find_executable

def check_mpi():
    mpiexec_path, _ = os.path.split(shutil.which("mpiexec"))
    for executable, path in mpi4py.get_config().items():
        if executable not in ['mpicc', 'mpicxx', 'mpif77', 'mpif90', 'mpifort']:
             continue
        if mpiexec_path not in path:
             raise ImportError("mpi4py may not be configured against the same version of 'mpiexec' that you are using. The 'mpiexec' path is {mpiexec_path} and mpi4py.get_config() returns:\n{mpi4py_config}\n".format(mpiexec_path=mpiexec_path, mpi4py_config=mpi4py.get_config()))
#        if 'Open MPI' not in MPI.get_vendor():
#           raise ImportError("mpi4py must have been installed against Open MPI in order for StructOpt to function correctly.")
        vendor_number = ".".join([str(x) for x in MPI.get_vendor()[1]])
        if vendor_number not in mpiexec_path:
            print(MPI.get_vendor(), mpiexec_path)
            #raise ImportError("The MPI version that mpi4py was compiled against does not match the version of 'mpiexec'. mpi4py's version number is {}, and mpiexec's path is {}".format(MPI.get_vendor(), mpiexec_path))
        print(MPI.get_vendor(), mpiexec_path)



def main():
#    parser = argparse.ArgumentParser()
#    parser.add_argument('worker_count', type=int)
    worker_count = 20
#    args = parser.parse_args()
    check_mpi()
    mpi_info = MPI.Info.Create()
    mpi_info.Set("add-hostfile", "slurm.hosts")
    mpi_info.Set("host", "slurm.hosts")

    #print("about to spawn")
    comm = MPI.COMM_SELF.Spawn(sys.executable,
                               args=['testworker.py'], maxprocs=worker_count,
                               info=mpi_info).Merge()
    process_rank = comm.Get_rank()
    process_count = comm.Get_size()
    process_host = MPI.Get_processor_name()
    print('manager',process_rank, process_count, process_host)

main()

testworker.py:

from mpi4py import MPI

def main():
    print("Spawned")
    comm = MPI.Comm.Get_parent().Merge()

    process_rank = comm.Get_rank()
    process_count = comm.Get_size()
    process_host = MPI.Get_processor_name()

    print('worker', process_rank,process_count,process_host)

main()

 

I would like to know how to distribute the spawned processes, as when I run the job as:

mpirun -hostfile slurm.hosts -np 1 python3 ./testmanager.py
 

with, for example, the following slurm.hosts:

 

node-105:16
node-114:16
node-127:16

I end up with the manager running as a single process on node-105 and the workers running on the other nodes. If I increase the number of workers beyond the total number of slots on the non-manager nodes, the job hangs. I want to be able to run on all available slots on the three nodes.
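
To make the target concrete, here is a minimal sketch of the slot arithmetic for the hostfile above. The helper name total_slots is hypothetical, and treating a line without a ":slots" suffix as one slot is an assumption. It only counts slots; it does not show how Intel MPI places the spawned processes.

```python
def total_slots(hostfile_lines):
    """Sum the slot counts from 'hostname:slots' lines."""
    total = 0
    for line in hostfile_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: a line without ':slots' counts as one slot.
        _, _, slots = line.partition(":")
        total += int(slots) if slots else 1
    return total


hostfile_lines = ["node-105:16", "node-114:16", "node-127:16"]
slots = total_slots(hostfile_lines)
worker_count = slots - 1  # one slot is already taken by the manager
print(slots, worker_count)
```

With three nodes of 16 slots each, filling every slot would mean 47 workers alongside the single manager process, rather than the hard-coded 20.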

Thanks!

 

2 Replies
Maksim_B_Intel
Employee

mpi_info.Set("add-hostfile", "slurm.hosts")

is not a standard info key and does nothing.

mpi_info.Set("host", "slurm.hosts")

The host info key takes a hostname or a comma-delimited list of hostnames, not a filename. You are adding a non-existent node named "slurm.hosts", so when processes are meant to start on it, you get a hang?

Normally, mpirun extracts the node list from SLURM, so you might not need to set that info key at all.
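
If the host key does need to be set, one possible approach is to turn the hostfile contents into the comma-delimited form described above. This is only a sketch: the helper name host_list_value is hypothetical, it assumes every non-blank line has the "hostname:slots" shape shown in the question, and whether the resulting value spreads the workers over all slots has not been verified here.

```python
def host_list_value(hostfile_lines):
    """Build a comma-delimited hostname string from 'hostname:slots' lines."""
    hosts = []
    for line in hostfile_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Keep only the hostname; drop the ':slots' suffix if present.
        hosts.append(line.split(":", 1)[0])
    return ",".join(hosts)


example_lines = ["node-105:16", "node-114:16", "node-127:16"]
print(host_list_value(example_lines))  # node-105,node-114,node-127

# Possible use in testmanager.py (not run here, requires mpi4py):
#     with open("slurm.hosts") as f:
#         mpi_info.Set("host", host_list_value(f))
```

The string, rather than the filename, is what would be passed to mpi_info.Set("host", ...) before calling Spawn.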


Hi Maksim, 

 

Thanks for the reply. Where did you find the standard list of mpi_info keys for Intel MPI? I couldn't find it (add-hostfile is an Open MPI key, which I was trying).

Cheers!

 
