Hi
I am trying to spawn processes across nodes using Intel MPI with the following code:
testmanager.py:
from mpi4py import MPI
import mpi4py
import sys
import argparse
import os
import distutils.spawn

def check_mpi():
    mpiexec_path, _ = os.path.split(distutils.spawn.find_executable("mpiexec"))
    for executable, path in mpi4py.get_config().items():
        if executable not in ['mpicc', 'mpicxx', 'mpif77', 'mpif90', 'mpifort']:
            continue
        if mpiexec_path not in path:
            raise ImportError("mpi4py may not be configured against the same version of 'mpiexec' that you are using. The 'mpiexec' path is {mpiexec_path} and mpi4py.get_config() returns:\n{mpi4py_config}\n".format(mpiexec_path=mpiexec_path, mpi4py_config=mpi4py.get_config()))
    # if 'Open MPI' not in MPI.get_vendor():
    #     raise ImportError("mpi4py must have been installed against Open MPI in order for StructOpt to function correctly.")
    vendor_number = ".".join([str(x) for x in MPI.get_vendor()[1]])
    if vendor_number not in mpiexec_path:
        print(MPI.get_vendor(), mpiexec_path)
        print(MPI.get_vendor(), mpiexec_path)
        #raise ImportError("The MPI version that mpi4py was compiled against does not match the version of 'mpiexec'. mpi4py's version number is {}, and mpiexec's path is {}".format(MPI.get_vendor(), mpiexec_path))

def main():
    # parser = argparse.ArgumentParser()
    # parser.add_argument('worker_count', type=int)
    worker_count = 20
    # args = parser.parse_args()
    check_mpi()
    mpi_info = MPI.Info.Create()
    mpi_info.Set("add-hostfile", "slurm.hosts")
    mpi_info.Set("host", "slurm.hosts")
    #print("about to spawn")
    comm = MPI.COMM_SELF.Spawn(sys.executable,
                               args=['testworker.py'], maxprocs=worker_count,
                               info=mpi_info).Merge()
    process_rank = comm.Get_rank()
    process_count = comm.Get_size()
    process_host = MPI.Get_processor_name()
    print('manager', process_rank, process_count, process_host)

main()
testworker.py:
from mpi4py import MPI

def main():
    print("Spawned")
    comm = MPI.Comm.Get_parent().Merge()
    process_rank = comm.Get_rank()
    process_count = comm.Get_size()
    process_host = MPI.Get_processor_name()
    print('worker', process_rank, process_count, process_host)

main()
I would like to know how to distribute the spawned processes, as when I run the job as:
mpirun -hostfile slurm.hosts -np 1 python3 ./testmanager.py
with, for example, the following slurm.hosts:
node-105:16
node-114:16
node-127:16
I end up with the manager running as a single process on node-105, and the workers running on the other nodes. If I increase the number of workers beyond the total number of slots on the non-manager nodes, the job hangs. I want to be able to run on all available slots on the three nodes.
Thanks!
mpi_info.Set("add-hostfile", "slurm.hosts")
is not a standard infokey, and does nothing.
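For reference, the MPI standard reserves only a handful of info keys for MPI_Comm_spawn: host, arch, wdir, path, file and soft. An implementation may ignore any of them and may define extra keys of its own, so the set below is the portable baseline, not Intel MPI's full list. A minimal sketch of a pre-flight check for keys outside that baseline; the helper name nonportable_spawn_keys is made up for illustration:

```python
# Info keys reserved by the MPI standard for MPI_Comm_spawn.
# Implementations may support additional, vendor-specific keys.
RESERVED_SPAWN_KEYS = frozenset({"host", "arch", "wdir", "path", "file", "soft"})


def nonportable_spawn_keys(info_items):
    """Return, sorted, the keys that the MPI standard does not reserve.

    info_items is a plain dict of the key/value pairs you intend to pass
    to MPI.Info.Set (hypothetical helper, not part of mpi4py).
    """
    return sorted(key for key in info_items if key not in RESERVED_SPAWN_KEYS)


if __name__ == "__main__":
    requested = {"add-hostfile": "slurm.hosts", "host": "slurm.hosts"}
    # Lists the keys that are not in the standard's reserved set.
    print(nonportable_spawn_keys(requested))
```

A key flagged here is not necessarily wrong, it just depends on the MPI implementation in use.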
mpi_info.Set("host", "slurm.hosts")
The "host" info key takes a hostname or a comma-delimited list of hostnames, not a filename. Here you are adding a non-existent node named "slurm.hosts", which is probably why you get a hang once processes are meant to start on it.
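If you do want to pass the nodes explicitly, one option is to read the hostfile yourself and hand the hostnames to the "host" key as a comma-delimited list. A rough sketch, assuming the hostfile uses the node:slots entries from your post; hosts_from_hostfile and spawn_workers are hypothetical helpers, and whether the workers then land on every slot is something to verify on your cluster:

```python
def hosts_from_hostfile(text):
    """Convert hostfile text ('node:slots' entries) into 'node1,node2,...'."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]        # drop trailing comments
        for entry in line.split():
            name = entry.split(":", 1)[0]   # drop the ':slots' suffix
            if name and name not in hosts:
                hosts.append(name)
    return ",".join(hosts)


def spawn_workers(hostfile_path, worker_count):
    """Sketch only: needs mpi4py and has to be started under mpirun."""
    import sys
    from mpi4py import MPI

    with open(hostfile_path) as handle:
        host_list = hosts_from_hostfile(handle.read())
    info = MPI.Info.Create()
    info.Set("host", host_list)  # hostnames, not a filename
    return MPI.COMM_SELF.Spawn(sys.executable, args=["testworker.py"],
                               maxprocs=worker_count, info=info)


if __name__ == "__main__":
    print(hosts_from_hostfile("node-105:16 node-114:16 node-127:16"))
    # → node-105,node-114,node-127
```

The parsing is kept separate from the spawn call so it can be checked on a login node without MPI.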
Normally, mpirun extracts the node list from SLURM, so you might not need to set that info key at all.
Hi Maksim,
Thanks for the reply. Where did you find the standard list of MPI info keys for Intel MPI? I couldn't find it (add-hostfile is an Open MPI key, which is what I was trying).
Cheers!