Running MPI jobs from inside Singularity container with Intel MPI 2019.5

Hi All,

According to the recent webinar introducing the new features in Intel MPI 2019 Update 5, it should now be possible to include the Intel MPI libraries in a Singularity container and call mpirun for a multi-node MPI job entirely from inside it, with no need to have Intel MPI installed outside the container. So instead of launching an MPI job in a container using an external MPI stack, like so:

     mpirun -n <nprocs> -perhost <procs_per_node> -hosts <hostlist> singularity exec <container_name> <path_to_executable_inside_container>

one should now be able to do:

    singularity exec <container_name> mpirun -n <nprocs> -perhost <procs_per_node> -hosts <hostlist> <path_to_executable_inside_container>

My container includes the Intel MPI 2019.5 libraries (as well as the Intel run-time libraries for C++) and libfabric, and the following is sourced in the container environment:

cat /.singularity.d/env/ 
# Custom environment shell code should follow
    source /opt/intel/bin/ intel64
    source /opt/intel/impi/2019.5.281/intel64/bin/ -ofi_internal=1 release

This is not working so far. To illustrate, here is a simple test run from inside the container (shell mode). The command hangs with no output for about 20-30 seconds and then fails with the following error messages:

Singularity image.sif:~/singularity/fv3-upp-apps> export I_MPI_DEBUG=500
Singularity image.sif:~/singularity/fv3-upp-apps> export FI_PROVIDER=verbs
Singularity image.sif:~/singularity/fv3-upp-apps> export FI_VERBS_IFACE="ib0"
Singularity image.sif:~/singularity/fv3-upp-apps> export I_MPI_FABRICS=shm:ofi
Singularity image.sif:~/singularity/fv3-upp-apps> mpirun -n 78 -perhost 20 -hosts appro07,appro08,appro09,appro10 hostname 
[] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:114): unable to run proxy on appro07 (pid 109898)
[] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:152): check exit codes error
[] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:205): poll for event error
[] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:731): error waiting for event
[] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1919): error setting up the boostrap proxies

I also tried calling mpirun with a single host (and only as many processes as fit on that host), with the same result.

Is there a specific list of dependencies (e.g. do I need openssh-clients installed?) for this all-inside-the-container approach? I do not see anything in the Intel MPI 2019 Update 5 Developer Reference about running with Singularity containers.
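
As a rough way to narrow down the dependency question, the sketch below reports which launcher-related commands are visible on PATH inside the container. It is not a confirmed dependency list: the candidates checked (ssh, because Hydra uses ssh as its default remote launcher, plus mpirun and mpiexec.hydra) are my assumptions, and the function name check_tools is made up for illustration.

```shell
#!/bin/sh
# Report whether each named command is on PATH in the current environment.
# Prints "<name>: found at <path>" or "<name>: MISSING" for each argument
# and returns the number of missing commands as the exit status.
# check_tools is a hypothetical helper, not part of Intel MPI or Singularity.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: found at $path"
        else
            echo "$tool: MISSING"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Assumed candidates only: ssh (Hydra's default remote launcher) and the
# MPI launcher commands themselves.
check_tools ssh mpirun mpiexec.hydra || echo "some candidate tools are missing"
```

Saved to a file (say check_tools.sh, a hypothetical name), this could be run with singularity exec <container_name> sh check_tools.sh, so the check sees the same PATH that mpirun sees inside the container.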


Thanks, Keith
