Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Trying to use I_MPI_PIN_DOMAIN=socket

William_N_
Beginner

I'm running on an IBM cluster whose nodes have dual-socket Ivy Bridge processors and two NVIDIA Tesla K40 cards.  I'm trying to run 4 MPI ranks using Intel MPI 5 Update 2, with a single MPI rank on each socket.  To learn how to do this, I'm using a simple MPI Hello World program that prints out the host name, rank, and CPU ID.  When I run with 2 MPI ranks, my simple program works as expected.  When I run with 4 MPI ranks using the mpirun that comes with Intel MPI, all 4 ranks run on the same node that I launched from.  I am doing this interactively and get a set of two nodes using the following command:

qsub -I -l nodes=2,ppn=16 -q k20

I am using the following commands to run my program:

source /opt/intel/bin/compilervars.sh intel64; \
source /opt/intel/impi_latest/intel64/bin/mpivars.sh; \
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
/opt/intel/impi_latest/intel64/bin/mpirun -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi

If I use a different qsub command, i.e. qsub -I -l nodes=2,ppn=2 -q k20, the program runs as expected with 2 ranks on each node.  But that does not seem like the right way to get my node allocation if I also want to run threads from each MPI rank.  Also, using my initial qsub command, I can run with 32 ranks (16 ranks per host) and the application runs as expected.

I can also try using the Intel mpiexec command instead of mpirun and I get the following result:

source /opt/intel/bin/compilervars.sh intel64; \
source /opt/intel/impi_latest/intel64/bin/mpivars.sh; \
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
/opt/intel/impi_latest/intel64/bin/mpiexec -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi
mpiexec_ibm-011: cannot connect to local mpd (/tmp/mpd2.console_username); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)

Any ideas why this is not working?  Am I not using I_MPI_PIN_DOMAIN correctly?  Could there be something messed up with the Intel MPI installation on the cluster?  Or some problem with the installation of the scheduler?

Thanks,

Dave

 

3 Replies
TimP
Honored Contributor III

ppn=16 gives you 16 slots on your first node, and ranks (processes) fill those before any are allocated to the next node, so I'm not surprised at the result you got. ppn=2 seems to express your intent.
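
For what it's worth, a rough sketch of why: under PBS, Intel MPI's mpirun takes its host list from $PBS_NODEFILE, and with nodes=2,ppn=16 that file lists the first host 16 times before the second, so -n 4 only consumes slots on the first host. Something like this (hostnames are placeholders):

cat $PBS_NODEFILE
ibm-011
ibm-011
... (14 more entries for the first node)
ibm-012
... (15 more entries for the second node)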

The default I_MPI_PIN_DOMAIN=auto normally works well for MPI_THREAD_FUNNELED mode when the number of threads per rank is set by OMP_NUM_THREADS.  You don't even need to call MPI_Init_thread, although the MPI standard says you should.  If you do something odd, like mismatching the number of threads and cores, it can be useful to check what the Intel OpenMP runtime sees with KMP_AFFINITY=verbose and what Intel MPI sees with I_MPI_DEBUG=5.
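
For example (the thread count here is just illustrative):

export OMP_NUM_THREADS=8        # threads per rank; use whatever you actually intend
export KMP_AFFINITY=verbose     # Intel OpenMP reports where each thread gets bound
export I_MPI_DEBUG=5            # Intel MPI reports rank-to-core pinning at startup
mpirun -n 4 hw_ibm_impi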

Setting I_MPI_PIN_DOMAIN=socket allows each rank and its threads to move within the socket where they start, but it doesn't control how many ranks land on each node, so by itself it doesn't address your stated intent.

In order to use mpiexec, if your installation supports it, you would first run mpdboot.
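
Roughly like this, assuming the MPD process manager is available and ~/mpd.hosts (a hypothetical file, one hostname per line) lists your two allocated nodes:

mpdboot -n 2 -f ~/mpd.hosts     # start one mpd daemon per host in the hosts file
mpiexec -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi
mpdallexit                      # shut down the mpd ring when finished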

William_N_
Beginner

Is anyone able to provide some help with what I described in my original post, or advice on how to troubleshoot the problem?  It seems that I should be able to reserve all the cores on a node and then run with fewer MPI ranks than cores, so that each rank can run threads on the remaining cores in an MPI+X fashion.  I have done that on other clusters using other MPI implementations.  I need to figure out how to resolve this issue.

Thanks for any help,

Dave

 

Gergana_S_Intel
Employee

Hey Dave,

If I understand correctly, you want to specify how many MPI ranks to run on a single node.  By default, Intel MPI will use up all cores available on a machine before going to the next one on the list.  That's why the 4 MPI ranks you start are all put on the same host.

To override this behavior, use the -perhost option for mpirun or set the I_MPI_PERHOST environment variable to an integer value.  In the run script you provided, you can do either:

export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
export I_MPI_PERHOST=2; \
/opt/intel/impi_latest/intel64/bin/mpirun -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi

or:

export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
/opt/intel/impi_latest/intel64/bin/mpirun -perhost 2 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi

Both are valid and will put 2 MPI processes on each node.
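
If you then also want to run OpenMP threads from each rank, a sketch along these lines should work (OMP_NUM_THREADS=8 here just assumes 8 cores per socket on your nodes; adjust to your hardware):

export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u; \
export I_MPI_PERHOST=2; \
export OMP_NUM_THREADS=8; \
/opt/intel/impi_latest/intel64/bin/mpirun -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=socket -n 4 hw_ibm_impi

With I_MPI_PIN_DOMAIN=socket and 2 ranks per node, each rank should get its own socket, and its threads stay within that socket.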

Let me know if this helps with what you're trying to do.

Regards,
~Gergana
