I am very confused by the Intel MPI and OpenMP mapping and I was hoping someone could help me understand it. I'll describe my problem with an example.
I want to run an application (LAMMPS) which uses MPI and OpenMP. Here OpenMP is used via SMT, through the LAMMPS Intel package. The target architecture (compute node) has 2 sockets, 32 cores per socket, and 2 SMT threads per core (AMD chips).
* Full node (64 MPI tasks, 2 OMP threads)
export OMP_NUM_THREADS=2
mpirun -np 64 -pk intel 0 omp 2 -sf intel ./lmp
I think the above is correct, but how can I achieve the following:
* Half node (split across 2 sockets, 16 MPI tasks on each)
export OMP_NUM_THREADS=2
mpirun -np 32 -pk intel 0 omp 2 -sf intel ...
* Half node (all in 1 socket)
export OMP_NUM_THREADS=2
mpirun -np 32 -pk intel 0 omp 2 -sf intel ./lmp
I have been playing with the following variables, but I have not yet been able to do it:
I_MPI_PIN_DOMAIN
I_MPI_PIN_PROCESSOR_EXCLUDE_LIST
I_MPI_PIN_PROCESSOR_LIST
KMP_AFFINITY
Hi Micheal,
I am sorry for the late reply.
As you mentioned, you want to distribute the MPI ranks by socket.
For your two use cases:
1) Half node (split across 2 sockets, 16 each)
Since you want 16 ranks on each socket with a total of 32 ranks launched, there is no need for explicit pinning, as this is the default behavior.
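To confirm how the default pinning places the 32 ranks, you can ask Intel MPI to print its pinning information. The sketch below is only a dry run that prints the launch line; ./a.out is a placeholder for your actual application command, and the debug level of 4 is, as far as I know, the lowest level that reports pinning, so raise it if nothing is shown.

```shell
#!/bin/sh
# Dry run: print the launch line instead of executing it.
# APP is a placeholder (assumption); substitute the real application command line.
APP="./a.out"
RANKS=32      # half of the 64 physical cores on the node
THREADS=2     # OpenMP threads per rank

# I_MPI_DEBUG at level 4 or above asks Intel MPI to report where each rank is pinned.
LAUNCH="I_MPI_DEBUG=4 OMP_NUM_THREADS=$THREADS mpirun -np $RANKS $APP"
echo "$LAUNCH"
```

In the debug output, check that the CPU lists reported for the ranks fall on both sockets rather than on one.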
2) Half node (all in 1 socket)
To launch all threads in a single socket, try this command:
I_MPI_PIN_DOMAIN=socket:compact I_MPI_DEBUG=10 OMP_NUM_THREADS=2 mpirun -n 32 -ppn 2 ./a.out
For more information on how to use I_MPI_PIN_DOMAIN, please refer to "Interoperability with OpenMP* API" (intel.com).
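If the socket-based domain does not give the placement you expect, another option that might be worth trying is to exclude the second socket's logical CPUs with I_MPI_PIN_PROCESSOR_EXCLUDE_LIST and let I_MPI_PIN_DOMAIN=omp size each domain from OMP_NUM_THREADS. This is an untested sketch: the helper name cpus_on_socket is hypothetical, it assumes lscpu is available and that socket 1 is the one to leave empty, and it only prints the launch line (with ./a.out as a placeholder) rather than running it.

```shell
#!/bin/sh
# cpus_on_socket SOCKET   (hypothetical helper, not part of Intel MPI)
# Reads "CPU,CORE,SOCKET" rows (the format of `lscpu -p=CPU,CORE,SOCKET`) on stdin
# and prints the logical CPU ids that belong to SOCKET as a comma-separated list.
cpus_on_socket() {
    awk -F, -v s="$1" '
        /^#/ { next }                                  # skip lscpu header comments
        $3 == s { printf "%s%s", sep, $1; sep = "," }  # collect matching CPU ids
        END { print "" }
    '
}

# Dry run: print the environment that would keep all ranks off socket 1.
if command -v lscpu >/dev/null 2>&1; then
    EXCLUDE=$(lscpu -p=CPU,CORE,SOCKET | cpus_on_socket 1)
    echo "I_MPI_PIN_PROCESSOR_EXCLUDE_LIST=$EXCLUDE I_MPI_PIN_DOMAIN=omp OMP_NUM_THREADS=2 I_MPI_DEBUG=4 mpirun -np 32 ./a.out"
fi
```

Reading the CPU numbering from lscpu avoids guessing how the logical CPUs and SMT siblings are numbered on your node. Please check the pinning reported by I_MPI_DEBUG before relying on this.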
Let us know if you face problems.
Regards
Prasanth