Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI Rank placement

jimdempseyatthecove
Honored Contributor III

I am experimenting with MPI (world) across different systems. One system is a Xeon Phi KNL (64 cores, 256 logical processors) called KNL, and the other system is an E5-2620v2 (6 cores, 12 logical processors) called Thor.

What I intended to do was to partition the KNL into 2 ranks and then run the 3rd rank on the E5-2620v2.

mpirun -n 3 -ppn 1 -hosts KNL,KNL,Thor program arg arg arg

This launches 2 ranks on KNL and 1 rank on Thor, as expected.

So far, so good. However, the rank placement was:

Rank 0 on KNL
Rank 1 on Thor
Rank 2 on KNL

Aren't the nodes associated in the -hosts order?

What is the recommended way to control rank placement?
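
One option, sketched here under the assumption that Intel MPI's Hydra launcher accepts a machinefile with one hostname:ranks entry per line, is to spell the split out explicitly in a file (say, hosts.txt):

KNL:2
Thor:1

mpirun -machinefile hosts.txt -n 3 program arg arg arg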

When I try 8 ranks on KNL and 1 on Thor, the distribution is quite goofy (likely my own mistake).

I tried using the : separator to sequester KNL on the left side and Thor on the right side of the :, but mpirun choked on the command line.

(I am a noob at this)

Jim Dempsey

 

jimdempseyatthecove
Honored Contributor III

Using

mpirun -n 8 -hosts KNL program args : -n 1 -hosts Thor program args

seems to work. Getting a good grasp on the nuances of mpirun will require some time.
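
A related sketch, assuming Hydra's per-section options work as documented: each colon-separated argument set can carry its own -env, so the two machines can get different OpenMP thread counts (32 and 12 here are illustrative values for these two hosts):

mpirun -n 8 -hosts KNL -env OMP_NUM_THREADS 32 program args : -n 1 -hosts Thor -env OMP_NUM_THREADS 12 program args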

Jim Dempsey

 

Paulius_V_
Beginner

Hey Jim, do you by any chance have insight into how the ranks are placed inside the KNL node? That is, will it try to place one rank per physical core, or first populate all logical processors of a physical core, or is it completely random?

Thanks

jimdempseyatthecove
Honored Contributor III

The application run on KNL (and on the Xeon host) uses both MPI and OpenMP. The BIOS on the KNL was configured to present 4 NUMA compute nodes with MCDRAM set as cache (each node having its own last-level cache of 1/4 of the MCDRAM). Under this configuration, the default MPI distribution on the KNL was sparse, resulting in 2 ranks per NUMA node (one core per rank), with each rank then using OpenMP to run 32 threads (OpenMP 4.0 tasks). The Xeon host (Thor) launched 1 process, which then ran 12 OpenMP threads within that rank.
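
A minimal sketch for inspecting and steering that placement, assuming Intel MPI's documented controls: setting I_MPI_DEBUG to 4 or higher makes mpirun print the rank-to-core pinning map at startup, and I_MPI_PIN_DOMAIN=numa confines each rank and its OpenMP threads to one NUMA domain:

mpirun -genv I_MPI_DEBUG 4 -genv I_MPI_PIN_DOMAIN numa -genv OMP_NUM_THREADS 32 -n 8 -hosts KNL program args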

Jim Dempsey
