I am experimenting with MPI (world) across different systems. One system is a Xeon Phi KNL (64 cores, 256 logical processors) called KNL, and the other system is an E5-2620v2 (6 cores, 12 logical processors) called Thor.
What I intended to do is partition the KNL into 2 ranks and then run the 3rd rank on the E5-2620v2.
mpirun -n 3 -ppn 1 -hosts KNL,KNL,Thor program arg arg arg
This launches 2 ranks on KNL and 1 rank on Thor, as expected.
So far, so good. However, the ranks landed in an unexpected order:
Rank 0 was located on KNL
Rank 1 on Thor
Rank 2 on KNL
Aren't the nodes associated in the -hosts order?
What is the recommended way to control rank placement?
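One option, if this is Intel MPI, is to spell out the placement in a machinefile rather than repeating hosts on the command line. A minimal sketch (the host:count machinefile syntax and the filename hosts.txt are assumptions; adjust for your MPI implementation):

```shell
# Each machinefile line is "hostname:ranks_on_that_host".
cat > hosts.txt <<EOF
KNL:2
Thor:1
EOF

# Launch 3 ranks total; the machinefile fixes which host each rank lands on.
mpirun -n 3 -machinefile hosts.txt program arg arg arg
```

With a machinefile, ranks are assigned in file order, so ranks 0 and 1 should land on KNL and rank 2 on Thor.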
When I try 8 ranks on KNL and 1 on Thor, the distribution is quite goofy (likely my mistake).
I tried using the : separator to put KNL on the left side of the : and Thor on the right, but mpirun choked on the command line.
(I am a noob at this)
mpirun -n 8 -hosts KNL program args : -n 1 -hosts Thor program args
seems to work. Getting a good grasp on the nuances of mpirun will take some time.
Hey Jim, do you by any chance have insight into how the ranks are placed inside the KNL node? As in, will it try to place one rank per physical core, will it first populate all logical processors of a physical core, or is it completely random?
The application running on KNL (and on the Xeon host) uses both MPI and OpenMP. The BIOS on the KNL was configured to present 4 NUMA compute nodes, with MCDRAM set as cache (each node having its own LLC of 1/4 of the MCDRAM). Under this configuration, the default MPI distribution on the KNL would be sparse, resulting in 2 ranks per NUMA node on the KNL, with each rank then using OpenMP to run 32 threads (OpenMP 4.0 tasks). The Xeon host (Thor) launched 1 rank, which then ran 12 OpenMP threads.
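For a hybrid layout like this, the usual knobs on Intel MPI with the Intel OpenMP runtime are the pinning environment variables. A sketch, assuming the 8-rank x 32-thread KNL layout described above (the specific values are illustrative, not tested on this exact configuration):

```shell
# Hybrid MPI+OpenMP affinity sketch for the KNL side (Intel MPI / Intel OpenMP).
export OMP_NUM_THREADS=32        # OpenMP threads per MPI rank on the KNL
export I_MPI_PIN_DOMAIN=omp      # size each rank's pinning domain by OMP_NUM_THREADS
export KMP_AFFINITY=compact      # pack each rank's threads onto adjacent logical processors
export I_MPI_DEBUG=4             # print the rank-to-core pinning map at startup

mpirun -n 8 -hosts KNL program args : -n 1 -hosts Thor program args
```

Note that exported variables propagate to both MPMD sections, so the Thor rank would also see OMP_NUM_THREADS=32; on a 12-logical-processor host you would want to override it per section, e.g. with Intel MPI's -env option (-env OMP_NUM_THREADS 12) on the Thor side. The I_MPI_DEBUG output is a quick way to verify whether ranks are spread one per physical core or packed onto logical processors.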