I am experimenting with MPI (world) across different systems. One system is a Xeon Phi KNL (64 cores, 256 logical processors) called KNL, and the other system is an E5-2620v2 (6 cores, 12 logical processors) called Thor.
What I intended to do is to partition the KNL into 2 ranks and then run the 3rd rank on the E5-2620v2.
mpirun -n 3 -ppn 1 -hosts KNL,KNL,Thor program arg arg arg
This launches 2 ranks on KNL and 1 rank on Thor, as expected.
So far, so good.
However, the ranks were not placed where I expected:
Rank 0 was located on KNL
Rank 1 on Thor
Rank 2 on KNL
Aren't the nodes assigned in the order given to -hosts?
What is the recommended way to control rank placement? (See the machine-file sketch after this post.)
When I try 8 ranks on KNL and 1 on Thor, the distribution is quite goofy (likely my own misunderstanding).
I tried using the : separator to sequester KNL on the left side and Thor on the right side of the :, but mpirun choked on the command line.
(I am a noob at this)
Jim Dempsey
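One commonly suggested way to make placement explicit is a machine file. This is a sketch, assuming a Hydra-based launcher such as Intel MPI, where each machine-file line takes the form host:ranks; the file name here is arbitrary:

# machinefile: one host per line, followed by the number of ranks to place there
KNL:2
Thor:1

mpirun -n 3 -machinefile machinefile program arg arg arg

With this file, ranks 0 and 1 land on KNL and rank 2 on Thor, in file order.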
Using
mpirun -n 8 -hosts KNL program args : -n 1 -hosts Thor program args
seems to work. Getting a good grasp on the nuances of mpirun will require some time.
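The colon separates independent argument sets (MPMD style), so each side carries its own -n and -hosts. The same command can also be kept in a config file, one argument set per line (a sketch, assuming Hydra's -configfile option; the file name is arbitrary):

# mpirun_config: one argument set per line, same syntax as on the command line
-n 8 -hosts KNL program args
-n 1 -hosts Thor program args

mpirun -configfile mpirun_config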
Jim Dempsey
Hey Jim, do you by any chance have insight into how the ranks are placed inside the KNL node? That is, will it place one rank per physical core, fill all logical processors of one physical core first, or is the placement effectively random?
Thanks
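One way to inspect this directly (a sketch, assuming Intel MPI, which provides the I_MPI_DEBUG and I_MPI_PIN_DOMAIN environment variables; the exact debug output format varies by version):

# print the rank-to-CPU pinning map at startup
mpirun -genv I_MPI_DEBUG 4 -n 8 -hosts KNL program args

# force one rank per physical core instead of per logical processor
mpirun -genv I_MPI_PIN_DOMAIN core -n 8 -hosts KNL program args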
The application run on the KNL (and on the Xeon host) uses both MPI and OpenMP. The BIOS on the KNL was configured to present 4 NUMA compute nodes, with MCDRAM set as cache (each node having its own last-level cache of 1/4 of the MCDRAM). Under this configuration, the default MPI distribution on the KNL was sparse: 2 of the 8 ranks ran on each NUMA node, and each rank then used OpenMP to run 32 threads (OpenMP 4.0 tasks). The Xeon host (Thor) launched 1 rank, which then ran 12 OpenMP threads.
Jim Dempsey
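An illustrative launch line for the layout described above (a sketch only: the thread counts follow from what Jim gives, but this is not his actual command, and -env here is Hydra's per-argument-set environment option):

mpirun -n 8 -hosts KNL -env OMP_NUM_THREADS 32 program args : -n 1 -hosts Thor -env OMP_NUM_THREADS 12 program args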