Donners__John
Beginner

-perhost parameter forgotten after first iteration over all hosts

Dear developers,

The round-robin placement forgets the -perhost parameter once it has iterated over all hosts in the hostfile.
This was tested with Intel MPI 2019.1.

My hostfile looks like:

node551
node552

And when I start a small job, I get:

I_MPI_DEBUG=4 I_MPI_PIN_DOMAIN=core mpirun -f hostfile -n 8 -perhost 2 ./a.out
[0] MPI startup(): libfabric version: 1.7.0a1-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       377136   node551   {0,40}
[0] MPI startup(): 1       377137   node551   {1,41}
[0] MPI startup(): 2       151304   node552   {0,40}
[0] MPI startup(): 3       151305   node552   {1,41}
[0] MPI startup(): 4       377138   node551   {2,42}
[0] MPI startup(): 5       151306   node552   {2,42}
[0] MPI startup(): 6       377139   node551   {3,43}
[0] MPI startup(): 7       151307   node552   {3,43}

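Ranks 0-3 are distributed as expected, but ranks 4-7 are distributed across the hosts as if the perhost parameter had been reset to 1. With -perhost 2, I would expect ranks 4 and 5 on node551 and ranks 6 and 7 on node552, but ranks 5 and 6 end up on the wrong host.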
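To make the expected placement concrete: if -perhost 2 means "2 consecutive ranks per host, wrapping around the host list", then rank r belongs on host index (r // 2) % 2. The following Python sketch is an illustrative check only, not part of Intel MPI. The helper names are hypothetical, and it assumes that block-cyclic reading of -perhost and the "rank pid node pin" column layout of the I_MPI_DEBUG=4 lines above.

```python
import re

# Hypothetical helper: the host a rank should land on under block-cyclic
# placement (perhost consecutive ranks per host, wrapping around the list).
def expected_host(rank, hosts, perhost):
    return hosts[(rank // perhost) % len(hosts)]

# The I_MPI_DEBUG=4 pinning lines from the run above.
LOG = """\
[0] MPI startup(): 0       377136   node551   {0,40}
[0] MPI startup(): 1       377137   node551   {1,41}
[0] MPI startup(): 2       151304   node552   {0,40}
[0] MPI startup(): 3       151305   node552   {1,41}
[0] MPI startup(): 4       377138   node551   {2,42}
[0] MPI startup(): 5       151306   node552   {2,42}
[0] MPI startup(): 6       377139   node551   {3,43}
[0] MPI startup(): 7       151307   node552   {3,43}
"""

# Captures "<rank> <pid> <node>"; the header line ("Rank Pid ...") and the
# libfabric lines do not match because they lack a numeric rank column.
LINE_RE = re.compile(r"MPI startup\(\):\s+(\d+)\s+\d+\s+(\S+)")

def observed_hosts(log):
    """Map each rank to the node name reported in the debug output."""
    return {int(m.group(1)): m.group(2) for m in LINE_RE.finditer(log)}

def mismatches(log, hosts, perhost):
    """List (rank, expected_node, actual_node) for misplaced ranks."""
    result = []
    for rank, node in sorted(observed_hosts(log).items()):
        want = expected_host(rank, hosts, perhost)
        if node != want:
            result.append((rank, want, node))
    return result

if __name__ == "__main__":
    for rank, want, got in mismatches(LOG, ["node551", "node552"], perhost=2):
        print(f"rank {rank}: expected {want}, got {got}")
```

Run against the output above, it reports rank 5 (expected node551, got node552) and rank 6 (expected node552, got node551). Ranks 0-3 match, which is consistent with the placement only going wrong after the first pass over the hostfile.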

James_T_Intel
Moderator

We have a fix for this targeted for Intel® MPI Library 2019 Update 2.
