Software Archive
Read-only legacy content

MPI program

Jen_B_
Beginner

Is it possible to run an MPI program with some processes on the CPU and some on the MIC? For example, if I have two processes, say rank 0 and rank 1, is it possible to run rank 0 on the CPU and rank 1 on the MIC? If so, could you show an example? Thanks.

TimP
Honored Contributor III

https://software.intel.com/sites/default/files/article/336139/using-intel-mpi-on-intel-xeon-phi-coprosessor-systems-v1.3.pdf

https://software.intel.com/en-us/articles/how-to-run-intel-mpi-on-xeon-phi
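
A minimal sketch of the two-rank case from the question, assuming the Intel compiler and Intel MPI that the guides above cover (file names and hostnames here are placeholders): compile the same source twice, once for the host (`mpiicc prog.c -o prog`) and once for the coprocessor (`mpiicc -mmic prog.c -o prog.mic`), make `prog.mic` visible on the card, set `I_MPI_MIC=enable`, and start both sides in one MPMD launch such as `mpirun -n 1 -host myhost ./prog : -n 1 -host myhost-mic0 ./prog.mic`. Ranks are numbered in the order of the `:`-separated segments, so rank 0 runs on the host and rank 1 on the coprocessor. The helpers below (hypothetical names) let each rank report which build it is; the `__MIC__` test relies on the Intel compiler predefining that macro when compiling with `-mmic`.

```c
#include <stdio.h>

/* Reports which side of a symmetric-mode job this binary was built for.
 * Assumption: the Intel compiler predefines __MIC__ under -mmic, so one
 * source file yields both a host binary and a coprocessor binary. */
const char *build_target(void)
{
#ifdef __MIC__
    return "coprocessor";
#else
    return "host";
#endif
}

/* Hypothetical helper: writes a one-line placement report into buf and
 * returns its length, as snprintf does. In the MPI program, `rank` would
 * come from MPI_Comm_rank(MPI_COMM_WORLD, &rank), called after MPI_Init. */
int format_placement(int rank, char *buf, size_t len)
{
    return snprintf(buf, len, "rank %d: %s binary", rank, build_target());
}
```

In the MPI program itself, call `format_placement(rank, buf, sizeof buf)` after `MPI_Init` and `MPI_Comm_rank`, then print `buf`; the host rank should report `host` and the coprocessor rank `coprocessor`.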

To make this useful, at least on the coprocessor side, each rank will need to run multiple threads, both to balance the work rate between host and coprocessor and to avoid running out of memory (fewer ranks on the card means fewer per-rank copies of data in its memory). That subject still seems too controversial to be posted on IDZ.
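
One simple way to act on the balancing point, sketched here as a static split (a swapped-in technique, not something the thread prescribes): give each rank a share of the work proportional to its measured throughput, so that a multithreaded coprocessor rank and a host rank finish at about the same time. The helper `split_work` and the weights are hypothetical; real weights would come from timing each kind of rank on your own code.

```c
/* Hypothetical helper: divide `total` work items among `nranks` ranks in
 * proportion to weight[i] (e.g. measured items per second for rank i).
 * Each rank gets the gap between rounded-down cumulative boundaries, so the
 * counts are non-negative and sum to exactly `total`. Weights must be
 * non-negative with a positive sum. */
void split_work(long total, const double *weight, int nranks, long *count)
{
    double sum = 0.0;
    for (int i = 0; i < nranks; i++)
        sum += weight[i];

    double cum = 0.0;   /* running sum, accumulated in the same order as sum */
    long prev = 0;      /* boundary where the previous rank's share ended */
    for (int i = 0; i < nranks; i++) {
        cum += weight[i];
        /* cum / sum reaches exactly 1.0 on the last rank, so end == total */
        long end = (long)(total * (cum / sum));
        count[i] = end - prev;
        prev = end;
    }
}
```

With two host ranks of weight 1.0 and one coprocessor rank of weight 2.0, 100 items split as 25, 25 and 50; with three equal weights, 10 items split as 3, 3 and 4.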

Jen_B_
Beginner

Thanks a lot!

McCalpinJohn
Honored Contributor III

I don't know why anyone would consider the combination of "symmetric mode" (some MPI tasks on the host(s) and some on Xeon Phi) with hybrid parallelism (MPI + threads) to be controversial.

In addition to the many excellent references on the Intel web site, you might find the Xeon Phi training material at TACC useful. Look for "symmetric" in the course materials at https://www.tacc.utexas.edu/user-services/training/course-materials. The TACC training material is somewhat site-specific, but it also includes more of the low-level, step-by-step details than some of the more generic references.

TaylorIoTKidd
New Contributor I

John,

Thanks for the reference.

Regards
--
Taylor

TimP
Honored Contributor III

I see that the reference John provided discusses the case of 16 single-threaded MPI ranks on the host (presumably a dual-socket system with 8 cores per socket) and 4 ranks of 30 threads each on the coprocessor. This is a useful example beyond what has been accepted for posting on IDZ, even if it is relatively unambitious in terms of speedup over the host alone. It's a good starting point; some jobs ought to scale up to 6 ranks of 30 threads each on the coprocessor (depending on the number of cores available).
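
The core arithmetic behind these configurations can be checked with a small helper. It assumes a first-generation (Knights Corner) coprocessor with 4 hardware threads per core and, as is commonly recommended, one core left free for the card's own OS; the 61-core example below is an assumption, not a claim about the hardware in the referenced material. `plan_layout` is a hypothetical name.

```c
/* Assumption: Knights Corner coprocessor, 4 hardware threads per core. */
#define HW_THREADS_PER_CORE 4

typedef struct {
    int cores_per_rank;    /* whole cores reserved for each rank */
    int threads_per_core;  /* max threads a rank places on one of its cores */
    int fits;              /* 1 if no core is oversubscribed, else 0 */
} layout_plan;

/* Hypothetical helper: split the usable cores evenly among nranks ranks
 * and check whether threads_per_rank threads fit on each rank's cores.
 * reserved_cores are kept free (e.g. one core for the card's OS). */
layout_plan plan_layout(int total_cores, int reserved_cores,
                        int nranks, int threads_per_rank)
{
    layout_plan p = {0, 0, 0};
    int usable = total_cores - reserved_cores;
    if (nranks <= 0 || threads_per_rank <= 0 || usable < nranks)
        return p;  /* not enough cores for one per rank: report no fit */

    p.cores_per_rank = usable / nranks;
    /* ceiling division: the busiest core of the rank's group */
    p.threads_per_core =
        (threads_per_rank + p.cores_per_rank - 1) / p.cores_per_rank;
    p.fits = p.threads_per_core <= HW_THREADS_PER_CORE;
    return p;
}
```

With 60 usable cores, 4 ranks of 30 threads get 15 cores each at 2 threads per core, and 6 ranks get 10 cores each at 3 threads per core; 8 ranks of 30 would get only 7 cores each and need up to 5 threads on a core, which does not fit. Memory per rank, the other limit mentioned earlier in the thread, is not modeled here.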

The MPI pinning settings ought to reserve a group of cores for each rank and distribute the threads evenly across them.
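
A sketch of the layout this describes, assuming each rank owns a contiguous block of `cores_per_rank` cores and places its threads round-robin over that block, so every core gets one thread before any core gets a second. The indices are abstract (core within the usable set, hardware-thread slot within the core), not OS logical-CPU numbers, whose mapping to cores is platform-specific. With Intel MPI and the Intel OpenMP runtime, this is roughly what the `I_MPI_PIN_DOMAIN` and `KMP_AFFINITY` settings are used to arrange; check the Intel MPI reference for the exact values to use. `place_thread` is a hypothetical name.

```c
typedef struct {
    int core;  /* core index, 0-based, within the usable cores */
    int slot;  /* hardware-thread slot on that core, 0-based */
} placement;

/* Hypothetical helper: rank `rank` owns cores
 * [rank * cores_per_rank, (rank + 1) * cores_per_rank). Its threads are
 * dealt round-robin over those cores, so thread t goes to core
 * (t mod cores_per_rank) of the group, in slot t / cores_per_rank. */
placement place_thread(int rank, int thread, int cores_per_rank)
{
    placement p;
    int first_core = rank * cores_per_rank;
    p.core = first_core + thread % cores_per_rank;
    p.slot = thread / cores_per_rank;
    return p;
}
```

For 6 ranks of 30 threads on 60 usable cores (`cores_per_rank` = 10), rank 1's thread 0 lands on core 10, slot 0, and rank 0's thread 29 on core 9, slot 2, so each core ends up with exactly 3 threads.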
