
MPI program

Jen_B_
Beginner

Is it possible to run an MPI program with processes on both the CPU and the MIC? For example, if I have two processes, rank 0 and rank 1, can rank 0 run on the CPU and rank 1 on the MIC? If so, could you show an example? Thanks.

TimP
Honored Contributor III

https://software.intel.com/sites/default/files/article/336139/using-intel-mpi-on-intel-xeon-phi-coprosessor-systems-v1.3.pdf

https://software.intel.com/en-us/articles/how-to-run-intel-mpi-on-xeon-phi

In order to make this useful, at least on the coprocessor side, the ranks will need to use multiple threads per rank so as to balance the work rate on host vs. coprocessor and avoid running out of memory.  That subject still seems too controversial to be posted on IDZ.

Jen_B_
Beginner

Thanks a lot!

McCalpinJohn
Honored Contributor III

I don't know why anyone would consider the combination of "symmetric mode" (some MPI tasks on the host(s) and some on the Xeon Phi) with hybrid parallelism (MPI + threads) to be controversial.

In addition to the many excellent references on the Intel web site, you might find the Xeon Phi training material at TACC to be useful. Look for "symmetric" in the course materials at https://www.tacc.utexas.edu/user-services/training/course-materials. The TACC training material is somewhat site-specific, but it also includes more of the low-level, step-by-step details than some of the more generic references.

 

TaylorIoTKidd
New Contributor I

John,

Thanks for the reference.

Regards
--
Taylor
 

TimP
Honored Contributor III

In the reference John provided, I see a discussion of the case of 16 single-threaded MPI ranks on the host (presumably a dual 8-core CPU) and 4 ranks of 30 threads each on the coprocessor. This is a useful example beyond what has been accepted for posting on IDZ, even if it's relatively unambitious in terms of accelerating performance vs. the host alone. It's a good starting point; some jobs ought to scale up to 6 ranks of 30 threads each on the coprocessor (depending on the number of cores available).
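
For readers who want to see what such a layout looks like at launch time, here is a rough Python sketch that assembles a symmetric-mode command line for the 16 x 1 (host) plus 4 x 30 (coprocessor) case. It only builds and prints the string; it does not launch anything. The host names `localhost` and `mic0`, the binary names `./app` and `./app.mic`, and both helper functions are hypothetical. The `-host`, `-n`, `-env` options and the `:` separator between argument sets follow the Intel MPI launcher syntax as I understand it, so check them against the guides linked earlier in the thread before relying on them.

```python
def arg_set(host, ranks, threads, exe):
    """One launcher argument set: where to run, how many ranks, threads per rank."""
    return ["-host", host, "-n", str(ranks),
            "-env", "OMP_NUM_THREADS", str(threads), exe]


def symmetric_command(host_ranks, host_threads, mic_ranks, mic_threads,
                      host="localhost", mic="mic0",
                      host_exe="./app", mic_exe="./app.mic"):
    """Join a host argument set and a coprocessor argument set with ':'.

    All default names are placeholders (assumptions), not values from the thread.
    """
    return (["mpirun"]
            + arg_set(host, host_ranks, host_threads, host_exe)
            + [":"]
            + arg_set(mic, mic_ranks, mic_threads, mic_exe))


# Layout discussed above: 16 single-threaded host ranks, 4 coprocessor ranks x 30 threads.
cmd = symmetric_command(16, 1, 4, 30)
command_line = " ".join(cmd)
total_threads = 16 * 1 + 4 * 30

print(command_line)
print("total threads:", total_threads)
```

The same helper covers the original question (rank 0 on the CPU, rank 1 on the MIC) with `symmetric_command(1, 1, 1, 1)`. Note that the coprocessor side needs its own binary built for the MIC architecture, which is why the sketch takes two executable names.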

The MPI pinning settings ought to reserve a group of cores for each rank and distribute the threads evenly across them.
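
To make the pinning idea concrete, here is a small Python sketch of the bookkeeping: split the available cores into one contiguous group per rank, then spread each rank's threads round-robin over its group. This is only an illustration of the intended layout, not how the MPI library computes it; in practice the layout would be requested through the MPI and OpenMP affinity settings (for Intel MPI and Intel OpenMP, I_MPI_PIN_DOMAIN and KMP_AFFINITY are the usual knobs). The function name `pin_domains` and the figure of 60 usable cores are assumptions for the example.

```python
def pin_domains(cores, ranks, threads_per_rank):
    """Return one {core_id: thread_count} dict per rank.

    Cores are split into contiguous, non-overlapping groups (one per rank);
    each rank's threads are then dealt out round-robin over its group.
    """
    per_rank, extra = divmod(len(cores), ranks)
    if per_rank == 0:
        raise ValueError("fewer cores than ranks")
    layout, start = [], 0
    for r in range(ranks):
        # The first `extra` ranks take one additional core when the split is uneven.
        size = per_rank + (1 if r < extra else 0)
        group = cores[start:start + size]
        start += size
        counts = {core: 0 for core in group}
        for t in range(threads_per_rank):
            counts[group[t % len(group)]] += 1
        layout.append(counts)
    return layout


# Assumed example: 60 usable coprocessor cores, 4 ranks, 30 threads per rank.
layout = pin_domains(list(range(60)), ranks=4, threads_per_rank=30)
for rank, counts in enumerate(layout):
    print(rank, min(counts), max(counts), set(counts.values()))
```

With these assumed numbers each rank owns 15 cores and places 2 threads on each of them, so no core is shared between ranks.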
