Is it possible to run an MPI program with processes on both the CPU and the MIC? For example, if I have two processes, rank 0 and rank 1, is it possible to run rank 0 on the CPU and rank 1 on the MIC? If so, could you show an example? Thanks.
https://software.intel.com/en-us/articles/how-to-run-intel-mpi-on-xeon-phi
To make this useful, the ranks, at least on the coprocessor side, will need to use multiple threads per rank so as to balance the work rate between host and coprocessor and to avoid running out of memory. That subject still seems too controversial to be posted on IDZ (Intel Developer Zone).
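As a rough illustration of the kind of launch line the article above describes, here is a minimal Python sketch that assembles an MPMD `mpirun` command with one argument set for the host and one for the coprocessor. It only builds and prints the command; it does not run it. The node names (`localhost`, `mic0`), the binary names (`./hello`, `./hello.mic`), the thread counts, and the helper `symmetric_launch` are all assumptions for illustration. The coprocessor binary is assumed to have been built separately for the card, and `I_MPI_MIC=enable` is assumed to be needed in the environment, as in the Intel MPI documentation for Xeon Phi.

```python
import shlex


def symmetric_launch(host, mic, host_binary, mic_binary,
                     host_ranks=1, mic_ranks=1,
                     host_threads=1, mic_threads=1):
    """Build an MPMD mpirun argument list (hypothetical helper).

    The host argument set comes first, so its ranks are numbered first;
    the coprocessor ranks follow.
    """
    def arg_set(node, ranks, threads, binary):
        # One colon-separated argument set: where to run, how many ranks,
        # and how many OpenMP threads each of those ranks may use.
        return ["-host", node, "-n", str(ranks),
                "-env", "OMP_NUM_THREADS", str(threads), binary]

    return (["mpirun"]
            + arg_set(host, host_ranks, host_threads, host_binary)
            + [":"]
            + arg_set(mic, mic_ranks, mic_threads, mic_binary))


# Assumed environment setting that enables coprocessor support in Intel MPI.
env = {"I_MPI_MIC": "enable"}

# Rank 0 on the host with 1 thread, rank 1 on the coprocessor with 30 threads
# (the thread counts are placeholders, not tuned values).
args = symmetric_launch("localhost", "mic0", "./hello", "./hello.mic",
                        host_threads=1, mic_threads=30)

print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(args))
```

With this ordering, rank 0 lands on the host and rank 1 on the coprocessor. Raising `mic_threads` rather than `mic_ranks` is one way to keep the number of ranks, and therefore the per-rank memory overhead, low on the coprocessor.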
Thanks a lot!
I don't know why anyone would consider the combination of "symmetric mode" (some MPI tasks on host(s) and some on Xeon Phi) with hybrid parallelism (MPI + threads) to be controversial.
In addition to the many excellent references on the Intel web site, you might find the Xeon Phi training material at TACC to be useful. Look for "symmetric" in the course materials at https://www.tacc.utexas.edu/user-services/training/course-materials. The TACC training material is somewhat site-specific, but it also includes more of the low-level, step-by-step details than some of the more generic references.
John,
Thanks for the reference.
Regards
--
Taylor
In the reference John provided, I see a discussion of the case of 16 single-threaded MPI ranks on the host (presumably a dual 8-core CPU) and 4 ranks of 30 threads each on the coprocessor. This is a useful example beyond what has been accepted for posting on IDZ, even if it's relatively unambitious in terms of accelerating performance vs. the host alone. It's a good starting point; some jobs ought to scale up to 6 ranks of 30 threads each on the coprocessor (depending on the number of cores available).
The MPI pinning settings ought to reserve a group of cores for each rank and distribute the threads evenly across them.
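To make the pinning idea concrete, here is a small Python sketch of the arithmetic only: each rank gets its own contiguous block of cores, and its threads are placed round-robin across that block. The helper `pin_domains`, the figure of 60 usable cores, and the 4 hardware threads per core are assumptions for illustration. In practice this kind of layout is normally requested through Intel MPI and OpenMP runtime settings such as `I_MPI_PIN_DOMAIN` and `KMP_AFFINITY` rather than computed by hand.

```python
def pin_domains(n_cores, n_ranks, threads_per_rank, hw_threads_per_core=4):
    """Reserve a block of cores per rank and spread its threads evenly.

    Hypothetical helper: returns one dict per rank with the cores it owns
    and, for each thread index, the core that thread is placed on.
    """
    cores_per_rank = n_cores // n_ranks
    if cores_per_rank == 0:
        raise ValueError("more ranks than cores")
    if threads_per_rank > cores_per_rank * hw_threads_per_core:
        raise ValueError("not enough hardware threads in each rank's domain")

    layout = []
    for rank in range(n_ranks):
        first = rank * cores_per_rank
        cores = list(range(first, first + cores_per_rank))
        # Round-robin placement: thread t goes to the (t mod cores_per_rank)-th
        # core of the block, so per-core thread counts differ by at most one.
        placement = [cores[t % cores_per_rank] for t in range(threads_per_rank)]
        layout.append({"rank": rank, "cores": cores, "thread_to_core": placement})
    return layout


# Assumed 60 usable coprocessor cores; 4 and 6 ranks of 30 threads each.
for n_ranks in (4, 6):
    layout = pin_domains(n_cores=60, n_ranks=n_ranks, threads_per_rank=30)
    first = layout[0]
    per_core = first["thread_to_core"].count(first["cores"][0])
    print(f"{n_ranks} ranks: {len(first['cores'])} cores per rank, "
          f"{per_core} threads on core {first['cores'][0]}")
```

Under these assumptions, 4 ranks of 30 threads put 2 threads on each of 15 cores per rank, while 6 ranks of 30 threads put 3 threads on each of 10 cores per rank, which is why the feasible rank count depends on the number of cores available.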
