MPI and the KMP_PLACE_THREADS OpenMP affinity variable

james_B_8
Beginner
1,235 Views

I read about this new feature here:

http://software.intel.com/en-us/blogs/2013/02/15/new-kmp-place-threads-openmp-affinity-variable-in-update-2-compiler

In that post, the author describes two separate processes offloading to different parts of the MIC, effectively partitioning the MIC cores into two groups.

What I would like to know is whether this is possible with, say, two different MPI processes. Could different MPI processes have different environment variables, i.e. could each have its own KMP_PLACE_THREADS value?

Or, ideally, can I set this from within the program through an API call?

(Yes, this is the same question I posted in the comments at that link, but I figured it would get more coverage if I posted it here too.)

0 Kudos
7 Replies
james_B_8
Beginner

Sorry this is a duplicate post. Delete me :)

0 Kudos
Silva__Rafael
Beginner

I have the same problem. Does no one here have any idea? I need to set the offset depending on the MPI rank. The MPI processes run on the host processor and offload to the Xeon Phi, but I need each offload to go to different cores depending on the rank. I'm surprised I haven't found a way to do that.

0 Kudos
TimP
Honored Contributor III

This is usually done by specifying a separate MIC_KMP_PLACE_THREADS value for each host rank.
0 Kudos
Silva__Rafael
Beginner

How? When MPI processes are launched with "mpirun", the different ranks have the same environment, so how can I specify MIC_KMP_PLACE_THREADS differently for different ranks?

0 Kudos
TimP
Honored Contributor III

Set -env for each rank. There are examples in my own and other white papers, as well as in the Jeffers and Reinders book. I'll repost a URL later.
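
As a rough illustration of per-rank -env settings, here is a small Python helper that builds such a command line. It is only a sketch under several assumptions: Intel MPI's `-env <VAR> <value>` and `-genv <VAR> <value>` options with colon-separated argument sets (the first set becoming rank 0), the `<cores>c,<threads>t,<offset>O` value format described in the blog post linked at the top of the thread, and MIC_ENV_PREFIX=MIC so that MIC_-prefixed variables reach the coprocessor. The function names, the 30-core/4-thread split and `./offload_app` are hypothetical.

```python
def place_threads_value(rank, cores_per_rank, threads_per_core):
    """Build a KMP_PLACE_THREADS-style value for one rank.

    Assumed format: <cores>c,<threads>t,<offset>O, where the offset is
    counted in cores so that consecutive ranks get disjoint core ranges.
    """
    offset = rank * cores_per_rank
    return f"{cores_per_rank}c,{threads_per_core}t,{offset}O"


def build_mpirun_command(num_ranks, cores_per_rank, threads_per_core, exe):
    """Return an mpirun command line with one argument set per rank."""
    arg_sets = []
    for rank in range(num_ranks):
        value = place_threads_value(rank, cores_per_rank, threads_per_core)
        # One process per argument set, each with its own variable value.
        arg_sets.append(f"-n 1 -env MIC_KMP_PLACE_THREADS {value} {exe}")
    # MIC_ENV_PREFIX is set globally (assumption: offload runtime uses it).
    return "mpirun -genv MIC_ENV_PREFIX MIC " + " : ".join(arg_sets)


if __name__ == "__main__":
    # Hypothetical split: two host ranks, 30 coprocessor cores each.
    print(build_mpirun_command(2, 30, 4, "./offload_app"))
```

With these hypothetical numbers, rank 0 would be placed on the first 30 cores and rank 1 on the next 30; the values should be adjusted to the actual card.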
0 Kudos
TimP
Honored Contributor III

Apparently, I haven't updated my white paper on this "new" topic for more than a year (since my MIC access became restricted to the early B0 model). I will do so, including a sample mpirun command adapted from the Jeffers and Reinders book (which itself isn't easy to search), and post it to my Google Sites page. Jeffers and Reinders went to press before KMP_PLACE_THREADS was released, so you will see there how it was done without that environment variable. For "industrial strength" examples you may want to take advantage of mpirun's configfile capability.
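
For illustration only, a configfile for this purpose might be generated along the following lines. This sketch assumes Intel MPI's `-configfile <file>` option, where each line of the file holds the arguments for one set of processes, plus the same `<cores>c,<threads>t,<offset>O` value format and MIC_ prefix discussed earlier in the thread. The function names, file name and core counts are hypothetical.

```python
def configfile_lines(num_ranks, cores_per_rank, threads_per_core, exe):
    """Return one configfile line per rank, each with its own core offset."""
    lines = []
    for rank in range(num_ranks):
        # Offset in cores, so each rank gets a disjoint block (assumed format).
        offset = rank * cores_per_rank
        value = f"{cores_per_rank}c,{threads_per_core}t,{offset}O"
        lines.append(f"-n 1 -env MIC_KMP_PLACE_THREADS {value} {exe}")
    return lines


def write_configfile(path, lines):
    """Write the argument sets to a file, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical layout: four host ranks sharing 60 coprocessor cores.
    for line in configfile_lines(4, 15, 4, "./offload_app"):
        print(line)
```

The resulting file would then be passed as `mpirun -configfile <file>`, which keeps long per-rank settings out of the command line.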

You may have noticed that we discussed the subject at https://software.intel.com/en-us/forums/topic/507595 and that expert advice was given in that forum thread. Back then I still thought there was a chance that Intel would support on-line publishing on these subjects, including OpenMP 4. Since then, the people responsible have taken the position that there isn't sufficient demand for literature on MIC or on programming languages other than Microsoft-style C++. So much for Cilk(tm) Plus.

Perhaps you noticed the wry comments by Michael Klemm et al. in their Kindle book "Optimizing HPC Applications with Intel Cluster Tools" about how poor the Kindle search capabilities are.

0 Kudos
TimP
Honored Contributor III

You might note that the advice for VTune mentions how to use a configfile so that VTune collects for only 1 or 2 ranks, as one of its examples of settings that differ among ranks. It shouldn't be considered unusual to take advantage of that capability.

I discuss multiple-rank offload affinity in the Parallel Optimization Environment section of https://sites.google.com/site/tprincesite/

0 Kudos