I have an MPI application in which I want to have a multithreaded section that can use hyperthreads.
For example, suppose I have 4 MPI ranks running on 4 different cores on MIC. I want to utilize two hyperthreads from each of these 4 cores for my multithreaded section (which basically contains just two OpenMP sections).
I'm exporting the following variables:
export MIC_ENV_PREFIX=PHI
export PHI_KMP_AFFINITY=balanced
export PHI_KMP_PLACE_THREADS=4c,2t
export PHI_OMP_NUM_THREADS=2
export OMP_NUM_THREADS=2
When I start my application, I see the following in the top utility:
CPU    Command
1      my_app
62     my_app
123    my_app
184    my_app
These are the 4 MPI processes.
After switching to thread view, I get:
CPU    Command
1      my_app
2      my_app
5      my_app
62     my_app
63     my_app
65     my_app
123    my_app
124    my_app
125    my_app
184    my_app
185    my_app
186    my_app
What I would like is to spawn the threads on the same core as the parent process. (For example, the process running on CPU 1 should spawn its two threads on CPUs 2 and 3 so that they are on the same core.)
Also, is there a way to utilize the parent process as one of the threads and spawn just one extra thread?
You are explicitly telling OpenMP to use 2 threads on each of four cores (PHI_KMP_PLACE_THREADS=4c,2t); then, having reserved space for eight threads, you tell it to use only two of them (PHI_OMP_NUM_THREADS=2) and to place those threads in a balanced way across the four available cores (PHI_KMP_AFFINITY=balanced). So you are telling the OpenMP runtime to place the two threads on separate cores, and then you seem to be complaining that that is what it's doing...
If you want to keep the two threads on the same core, then just say so. Either be more explicit about the resources you allocate and simply allocate 2 hardware threads on one core (PHI_KMP_PLACE_THREADS=1c,2t), or use an affinity that places threads near each other (PHI_KMP_AFFINITY=compact). My preference is to allocate only the resources you intend to use and then let the runtime use them all (so don't use both KMP_PLACE_THREADS and OMP_NUM_THREADS at the same time).
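For example (untested, and keeping the PHI_ prefix convention from your original settings), either of the following should keep both threads on one core:
export PHI_KMP_PLACE_THREADS=1c,2t
or
export PHI_KMP_PLACE_THREADS=4c,2t
export PHI_OMP_NUM_THREADS=2
export PHI_KMP_AFFINITY=compact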
"Also is there a way to utilize the parent process as one of the threads and spawn just one extra thread?"
Yes, that is what happens anyway. If you write #pragma omp parallel num_threads(2), that will execute on the thread that encounters it and one other thread. (You may be getting confused by the "wakeup" thread that the runtime always creates, but which waits on a kernel timer most of the time.)
Jim,
The problem as I see it here is as Ambuj described: he is using MPI (i.e. running a separate process for each MPI rank on the same KNC processor). For each process he then wishes to create an OpenMP thread pool restricted to that MPI rank's pinned core. I do not think there is a combination of environment variables that can do this directly. However, and this I have NOT tried myself, if the MPI run launches a script instead of the executable, that script could set environment variables visible to the context of the process run from the script. The script can receive the rank number, set the appropriate environment variables based on that rank number, and then launch the application.
This all depends upon mpirun being able to run a script.
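Something along these lines is what I have in mind (untested; the PMI_RANK variable and the rank-to-core arithmetic are assumptions that would need checking against how the ranks are actually pinned):
#!/bin/bash
# wrapper.sh -- pin one rank's OpenMP threads to two hyperthreads of its own core
rank=${PMI_RANK:-0}                 # rank number, assuming the launcher exports it
first=$(( rank * 4 + 1 ))           # first logical CPU of this rank's core (KNC numbers 4 hardware threads per core)
second=$(( first + 1 ))
export KMP_AFFINITY="granularity=thread,proclist=[${first},${second}],explicit"
export OMP_NUM_THREADS=2
exec ./my_app "$@"
You would then launch with something like mpirun -n 4 ./wrapper.sh instead of the executable.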
Jim Dempsey
I believe (though I admit I haven't tested it) that if you use I_MPI_PIN_DOMAIN ( http://software.intel.com/sites/products/documentation/hpc/ics/impi/41/win/Reference_Manual/Interoperability_with_OpenMP.htm ), it will set up the kernel affinity masks so that, when the process starts, the OpenMP runtime is naturally restricted to the slice of the machine that MPI has allocated for it. You can then use the normal OpenMP affinity controls, which will apply within that slice of the machine.
My point is that the settings he gave to OpenMP (allocating more than one core, requesting fewer threads than cores, and setting a balanced affinity) are guaranteed not to place threads on the same core.
The settings he's using have the feel of a "magic card deck" that has been passed down from history and that is now being used a long way from where it made sense :-)
(I'm sure you had some of those on IBM machines back in the day; I know I did...)
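For instance (untested), something along these lines should give each rank a one-core domain and then confine its two OpenMP threads to that core:
export I_MPI_PIN_DOMAIN=core
export MIC_ENV_PREFIX=PHI
export PHI_OMP_NUM_THREADS=2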
Yeah, I_MPI_PIN_DOMAIN=core does the trick for hyperthreads. Thanks.
Thanks for the feedback. Glad it's working for you now.
