Software Archive
Read-only legacy content

Binding hyperthreads to the same core as MPI processes

Ambuj_P_
Beginner

I have an MPI application in which I want a multithreaded section that can make use of hyperthreads.

For example, suppose I have 4 MPI ranks running on 4 different cores on MIC. I want to utilize two hyperthreads from each of these 4 cores for my multithreaded section (which basically contains just two OpenMP sections).

I'm exporting the following variables:
export MIC_ENV_PREFIX=PHI
export PHI_KMP_AFFINITY=balanced
export PHI_KMP_PLACE_THREADS=4c,2t
export PHI_OMP_NUM_THREADS=2
export OMP_NUM_THREADS=2

When I start my application, I see the following in the top utility:

  • CPU    Command
  • 1          my_app 
  • 62        my_app 
  • 123      my_app 
  • 184      my_app

These are the 4 MPI processes. 

After switching to thread view, I get:

  • CPU    Command
  • 1          my_app 
  • 2          my_app 
  • 5          my_app 
  • 62        my_app 
  • 63        my_app 
  • 65        my_app 
  • 123      my_app 
  • 124      my_app 
  • 125      my_app 
  • 184      my_app
  • 185      my_app
  • 186      my_app

What I would like is to spawn the threads on the same core as the parent process. (For example, the process running on CPU 1 should spawn two threads on CPUs 2 and 3 so that they are on the same core.)

Also, is there a way to utilize the parent process as one of the threads and spawn just one extra thread?

 

5 Replies
James_C_Intel2
Employee

You are explicitly telling OpenMP to use 2 threads in each of four cores (PHI_KMP_PLACE_THREADS=4c,2t); then, having reserved space for eight threads, you tell it to use only two of them (PHI_OMP_NUM_THREADS=2) and to place those threads in a balanced way across the four available cores (PHI_KMP_AFFINITY=balanced). So you're telling the OpenMP runtime to place the two threads on separate cores. Then you seem to be complaining that that is what it's doing...

If you want to keep the two threads on the same core, then just say so. Either be more explicit about the resources you allocate and simply allocate two hardware threads on one core (PHI_KMP_PLACE_THREADS=1c,2t), or use an affinity that places threads near each other (PHI_KMP_AFFINITY=compact). My preference is to allocate only the resources you intend to use and then let the runtime use them all (so don't use both KMP_PLACE_THREADS and OMP_NUM_THREADS at the same time).
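The two alternatives above, written out as environment settings (a sketch; the PHI_ prefix assumes MIC_ENV_PREFIX=PHI as in the original post):

```shell
# Option 1: allocate exactly what you intend to use: one core, two threads.
export MIC_ENV_PREFIX=PHI
export PHI_KMP_PLACE_THREADS=1c,2t
export PHI_OMP_NUM_THREADS=2

# Option 2: keep the wider allocation, but pack the two threads onto
# adjacent hardware contexts instead of spreading them across cores:
# export PHI_KMP_AFFINITY=compact
```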

"Also is there a way to utilize the parent process as one of the threads and spawn just one extra thread?"

Yes, that is what happens anyway. If you write #pragma omp parallel num_threads(2), it will execute on the thread that encounters it plus one other thread. (You may be getting confused by the "wakeup" thread that the runtime always creates, but which waits on a kernel timer most of the time.)

jimdempseyatthecove
Honored Contributor III

Jim,

The problem as I see it is as Ambuj described: he is using MPI (i.e. running a separate process for each MPI rank on the same KNC processor), and for each process he wishes to create an OpenMP thread pool restricted to that rank's pinned core. I do not think there is a combination of environment variables that can do this directly. However (and I have NOT tried this myself), if the MPI run launches a script instead of the executable, that script could set environment variables visible to the process it then launches. The script can receive the rank number, set the appropriate environment variables based on it, and then launch the MPI application.

This all depends upon mpirun being able to run a script.
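Such a wrapper might look like the sketch below. Two assumptions here that are not from the post: that the MPI launcher exposes the rank to each process as PMI_RANK (check your MPI's documentation for the exact variable), and that on KNC core N owns logical CPUs 4N+1 .. 4N+4.

```shell
#!/bin/sh
# Hypothetical per-rank wrapper: launched by mpirun in place of the binary,
# it pins the rank's OpenMP threads to that rank's own core, then execs
# the real application with the environment in place.
rank=${PMI_RANK:-0}
base=$((4 * rank + 1))   # first logical CPU of this rank's core (assumed)
export OMP_NUM_THREADS=2
export KMP_AFFINITY="granularity=thread,proclist=[$base,$((base + 1))],explicit"
if [ $# -gt 0 ]; then
    exec "$@"
fi
```

It would then be launched as something like `mpirun -n 4 ./wrapper.sh ./my_app`, if mpirun accepts a script.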

Jim Dempsey

James_C_Intel2
Employee

I believe (though I admit I haven't tested it) that if you use I_MPI_PIN_DOMAIN ( http://software.intel.com/sites/products/documentation/hpc/ics/impi/41/win/Reference_Manual/Interoperability_with_OpenMP.htm ), it will set up the kernel affinity masks so that, when the process starts, the OpenMP runtime is naturally restricted to the slice of the machine that MPI has allocated for it. You can then use the normal OpenMP affinity controls, which will apply within that slice of the machine.
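A sketch of the combined setup (untested, as said above; the OpenMP affinity choice within the slice is my assumption, not something confirmed in this thread):

```shell
export I_MPI_PIN_DOMAIN=core        # confine each rank to one core's CPUs
export MIC_ENV_PREFIX=PHI
export PHI_OMP_NUM_THREADS=2        # two OpenMP threads per rank...
export PHI_KMP_AFFINITY=compact     # ...packed within that rank's slice
# then launch as usual, e.g.:  mpirun -n 4 ./my_app
```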

My point is that the settings he gave to OpenMP (allocating more than one core, requesting fewer threads than cores, and setting a balanced affinity) are guaranteed not to place threads on the same core.

The settings he's using have the feel of a "magic card deck" that has been passed down from history and that is now being used a long way from where it made sense :-)

(I'm sure you had some of those on IBM machines back in the day; I know I did...)

Ambuj_P_
Beginner

Yeah, I_MPI_PIN_DOMAIN=core does the trick for hyperthreads. Thanks.

James_C_Intel2
Employee

Thanks for the feedback. Glad it's working for you now.
