Questions about thread/core affinity and OpenMP

I don't understand how these settings can be used together:

GOMP_CPU_AFFINITY / KMP_AFFINITY, OMP_PLACES and OMP_PROC_BIND.

In my project, I would like to assign specific cores to specific calculations. I have a large code with very large one-dimensional arrays (10^6 to 10^9 values each). I have 2 sockets and 28 physical cores (1 node). In some cases I must use 1-2 big arrays in loops, and I think it makes sense to use both sockets to get a larger total cache. At other times, one socket could be enough, which would avoid the overhead of QPI.

I need a strategy based on OpenMP places, because I think that is the simplest way to organize this. Unfortunately, I don't understand the difference between KMP_AFFINITY (to take the Intel example) and OMP_PLACES / OMP_PROC_BIND. I don't know whether these methods can be combined. My feeling is that they both do more or less the same thing.

Thanks in advance for your answers.

PS: My code is written in Fortran. I have the Intel compiler and gfortran.

Accepted Solutions

KMP_AFFINITY is an Intel-specific predecessor of the portable, standard OMP_PLACES and OMP_PROC_BIND. You can compare the effect of whatever combination you like by setting KMP_AFFINITY=verbose. KMP_AFFINITY probably takes precedence over the OMP_ settings if there is a conflict.
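As a rough illustration of how the two families of variables relate, here is a minimal shell sketch, assuming a Linux shell. The function names and `./my_app` are hypothetical, and the two configurations are only approximately comparable (both ask for one thread per core, distributed evenly). Each helper clears the other family first, so the question of precedence does not arise.

```shell
#!/bin/sh
# Two roughly comparable ways to ask for "one thread per core, spread out".
# Each function clears the other family of variables so they cannot conflict.
# Function names are illustrative, not part of any tool.

# Portable OpenMP 4.0 variables.
use_portable_affinity() {
    unset KMP_AFFINITY GOMP_CPU_AFFINITY
    export OMP_PLACES=cores        # one place per physical core
    export OMP_PROC_BIND=spread    # distribute threads evenly over the places
}

# Intel-specific variable (read by the Intel OpenMP runtime).
use_intel_affinity() {
    unset OMP_PLACES OMP_PROC_BIND GOMP_CPU_AFFINITY
    # verbose: print the resulting binding at startup
    # granularity=core: bind each thread to a whole core
    # scatter: distribute threads as evenly as possible
    export KMP_AFFINITY=verbose,granularity=core,scatter
}

# Hypothetical usage, with ./my_app standing in for the real executable:
#   use_portable_affinity; OMP_DISPLAY_ENV=true ./my_app
#   use_intel_affinity;    ./my_app
```

Running the same program under each configuration and comparing the reported bindings and timings is the kind of check the reply suggests.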

If you can use static scheduling, you would expect to see the best performance by pinning your work evenly across cores and sockets, even if you don't have enough work to use all the cores. The overhead of QPI is less of a concern than the question of whether a thread moves to remote memory, or moves among cores too frequently to take advantage of cache. With dynamic scheduling, you may not be able to guess without careful performance measurements whether restricting a small number of threads to a single socket would be best. It might be, if your job alternates between serial (single-thread) and parallel phases. You also have the possibility of running simultaneous independent tasks reasonably efficiently by restricting each to a different socket.
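To make the last point concrete, here is a small sketch of restricting each of two independent runs to its own socket with OMP_PLACES. It assumes Linux, 14 cores per socket (28 cores over 2 sockets), and, importantly, that the OS numbers the cores of socket 0 as 0-13 and those of socket 1 as 14-27. That numbering is only an assumption and should be checked (for example with `lscpu`, or with the KMP_AFFINITY=verbose output). `socket_places`, `./task_a` and `./task_b` are hypothetical names.

```shell
#!/bin/sh
# Build an OMP_PLACES interval covering one socket, one place per core.
# ASSUMPTION: core IDs are contiguous per socket (socket 0 = 0..N-1,
# socket 1 = N..2N-1). Verify the real numbering before relying on this.
socket_places() {
    socket=$1          # socket index, starting at 0
    per_socket=$2      # number of cores per socket
    first=$(( socket * per_socket ))
    # "{first}:count" expands to {first},{first+1},... (count places)
    printf '{%d}:%d\n' "$first" "$per_socket"
}

# Hypothetical usage: two independent jobs, one per socket.
#   OMP_NUM_THREADS=14 OMP_PLACES="$(socket_places 0 14)" OMP_PROC_BIND=close ./task_a &
#   OMP_NUM_THREADS=14 OMP_PLACES="$(socket_places 1 14)" OMP_PROC_BIND=close ./task_b &
#   wait
#
# To spread a smaller team over both sockets instead:
#   OMP_NUM_THREADS=8 OMP_PLACES=cores OMP_PROC_BIND=spread ./task_a
```

Whether the one-socket or the two-socket layout is faster for a given loop is something only timing runs can settle, as noted above.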

On Windows, libgomp (which comes with gfortran) will ignore KMP_AFFINITY and even the OMP_ and GOMP_ affinity settings, as those aren't implemented there. Intel libiomp5 doesn't support gcc/gfortran on Windows, unlike libiomp5 on Linux. So on Windows you should see more advantage from using ifort with correct affinity settings than you would on Linux, although there are situations where even libiomp5 is not as efficient on Windows as on Linux.

Thank you for your reply. I will try that.
