Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP Affinity for Nested Levels

jimdempseyatthecove
Honored Contributor III
1,611 Views

I have an application that may benefit from nested parallelism.

The CPU has a mix of high performance cores and efficiency cores.

An example configuration has 4 main things, each easily assigned to an affinity-pinned CPU:

KMP_HW_SUBSET=4ceff0

or

KMP_HW_SUBSET=ceff0 (any of the 8 high performance cores)

 

Now, as the simulation runs, the "things" grow. At some point, it is determined that a particular thing can benefit from nested parallelism.

When nested is engaged for a particular OpenMP thread, it spawns a new thread team.

How do I control the affinity of these additional threads?

Are they pinned to the same core(s) as the spawning thread?

If so, and I chose KMP_HW_SUBSET=4ceff0, they would be tripping over each other.

If I chose KMP_HW_SUBSET=ceff0 (any of the 8 high performance cores), would the threads of the new nested team (sans its master thread) be excluded from the ceff0 cores, since those are assigned to the main-level threads?

 

There appears to be no guidance on this matter in the reference manual.

 

To add to the complication, the application spawns a front-end, which includes live charts and graphs of the running simulation. It would be preferable for this thread (or these threads) not to be affinity pinned, so that it can run on an efficiency core when all performance cores are busy.

 

Note, I know I can use:

    SetThreadAffinityMask(hThread, ulpAffinityMask)

to get what I want, but this means I must query the CPU as to which cores are performance cores and which are efficiency cores.

 

While I can use OMP_NUM_THREADS=4,2 for 4 main-level threads, with each main thread able to construct a team of up to 2 threads, there is no specification of thread placement.

 

OMP_PROC_BIND 

Reference states: Sets the thread affinity policy to be used for parallel regions at the corresponding nested level.

But there is no indication as to how you specify a nest level.

 

OMP_PLACES

Doesn't have a means of specifying the nest level.

 

Any assistance on this will be appreciated.

 

Jim Dempsey

 

1 Solution
TobiasK
Moderator
1,329 Views

hi @jimdempseyatthecove

 

I understand your concern, but as far as I know, there is currently no interest in bringing NUMA-aware pinning to OpenMP. I found my small sample program that demonstrates what I had in mind to deal with the situation, at least to some extent. Note that I tested it only on Linux and that you have to set OMP_PLACES=cores. It should then show that the master and slave threads of the first OpenMP region are pinned with a gap between them, while the threads of the nested region started by the slave thread are pinned close to that second thread.

 

program omp_nested_pinning
  use omp_lib
  implicit none
  integer :: i,active_levels

  active_levels=omp_get_max_active_levels()

  if(active_levels.lt.2)then
     write(*,*) 'warning, OMP_MAX_ACTIVE_LEVELS is set to',active_levels
     write(*,*) 'setting OMP_MAX_ACTIVE_LEVELS to 2 now'
     call omp_set_max_active_levels(2)
  end if
  write(*,*) 'print outside'
  call print_places()
  

  ! Outer level: two threads, spread across the available places
  !$omp parallel num_threads(2) proc_bind(spread)
  if(omp_get_thread_num().eq.0)then
     write(*,*) 'print master'
     call print_places()
  end if
  ! The barrier makes thread 1 wait until thread 0 has finished printing
  !$omp barrier
  if(omp_get_thread_num().eq.1)then
     write(*,*) 'print slave'
     call print_places()
     ! Nested level, started by outer thread 1 only: two threads placed close to it
     !$omp parallel num_threads(2) proc_bind(close)
     if(omp_get_thread_num().eq.0)then
        write(*,*) 'print master in nested region'
        call print_places()
     end if
     !$omp barrier
     if(omp_get_thread_num().eq.1)then
        write(*,*) 'print slave in nested region'
        call print_places()
     end if
     !$omp end parallel

  end if
  !$omp end parallel
contains
  subroutine print_places()
    implicit none
    integer, allocatable :: place_proc_ids(:)
    integer :: place_num_procs,num_places,place_num

    write(*,*)' OpenMP: get_num_threads', omp_get_num_threads()
    write(*,*)' OpenMP: get_max_threads', omp_get_max_threads()
    write(*,*)' OpenMP: get_thread_num', omp_get_thread_num()
    write(*,*)' OpenMP: get_place_num', omp_get_place_num()
    write(*,*)' OpenMP: get_place_num', omp_get_place_num()
    num_places=omp_get_num_places()
    do place_num=0,num_places-1
       place_num_procs=omp_get_place_num_procs(place_num)
       allocate(place_proc_ids(place_num_procs))
       call omp_get_place_proc_ids(place_num,place_proc_ids)
       write(*,*) 'OpenMP: place_num',place_num,'place_num_procs',place_num_procs
       write(*,*) 'OpenMP: place_proc_ids',place_proc_ids
       deallocate(place_proc_ids)
    end do
  end subroutine print_places

end program omp_nested_pinning

 

 

OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201611'
  [host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
  [host] OMP_ALLOCATOR='omp_default_mem_alloc'
  [host] OMP_CANCELLATION='FALSE'
  [host] OMP_DEBUG='disabled'
  [host] OMP_DEFAULT_DEVICE='0'
  [host] OMP_DISPLAY_AFFINITY='FALSE'
  [host] OMP_DISPLAY_ENV='TRUE'
  [host] OMP_DYNAMIC='FALSE'
  [host] OMP_MAX_ACTIVE_LEVELS='1'
  [host] OMP_MAX_TASK_PRIORITY='0'
  [host] OMP_NESTED: deprecated; max-active-levels-var=1
  [host] OMP_NUM_TEAMS='0'
  [host] OMP_NUM_THREADS='4'
  [host] OMP_PLACES='cores'
  [host] OMP_PROC_BIND='close'
  [host] OMP_SCHEDULE='static'
  [host] OMP_STACKSIZE='4M'
  [host] OMP_TARGET_OFFLOAD=DEFAULT
  [host] OMP_TEAMS_THREAD_LIMIT='0'
  [host] OMP_THREAD_LIMIT='2147483647'
  [host] OMP_TOOL='enabled'
  [host] OMP_TOOL_LIBRARIES: value is not defined
  [host] OMP_TOOL_VERBOSE_INIT: value is not defined
  [host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END


 warning, OMP_MAX_ACTIVE_LEVELS is set to           1
 setting OMP_MAX_ACTIVE_LEVELS to 2 now
 print outside
  OpenMP: get_num_threads           1
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           0
  OpenMP: get_place_num           0
  OpenMP: get_place_num           0
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print master
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           0
  OpenMP: get_place_num           0
  OpenMP: get_place_num           0
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print slave
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           1
  OpenMP: get_place_num          20
  OpenMP: get_place_num          20
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print master in nested region
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           0
  OpenMP: get_place_num          20
  OpenMP: get_place_num          20
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print slave in nested region
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           1
  OpenMP: get_place_num          21
  OpenMP: get_place_num          21
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79

So you can see that thread 1 of the nested region started by outer thread 1 is pinned to place 21, right next to its parent on place 20.

Now, if you set the number of threads of the initial parallel region to the number of NUMA domains, the threads of each subsequent nested parallel region should end up inside the same NUMA domain as their parent.


7 Replies
TobiasK
Moderator
1,532 Views

Hi @jimdempseyatthecove 

Nested OpenMP, that's always interesting :)

Do you want to use the hyperthreads or the e-cores only in the nested region? Or do you only want to use the P-cores always?



jimdempseyatthecove
Honored Contributor III
1,517 Views

In this case, I wish to use the P-cores.

While I can set the process affinity to the P-cores (the first 8 in my case) and then let the main thread team, and its team members' thread teams (at nest level 1), fend for themselves, this seems a bit inefficient for keeping data in cache (when threads migrate within the process affinity).

 

It would be nice if the OpenMP environment and/or API (omp_....) could facilitate the selection.

 

In the general case, let's look at a 2 socket system running an application such as mine.

 

Suppose that at program start there are two, or say four, collections that need to be processed. These collections can grow periodically during processing.

At the beginning, it may be desirable in the two-collection case to place each collection on a different socket; this can be done with environment variable settings. During execution, as a collection grows, at some point it becomes advantageous to parallelize the collection via nested parallelism. It is then advantageous to bind the new thread team to the socket (i.e. the affinity) of the spawning thread, such that the L3 cache can be shared amongst the halves of the collection.

 

Considering that OpenMP has offload directives with which one can specify where (on which accelerator) to process a region, I am surprised that no effort was made to permit a directive to specify a preference as to which socket or NUMA node to spawn a nest level onto.

 

On an unrelated topic.

 

In Windows Task Manager, what does "CPU n - Parked" relate to?

 

Jim Dempsey

TobiasK
Moderator
1,490 Views

@jimdempseyatthecove 

Yes, you are right, NUMA-aware OpenMP is basically non-existent.
I guess the reason is that almost all applications use MPI to handle the NUMA placement. So in your case, one MPI rank would be placed on each socket and would spawn, as needed, OpenMP threads within its CPU set.

 

I once thought about the same problem, and I think it's best to do the following:
For the first OpenMP level, specify proc_bind(spread), either via the environment or on each parallel region you create. For the nested regions, specify proc_bind(close). For the OMP places to which the threads get pinned, it is not relevant whether they are nested threads or not; it is simply counting. For example, if you have 8 places (say, core IDs 0-7) and you start the first OpenMP region with two threads, each of which later spawns a nested region with 4 threads, then all 8 places are in use. The first two threads will pin to places 0 and 4 due to proc_bind(spread), and the nested threads will additionally use places 1, 2, 3 and 5, 6, 7.

At least that is what I remember from my experiments at that time. Let me check if I have some example code around to show the behavior.

 

Regarding offload to GPUs, yes, that's also more complicated. But again, (Intel) MPI is of help here. We support pinning a set of ranks to a particular set of GPUs, so that inside an MPI rank you only see the GPU that is available. I think that is again much simpler than handling the device selection from within OpenMP. With OpenMP, you simply don't know whether a given GPU ID is connected to the socket your threads are running on, etc. Intel MPI has logic to determine which GPU is connected to which socket / NUMA domain and can then pin to the closest GPU available.

jimdempseyatthecove
Honored Contributor III
1,478 Views

FWIW

 

Years ago, I wrote a threading system called QuickThread. It didn't get much traction, and so I abandoned it. It was C++ but the earliest phases also included Fortran. Fortran was dropped because at that time you had to hand write all the interfaces (no auto gen interfaces at that time). At program start, the initialization code would discover the topology of the system (NUMA levels and nodes, L3's, L2's and L1's). The thread team invoking statements could optionally include a placement argument. For example,

parallel_for(
  M0$, // same socket as spawning thread
  (intptr_t)0, nTiles,
  [&](intptr_t iBegin, intptr_t iEnd)
{
  ...
  // nested within socket
  parallel_for(
    L2$, // same L2 cache as spawning thread
    ...

And a bunch of other features. FIFO and LIFO queues, low priority queues, I/O (high level) queues and such, blocking tasks, non-blocking tasks, waiting on other tasks, etc.... The potential users would rather stick with OpenMP or TBB.

The Fortran support syntax didn't have templates, instead you called a dispatch function:

subroutine QuickThreadTest
    use SimpleArray_mod
    use QuickThreadInterfaces
    implicit none

    ! Local Variables
    ! Stack local control structure
    type(T_QuickThreadControlStructure) :: qtControl
    integer :: i
    
    ! code
    do i=1, NumberOfIterations
        ! Slice the array processing by number of worker threads
        ! across range of 1 to size of array A (B, and C)
        call QuickThreadQueueDo(qtControl, DoSimpleArraySlice, 1, size(A))
        if(isTemporal) call QuickThreadWaitTillDone(qtControl)
    end do
    if(.not. isTemporal) call QuickThreadWaitTillDone(qtControl)
end subroutine QuickThreadTest

! DoSimpleArraySlice
! Perform work on slice of arrays A, B, and C
subroutine DoSimpleArraySlice(iFrom, iTo)
    use SimpleArray_mod
    implicit none
!DEC$ ATTRIBUTES VALUE :: iFrom
    integer :: iFrom
!DEC$ ATTRIBUTES VALUE :: iTo
    integer :: iTo
    
    ! Local Variables
    integer :: j
    
    do j=iFrom, iTo
        A(j) = B(j) + C(j)
    end do
end subroutine DoSimpleArraySlice

Where you can insert the placement in the qtControl structure.

 

A case of good design - but no interest.

 

The foundation of this threading system is such that the "placement" could also include Performance core and Efficiency core selection.

 

I think that, without much work, the OpenMP directives in both ifx and icx could incorporate clauses for placement and then use the task scheduler I built for QuickThread. Some appropriate name would have to be chosen. That would be a start; then it might be desirable to add in the other features (asynchronous procedures, I/O level, FIFO, LIFO, completion procedures, etc.).

 

Jim

TobiasK
Moderator
1,330 Views

hi @jimdempseyatthecove

 

I understand your concern, but as far as I know there is currently no interest in bringing NUMA-aware pinning to OpenMP. I found my small sample program, which demonstrates what I had in mind for dealing with the situation. Note that I tested it only on Linux, and you have to set OMP_PLACES=cores. It should then show that the master and slave threads of the first OpenMP region are pinned with a gap between them, while the threads of the nested region spawned by the slave thread are pinned close to that thread.

 

program omp_nested_pinning
  use omp_lib
  implicit none
  integer :: i,active_levels

  active_levels=omp_get_max_active_levels()

  if(active_levels.lt.2)then
     write(*,*) 'warning, OMP_MAX_ACTIVE_LEVELS is set to',active_levels
     write(*,*) 'setting OMP_MAX_ACTIVE_LEVELS to 2 now'
     call omp_set_max_active_levels(2)
  end if
  write(*,*) 'print outside'
  call print_places()
  

  ! Outer level: two threads spread across the available places
  !$omp parallel num_threads(2) proc_bind(spread)
  if(omp_get_thread_num().eq.0)then
     write(*,*) 'print master'
     call print_places()
  end if
  !$omp barrier
  if(omp_get_thread_num().eq.1)then
     write(*,*) 'print slave'
     call print_places()
     ! Nested level: two threads placed close to the spawning thread
     !$omp parallel num_threads(2) proc_bind(close)
     if(omp_get_thread_num().eq.0)then
        write(*,*) 'print master in nested region'
        call print_places()
     end if
     !$omp barrier
     if(omp_get_thread_num().eq.1)then
        write(*,*) 'print slave in nested region'
        call print_places()
     end if
     !$omp end parallel


  end if
  !$omp end parallel
contains
  subroutine print_places()
    implicit none
    integer, allocatable :: place_proc_ids(:)
    integer :: place_num_procs,num_places,place_num

    write(*,*)' OpenMP: get_num_threads', omp_get_num_threads()
    write(*,*)' OpenMP: get_max_threads', omp_get_max_threads()
    write(*,*)' OpenMP: get_thread_num', omp_get_thread_num()
    write(*,*)' OpenMP: get_place_num', omp_get_place_num()
    write(*,*)' OpenMP: get_place_num', omp_get_place_num()
    num_places=omp_get_num_places()
    do place_num=0,num_places-1
       place_num_procs=omp_get_place_num_procs(place_num)
       allocate(place_proc_ids(place_num_procs))
       call omp_get_place_proc_ids(place_num,place_proc_ids)
       write(*,*) 'OpenMP: place_num',place_num,'place_num_procs',place_num_procs
       write(*,*) 'OpenMP: place_proc_ids',place_proc_ids
       deallocate(place_proc_ids)
    end do
  end subroutine print_places

end program omp_nested_pinning

 

 

OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201611'
  [host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
  [host] OMP_ALLOCATOR='omp_default_mem_alloc'
  [host] OMP_CANCELLATION='FALSE'
  [host] OMP_DEBUG='disabled'
  [host] OMP_DEFAULT_DEVICE='0'
  [host] OMP_DISPLAY_AFFINITY='FALSE'
  [host] OMP_DISPLAY_ENV='TRUE'
  [host] OMP_DYNAMIC='FALSE'
  [host] OMP_MAX_ACTIVE_LEVELS='1'
  [host] OMP_MAX_TASK_PRIORITY='0'
  [host] OMP_NESTED: deprecated; max-active-levels-var=1
  [host] OMP_NUM_TEAMS='0'
  [host] OMP_NUM_THREADS='4'
  [host] OMP_PLACES='cores'
  [host] OMP_PROC_BIND='close'
  [host] OMP_SCHEDULE='static'
  [host] OMP_STACKSIZE='4M'
  [host] OMP_TARGET_OFFLOAD=DEFAULT
  [host] OMP_TEAMS_THREAD_LIMIT='0'
  [host] OMP_THREAD_LIMIT='2147483647'
  [host] OMP_TOOL='enabled'
  [host] OMP_TOOL_LIBRARIES: value is not defined
  [host] OMP_TOOL_VERBOSE_INIT: value is not defined
  [host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END


 warning, OMP_MAX_ACTIVE_LEVELS is set to           1
 setting OMP_MAX_ACTIVE_LEVELS to 2 now
 print outside
  OpenMP: get_num_threads           1
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           0
  OpenMP: get_place_num           0
  OpenMP: get_place_num           0
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print master
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           0
  OpenMP: get_place_num           0
  OpenMP: get_place_num           0
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print slave
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           1
  OpenMP: get_place_num          20
  OpenMP: get_place_num          20
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print master in nested region
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           0
  OpenMP: get_place_num          20
  OpenMP: get_place_num          20
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79
 print slave in nested region
  OpenMP: get_num_threads           2
  OpenMP: get_max_threads           4
  OpenMP: get_thread_num           1
  OpenMP: get_place_num          21
  OpenMP: get_place_num          21
 OpenMP: place_num           0 place_num_procs           2
 OpenMP: place_proc_ids           0          40
 OpenMP: place_num           1 place_num_procs           2
 OpenMP: place_proc_ids           1          41
 OpenMP: place_num           2 place_num_procs           2
 OpenMP: place_proc_ids           2          42
 OpenMP: place_num           3 place_num_procs           2
 OpenMP: place_proc_ids           3          43
 OpenMP: place_num           4 place_num_procs           2
 OpenMP: place_proc_ids           4          44
 OpenMP: place_num           5 place_num_procs           2
 OpenMP: place_proc_ids           5          45
 OpenMP: place_num           6 place_num_procs           2
 OpenMP: place_proc_ids           6          46
 OpenMP: place_num           7 place_num_procs           2
 OpenMP: place_proc_ids           7          47
 OpenMP: place_num           8 place_num_procs           2
 OpenMP: place_proc_ids           8          48
 OpenMP: place_num           9 place_num_procs           2
 OpenMP: place_proc_ids           9          49
 OpenMP: place_num          10 place_num_procs           2
 OpenMP: place_proc_ids          10          50
 OpenMP: place_num          11 place_num_procs           2
 OpenMP: place_proc_ids          11          51
 OpenMP: place_num          12 place_num_procs           2
 OpenMP: place_proc_ids          12          52
 OpenMP: place_num          13 place_num_procs           2
 OpenMP: place_proc_ids          13          53
 OpenMP: place_num          14 place_num_procs           2
 OpenMP: place_proc_ids          14          54
 OpenMP: place_num          15 place_num_procs           2
 OpenMP: place_proc_ids          15          55
 OpenMP: place_num          16 place_num_procs           2
 OpenMP: place_proc_ids          16          56
 OpenMP: place_num          17 place_num_procs           2
 OpenMP: place_proc_ids          17          57
 OpenMP: place_num          18 place_num_procs           2
 OpenMP: place_proc_ids          18          58
 OpenMP: place_num          19 place_num_procs           2
 OpenMP: place_proc_ids          19          59
 OpenMP: place_num          20 place_num_procs           2
 OpenMP: place_proc_ids          20          60
 OpenMP: place_num          21 place_num_procs           2
 OpenMP: place_proc_ids          21          61
 OpenMP: place_num          22 place_num_procs           2
 OpenMP: place_proc_ids          22          62
 OpenMP: place_num          23 place_num_procs           2
 OpenMP: place_proc_ids          23          63
 OpenMP: place_num          24 place_num_procs           2
 OpenMP: place_proc_ids          24          64
 OpenMP: place_num          25 place_num_procs           2
 OpenMP: place_proc_ids          25          65
 OpenMP: place_num          26 place_num_procs           2
 OpenMP: place_proc_ids          26          66
 OpenMP: place_num          27 place_num_procs           2
 OpenMP: place_proc_ids          27          67
 OpenMP: place_num          28 place_num_procs           2
 OpenMP: place_proc_ids          28          68
 OpenMP: place_num          29 place_num_procs           2
 OpenMP: place_proc_ids          29          69
 OpenMP: place_num          30 place_num_procs           2
 OpenMP: place_proc_ids          30          70
 OpenMP: place_num          31 place_num_procs           2
 OpenMP: place_proc_ids          31          71
 OpenMP: place_num          32 place_num_procs           2
 OpenMP: place_proc_ids          32          72
 OpenMP: place_num          33 place_num_procs           2
 OpenMP: place_proc_ids          33          73
 OpenMP: place_num          34 place_num_procs           2
 OpenMP: place_proc_ids          34          74
 OpenMP: place_num          35 place_num_procs           2
 OpenMP: place_proc_ids          35          75
 OpenMP: place_num          36 place_num_procs           2
 OpenMP: place_proc_ids          36          76
 OpenMP: place_num          37 place_num_procs           2
 OpenMP: place_proc_ids          37          77
 OpenMP: place_num          38 place_num_procs           2
 OpenMP: place_proc_ids          38          78
 OpenMP: place_num          39 place_num_procs           2
 OpenMP: place_proc_ids          39          79

So you see that thread 1 of the nested parallel region spawned by outer thread 1 is pinned to place 21, right next to its parent on place 20.

Now, if you change the number of threads of the initial parallel region to the number of NUMA domains, the threads of each subsequent nested parallel region should stay inside the same NUMA domain as the thread that spawned them.
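A minimal sketch of that variation follows. It has not been run on any particular machine and rests on several assumptions: OMP_PLACES=cores is set, the places are numbered so that each NUMA domain owns one equally sized, contiguous block of places, and the number of domains is passed in through N_NUMA_DOMAINS, which is a hypothetical environment variable invented for this example (OpenMP does not define it). The nested team size is taken from omp_get_partition_num_places(), which after proc_bind(spread) should report the number of places in the sub-partition handed to each outer thread.

```fortran
program omp_numa_nested
  use omp_lib
  implicit none
  integer :: n_domains, n_inner, ios
  character(len=16) :: buf

  ! N_NUMA_DOMAINS is a hypothetical variable used only by this sketch.
  ! Fall back to 2 domains if it is missing or unreadable.
  n_domains = 2
  call get_environment_variable('N_NUMA_DOMAINS', buf, status=ios)
  if (ios == 0) read(buf, *, iostat=ios) n_domains
  if (ios /= 0 .or. n_domains < 1) n_domains = 2

  ! Allow one nested level below the outer parallel region
  call omp_set_max_active_levels(2)

  ! Outer level: one thread per (assumed) NUMA domain, spread over all places
  !$omp parallel num_threads(n_domains) proc_bind(spread) private(n_inner)

  ! Number of places in this outer thread's partition; may be 0 if no
  ! places are defined, so guard with max()
  n_inner = max(1, omp_get_partition_num_places())

  ! Nested level: fill this thread's partition with closely packed threads
  !$omp parallel num_threads(n_inner) proc_bind(close)
  !$omp critical
  write(*,'(a,i0,a,i0,a,i0)') 'outer ', omp_get_ancestor_thread_num(1), &
       ' inner ', omp_get_thread_num(), ' place ', omp_get_place_num()
  !$omp end critical
  !$omp end parallel

  !$omp end parallel
end program omp_numa_nested
```

If the assumptions hold, every line printed for a given outer thread should show place numbers from one contiguous block. omp_get_place_num() returns -1 for an unbound thread, which is a quick sign that OMP_PLACES or the binding policy did not take effect.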

jimdempseyatthecove
Honored Contributor III
1,303 Views

Great! Thank you very much.

 

I wasn't aware of the OpenMP clause proc_bind(...); I thought it was only available as an environment variable. I should have RTFM.

 

This may need a little bit of work/thought on a system with a mix of performance cores and efficiency cores.

 

Thanks again

 

Jim

TobiasK
Moderator
1,269 Views

@jimdempseyatthecove


Well, RTFM for OpenMP is sometimes too much :)


Please let me know when you figure out how to handle hybrid architectures!


Best

Tobias

