I have an application that may benefit from nested parallelism.
The CPU has a mix of high performance cores and efficiency cores.
An example configuration has 4 main things, easily assigned to affinity-pinned CPUs:
KMP_HW_SUBSET=4ceff0
or
KMP_HW_SUBSET=ceff0 (any of the 8 high performance cores)
Now, as the simulation runs, the "things" grow. At some point, it is determined that a particular thing can benefit from nested parallelism.
When nested is engaged for a particular OpenMP thread, it spawns a new thread team.
How do I control the affinity of these additional threads?
Are they pinned to the same core(s) as the spawning thread?
If I chose KMP_HW_SUBSET=4ceff0 and the answer is yes, they would be tripping over each other.
If I chose KMP_HW_SUBSET=ceff0 (any of the 8 high performance cores), would the new nest-level team (sans its master thread) be excluded from the ceff0 cores, since those are assigned to the main-level threads?
There appears to be no guidance on this matter in the reference manual.
To add to the complication, the application spawns a front-end, which includes live charts and graphs of the running simulation. It would be preferable for this thread (or these threads) not to be affinity pinned, so that it can run on an efficiency core when all performance cores are busy.
Note: I know I can use
SetThreadAffinityMask(hThread, ulpAffinityMask)
to get what I want, but this means I must query the CPU to find out which cores are performance cores and which are efficiency cores.
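For illustration only, here is a minimal sketch of building the mask argument for a call such as SetThreadAffinityMask from a list of logical processor indices. The helper name make_affinity_mask is hypothetical, the sketch assumes at most 64 logical processors (a single processor group), and it does not show how to find out which indices belong to performance cores; that query is still required.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: bit n of the result is set when logical
// processor n appears in the list. Assumes indices 0..63 only.
std::uint64_t make_affinity_mask(const std::vector<int>& logical_cpus) {
    std::uint64_t mask = 0;
    for (int cpu : logical_cpus) {
        assert(cpu >= 0 && cpu < 64);  // outside one 64-bit mask is not handled
        mask |= (std::uint64_t{1} << cpu);
    }
    return mask;
}
```

On 64-bit Windows the returned value could be cast to DWORD_PTR and passed as the second argument; the list of indices itself has to come from a topology query.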
While I can use OMP_NUM_THREADS=4,2 for 4 main-level threads, with each main thread able to construct a team of up to 2 threads, there is no specification of thread placement.
Reference states: Sets the thread affinity policy to be used for parallel regions at the corresponding nested level.
But there is no indication as to how you specify a nest level.
OMP_PLACES doesn't have a means of specifying the nest level.
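As a hedged aside: in the OpenMP specification, OMP_PROC_BIND (like OMP_NUM_THREADS) accepts a comma-separated list, and the position in the list is what selects the nest level, for example OMP_PROC_BIND=spread,close. Assuming the sentence quoted above describes that variable, the mapping can be sketched as follows. The function names are hypothetical, and the sketch does no whitespace trimming or validation of the policy names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated value such as "spread,close" into one
// entry per nest level (entry 0 is the outermost parallel region).
std::vector<std::string> split_levels(const std::string& value) {
    std::vector<std::string> levels;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        levels.push_back(item);
    }
    return levels;
}

// Policy that applies at a given nest level. Levels deeper than the
// list reuse its last entry.
std::string policy_for_level(const std::string& value, int level) {
    const std::vector<std::string> levels = split_levels(value);
    assert(!levels.empty() && level >= 0);
    const std::size_t idx = static_cast<std::size_t>(level);
    return idx < levels.size() ? levels[idx] : levels.back();
}
```

With "spread,close", level 0 maps to spread and level 1 (and anything deeper) maps to close.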
Any assistance on this will be appreciated.
Jim Dempsey
I understand your concern, but as far as I know, there is currently no interest in bringing NUMA-aware pinning to OpenMP. I found my small sample program that demonstrates what I had in mind to somewhat deal with the situation. Note that I tested it only on Linux, and you have to set OMP_PLACES=cores. It should then show that the master and slave threads in the first OpenMP region are pinned with a gap, while the threads in the nested region spawned by the slave thread are pinned close to that second thread.
program omp_nested_pinning
  use omp_lib
  implicit none
  integer :: i,active_levels
  active_levels=omp_get_max_active_levels()
  if(active_levels.lt.2)then
    write(*,*) 'warning, OMP_MAX_ACTIVE_LEVELS is set to',active_levels
    write(*,*) 'setting OMP_MAX_ACTIVE_LEVELS to 2 now'
    call omp_set_max_active_levels(2)
  end if
  write(*,*) 'print outside'
  call print_places()
  !$omp parallel num_threads(2) proc_bind(spread)
  if(omp_get_thread_num().eq.0)then
    write(*,*) 'print master'
    call print_places()
  end if
  !$omp barrier
  if(omp_get_thread_num().eq.1)then
    write(*,*) 'print slave'
    call print_places()
    !$omp parallel num_threads(2) proc_bind(close)
    if(omp_get_thread_num().eq.0)then
      write(*,*) 'print master in nested region'
      call print_places()
    end if
    !$omp barrier
    if(omp_get_thread_num().eq.1)then
      write(*,*) 'print slave in nested region'
      call print_places()
    end if
    !$omp end parallel
  end if
  !$omp end parallel
contains
  subroutine print_places()
    implicit none
    integer, allocatable :: place_proc_ids(:)
    integer :: place_num_procs,num_places,place_num
    write(*,*)' OpenMP: get_num_threads', omp_get_num_threads()
    write(*,*)' OpenMP: get_max_threads', omp_get_max_threads()
    write(*,*)' OpenMP: get_thread_num', omp_get_thread_num()
    write(*,*)' OpenMP: get_place_num', omp_get_place_num()
    write(*,*)' OpenMP: get_place_num', omp_get_place_num()
    num_places=omp_get_num_places()
    do place_num=0,num_places-1
      place_num_procs=omp_get_place_num_procs(place_num)
      allocate(place_proc_ids(place_num_procs))
      call omp_get_place_proc_ids(place_num,place_proc_ids)
      write(*,*) 'OpenMP: place_num',place_num,'place_num_procs',place_num_procs
      write(*,*) 'OpenMP: place_proc_ids',place_proc_ids
      deallocate(place_proc_ids)
    end do
  end subroutine print_places
end program omp_nested_pinning
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP='201611'
[host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
[host] OMP_ALLOCATOR='omp_default_mem_alloc'
[host] OMP_CANCELLATION='FALSE'
[host] OMP_DEBUG='disabled'
[host] OMP_DEFAULT_DEVICE='0'
[host] OMP_DISPLAY_AFFINITY='FALSE'
[host] OMP_DISPLAY_ENV='TRUE'
[host] OMP_DYNAMIC='FALSE'
[host] OMP_MAX_ACTIVE_LEVELS='1'
[host] OMP_MAX_TASK_PRIORITY='0'
[host] OMP_NESTED: deprecated; max-active-levels-var=1
[host] OMP_NUM_TEAMS='0'
[host] OMP_NUM_THREADS='4'
[host] OMP_PLACES='cores'
[host] OMP_PROC_BIND='close'
[host] OMP_SCHEDULE='static'
[host] OMP_STACKSIZE='4M'
[host] OMP_TARGET_OFFLOAD=DEFAULT
[host] OMP_TEAMS_THREAD_LIMIT='0'
[host] OMP_THREAD_LIMIT='2147483647'
[host] OMP_TOOL='enabled'
[host] OMP_TOOL_LIBRARIES: value is not defined
[host] OMP_TOOL_VERBOSE_INIT: value is not defined
[host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END
warning, OMP_MAX_ACTIVE_LEVELS is set to 1
setting OMP_MAX_ACTIVE_LEVELS to 2 now
print outside
OpenMP: get_num_threads 1
OpenMP: get_max_threads 4
OpenMP: get_thread_num 0
OpenMP: get_place_num 0
OpenMP: get_place_num 0
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print master
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 0
OpenMP: get_place_num 0
OpenMP: get_place_num 0
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print slave
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 1
OpenMP: get_place_num 20
OpenMP: get_place_num 20
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print master in nested region
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 0
OpenMP: get_place_num 20
OpenMP: get_place_num 20
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print slave in nested region
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 1
OpenMP: get_place_num 21
OpenMP: get_place_num 21
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
So you see that thread 1 of the nested parallel region spawned by thread 1 is pinned to place 21.
Now, if you change the number of threads of the initial parallel region to the number of NUMA domains, each subsequent nested parallel region should get its threads inside the same NUMA domain.
Hi @jimdempseyatthecove
Nested OpenMP, that's always interesting :)
Do you want to use the hyperthreads or the E-cores only in the nested region? Or do you always want to use only the P-cores?
In this case, I wish to use the P-cores.
While I can set the process affinity to the P-cores (the first 8 in my case) and then let the main thread team, and its team members' thread teams (at nest level 1), fend for themselves, this seems a bit inefficient for keeping data in cache (threads can migrate within the process affinity).
It would be nice if the OpenMP environment and/or API (omp_...) could facilitate the selection.
In the general case, let's look at a 2 socket system running an application such as mine.
Suppose at program start there are two, or say four, collections that need to be processed. These collections can grow periodically during processing.
At the beginning, in the two-collection case, it may be desirable to place each collection on a different socket; this can be done with environment variable settings. During execution, as a collection grows, at some point it becomes advantageous to parallelize the collection via nested parallelism. It is then advantageous to bind the new thread team to the socket (i.e., the affinity) of the spawning thread, so that the L3 cache can be shared among the halves of the collection.
Considering that OpenMP has offload directives with which one can specify where (on which accelerator) to process a region, I am surprised that no effort was made to permit a directive to specify a preference as to which socket or NUMA node to spawn a nest level onto.
On an unrelated topic:
In Windows Task Manager, what does "CPU n - Parked" relate to?
Jim Dempsey
@jimdempseyatthecove
Yes, you are right, NUMA-aware OpenMP is basically non-existent.
I guess the reason is that almost all applications use MPI to handle the NUMA placement. So in your case, one MPI rank would be placed on each socket and would spawn, as needed, OpenMP threads within its CPU set.
I once thought about the same problem, and I think it's best to do the following:
For the first OpenMP level, specify proc_bind(spread), either through the environment or on each parallel region you create. For the nested regions, specify proc_bind(close). For the OMP places where the threads get pinned, it is not relevant whether the threads are nested or not; it's simply counting. For example, if you have 8 places (core IDs 0-7), start the first OpenMP region with two threads, and each thread later spawns a nested region with 4 threads, then all 8 places are in use. The first two threads will pin to places 0 and 4 due to proc_bind(spread), and the nested threads will additionally use places 1, 2, 3 and 5, 6, 7.
At least that is what I remember from my experiments at that time. Let me check if I have some example code around to show the behavior.
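The counting in the 8-place example can be written down as a small model. This is only a sketch of the spread and close rules for the simple case where the team is no larger than the place partition and the parent sits on the first place of its partition; it does not query a real OpenMP runtime. The names Partition, spread and close_places are hypothetical, and giving the leftover places to the first subpartitions when the split is uneven is an assumption (the specification leaves that to the implementation).

```cpp
#include <cassert>
#include <vector>

// A place partition covering places [first, first + count).
struct Partition {
    int first;
    int count;
};

// Model of proc_bind(spread): split the partition into one consecutive
// subpartition per thread; thread t is bound to the first place of
// subpartition t and keeps that subpartition for nested regions.
std::vector<Partition> spread(Partition p, int nthreads) {
    assert(nthreads > 0 && nthreads <= p.count);
    std::vector<Partition> subs;
    const int base = p.count / nthreads;
    const int extra = p.count % nthreads;  // assumed to go to the first subpartitions
    int start = p.first;
    for (int t = 0; t < nthreads; ++t) {
        const int len = base + (t < extra ? 1 : 0);
        subs.push_back(Partition{start, len});
        start += len;
    }
    return subs;
}

// Model of proc_bind(close): thread t is bound to the place t steps
// after the parent's place, wrapping around inside the partition.
std::vector<int> close_places(Partition p, int parent_place, int nthreads) {
    assert(nthreads > 0 && nthreads <= p.count);
    assert(parent_place >= p.first && parent_place < p.first + p.count);
    std::vector<int> places;
    for (int t = 0; t < nthreads; ++t) {
        places.push_back(p.first + (parent_place - p.first + t) % p.count);
    }
    return places;
}
```

For 8 places, spread with 2 threads gives subpartitions starting at places 0 and 4, and close with 4 threads inside the second one gives places 4, 5, 6, 7, which matches the counting described above.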
Regarding offload to GPUs, yes, that's also more complicated. But again, (Intel) MPI is of help here. We support pinning a set of ranks to a particular set of GPUs, so that inside an MPI rank you only see the GPU that is available. I think that is again much simpler than handling the device selection from within OpenMP. With OpenMP you simply don't know whether a given GPU ID is connected to the socket you are running your threads on, etc. Intel MPI has logic to determine which GPU is connected to which socket / NUMA domain and can then pin to the closest GPU available.
FWIW
Years ago, I wrote a threading system called QuickThread. It didn't get much traction, and so I abandoned it. It was C++ but the earliest phases also included Fortran. Fortran was dropped because at that time you had to hand write all the interfaces (no auto gen interfaces at that time). At program start, the initialization code would discover the topology of the system (NUMA levels and nodes, L3's, L2's and L1's). The thread team invoking statements could optionally include a placement argument. For example,
parallel_for(
M0$, // same socket as spawning thread
(intptr_t)0, nTiles,
[&](intptr_t iBegin, intptr_t iEnd)
{
...
// nested within socket
parallel_for(
L2$, // same L2 cache as spawning thread
...
And a bunch of other features. FIFO and LIFO queues, low priority queues, I/O (high level) queues and such, blocking tasks, non-blocking tasks, waiting on other tasks, etc.... The potential users would rather stick with OpenMP or TBB.
The Fortran support syntax didn't have templates, instead you called a dispatch function:
subroutine QuickThreadTest
use SimpleArray_mod
use QuickThreadInterfaces
implicit none
! Local Variables
! Stack local control structure
type(T_QuickThreadControlStructure) :: qtControl
integer :: i
! code
do i=1, NumberOfIterations
! Slice the array processing by number of worker threads
! across range of 1 to size of array A (B, and C)
call QuickThreadQueueDo(qtControl, DoSimpleArraySlice, 1, size(A))
if(isTemporal) call QuickThreadWaitTillDone(qtControl)
end do
if(.not. isTemporal) call QuickThreadWaitTillDone(qtControl)
end subroutine QuickThreadTest
! DoSimpleArraySlice
! Perform work on slice of arrays A, B, and C
subroutine DoSimpleArraySlice(iFrom, iTo)
use SimpleArray_mod
implicit none
!DEC$ ATTRIBUTES VALUE :: iFrom
integer :: iFrom
!DEC$ ATTRIBUTES VALUE :: iTo
integer :: iTo
! Local Variables
integer :: j
do j=iFrom, iTo
A(j) = B(j) + C(j)
end do
end subroutine DoSimpleArraySlice
Where you can insert the placement in the qtControl structure.
A case of good design - but no interest.
The foundation of this threading system is such that the "placement" could also include Performance core and Efficiency core selection.
I think that, without much work, the OpenMP directives in both ifx and icx could incorporate clauses for placement and then use the task scheduler I built for QuickThread. Some appropriate name would have to be chosen. That would be a start; then it might be desirable to add in the other features (asynchronous procedures, I/O level, FIFO, LIFO, completion procedures, etc.).
Jim
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print slave
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 1
OpenMP: get_place_num 20
OpenMP: get_place_num 20
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print master in nested region
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 0
OpenMP: get_place_num 20
OpenMP: get_place_num 20
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
print slave in nested region
OpenMP: get_num_threads 2
OpenMP: get_max_threads 4
OpenMP: get_thread_num 1
OpenMP: get_place_num 21
OpenMP: get_place_num 21
OpenMP: place_num 0 place_num_procs 2
OpenMP: place_proc_ids 0 40
OpenMP: place_num 1 place_num_procs 2
OpenMP: place_proc_ids 1 41
OpenMP: place_num 2 place_num_procs 2
OpenMP: place_proc_ids 2 42
OpenMP: place_num 3 place_num_procs 2
OpenMP: place_proc_ids 3 43
OpenMP: place_num 4 place_num_procs 2
OpenMP: place_proc_ids 4 44
OpenMP: place_num 5 place_num_procs 2
OpenMP: place_proc_ids 5 45
OpenMP: place_num 6 place_num_procs 2
OpenMP: place_proc_ids 6 46
OpenMP: place_num 7 place_num_procs 2
OpenMP: place_proc_ids 7 47
OpenMP: place_num 8 place_num_procs 2
OpenMP: place_proc_ids 8 48
OpenMP: place_num 9 place_num_procs 2
OpenMP: place_proc_ids 9 49
OpenMP: place_num 10 place_num_procs 2
OpenMP: place_proc_ids 10 50
OpenMP: place_num 11 place_num_procs 2
OpenMP: place_proc_ids 11 51
OpenMP: place_num 12 place_num_procs 2
OpenMP: place_proc_ids 12 52
OpenMP: place_num 13 place_num_procs 2
OpenMP: place_proc_ids 13 53
OpenMP: place_num 14 place_num_procs 2
OpenMP: place_proc_ids 14 54
OpenMP: place_num 15 place_num_procs 2
OpenMP: place_proc_ids 15 55
OpenMP: place_num 16 place_num_procs 2
OpenMP: place_proc_ids 16 56
OpenMP: place_num 17 place_num_procs 2
OpenMP: place_proc_ids 17 57
OpenMP: place_num 18 place_num_procs 2
OpenMP: place_proc_ids 18 58
OpenMP: place_num 19 place_num_procs 2
OpenMP: place_proc_ids 19 59
OpenMP: place_num 20 place_num_procs 2
OpenMP: place_proc_ids 20 60
OpenMP: place_num 21 place_num_procs 2
OpenMP: place_proc_ids 21 61
OpenMP: place_num 22 place_num_procs 2
OpenMP: place_proc_ids 22 62
OpenMP: place_num 23 place_num_procs 2
OpenMP: place_proc_ids 23 63
OpenMP: place_num 24 place_num_procs 2
OpenMP: place_proc_ids 24 64
OpenMP: place_num 25 place_num_procs 2
OpenMP: place_proc_ids 25 65
OpenMP: place_num 26 place_num_procs 2
OpenMP: place_proc_ids 26 66
OpenMP: place_num 27 place_num_procs 2
OpenMP: place_proc_ids 27 67
OpenMP: place_num 28 place_num_procs 2
OpenMP: place_proc_ids 28 68
OpenMP: place_num 29 place_num_procs 2
OpenMP: place_proc_ids 29 69
OpenMP: place_num 30 place_num_procs 2
OpenMP: place_proc_ids 30 70
OpenMP: place_num 31 place_num_procs 2
OpenMP: place_proc_ids 31 71
OpenMP: place_num 32 place_num_procs 2
OpenMP: place_proc_ids 32 72
OpenMP: place_num 33 place_num_procs 2
OpenMP: place_proc_ids 33 73
OpenMP: place_num 34 place_num_procs 2
OpenMP: place_proc_ids 34 74
OpenMP: place_num 35 place_num_procs 2
OpenMP: place_proc_ids 35 75
OpenMP: place_num 36 place_num_procs 2
OpenMP: place_proc_ids 36 76
OpenMP: place_num 37 place_num_procs 2
OpenMP: place_proc_ids 37 77
OpenMP: place_num 38 place_num_procs 2
OpenMP: place_proc_ids 38 78
OpenMP: place_num 39 place_num_procs 2
OpenMP: place_proc_ids 39 79
So you can see that thread 1 of the nested parallel region spawned by outer thread 1 is pinned to place 21.
Now, if you set the number of threads in the initial parallel region to the number of NUMA domains, the threads of each nested parallel region should end up inside the same NUMA domain as their parent thread.
Great! Thank you very much.
I wasn't aware of the OpenMP clause proc_bind(...), I thought it was only available as an environment variable. I should have RTFM.
This may need a little bit of work/thought on a system with a mix of performance cores and efficiency cores.
Thanks again
Jim
Well, RTFM for OpenMP is sometimes too much :)
Please let me know when you figure out how to handle hybrid architectures!
Best
Tobias