Intel® Fortran Compiler

Bug omp_get_place_num_procs

jimdempseyatthecove
Honored Contributor III

**** EDIT 2 **** Ignore this message. My mistake: in OMP_PLACES I read the ":" in {a:b} as a range (a-b), when b is actually a count.
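
To make the {start:count} reading concrete, here is a minimal sketch of the arithmetic, not part of the original post. The module and function names are hypothetical, and it assumes the default stride of 1 and logical processor IDs numbered 0 to nprocs-1.

```fortran
! Sketch only: hypothetical helper names, not part of any OpenMP or Intel API.
module place_interval_sketch
    implicit none
contains
    ! Last OS proc ID covered by the place {start:length} (stride 1 assumed).
    pure integer function place_last_proc(start, length) result(last)
        integer, intent(in) :: start, length
        last = start + length - 1
    end function place_last_proc

    ! True when every ID in {start:length} lies within 0 .. nprocs-1.
    pure logical function place_fits(start, length, nprocs) result(ok)
        integer, intent(in) :: start, length, nprocs
        ok = start >= 0 .and. length >= 1 .and. &
             place_last_proc(start, length) <= nprocs - 1
    end function place_fits
end module place_interval_sketch
```

With the KNL settings below, {0:12} covers IDs 0 to 11, {100:112} covers 100 to 211, and {113:145} would run to 257, which is past the last valid ID (255) on a 256-thread system.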

EDIT: Using Windows 10 Pro, Parallel Studio 2019 Update 4, MS VS 2019

On a KNL 7210 the following program generates a runtime error:

 

 

program bug_omp_get_place_num_procs
    USE lapack95, ONLY: HEEVR
    USE f95_precision, ONLY: WP => SP
    USE omp_lib
    use kernel32
    use mkl_service
    implicit none

    ! Configure the OpenMP runtime before the first parallel region.
    call SETENVQQ("OMP_NUM_THREADS=4")
!KNL        call SETENVQQ("OMP_PLACES={0:12},{13,15,17},{100:112},{113:145}")
!i72600K    call SETENVQQ("OMP_PLACES={0},{1,2,3},{6,7},{5}")
    call SETENVQQ("OMP_PLACES={0:12},{13,15,17},{100:112},{113:145}")
    call SETENVQQ("OMP_PROC_BIND=")
    call SETENVQQ("KMP_AFFINITY=")
    call SETENVQQ("KMP_HW_SUBSET=")
    print *,"kmp_get_affinity_max_proc()",kmp_get_affinity_max_proc()
    !$omp parallel
    !$omp critical
    ! Print team size, thread number, and the processor count of the place with that number.
    print *, omp_get_num_threads(), omp_get_thread_num(), OMP_GET_PLACE_NUM_PROCS(omp_get_thread_num())
    print *
    !$omp end critical
    !$omp end parallel
    print *,"done"
end program bug_omp_get_place_num_procs

OMP: Warning #123: Ignoring invalid OS proc ID 256.
 kmp_get_affinity_max_proc()         256
           4           0          12

           4           1           3
OMP: Error #114: kmp_set_affinity: invalid mask.

OMP: Error #114: kmp_set_affinity: invalid mask.

 

 

On a Core i7-2600K there is no such error:

 

 

program bug_omp_get_place_num_procs
    USE lapack95, ONLY: HEEVR
    USE f95_precision, ONLY: WP => SP
    USE omp_lib
    use kernel32
    use mkl_service
    implicit none

    ! Configure the OpenMP runtime before the first parallel region.
    call SETENVQQ("OMP_NUM_THREADS=4")
!KNL        call SETENVQQ("OMP_PLACES={0:12},{13,15,17},{100:112},{113:145}")
!i72600K    call SETENVQQ("OMP_PLACES={0},{1,2,3},{6,7},{5}")
    call SETENVQQ("OMP_PLACES={0},{1,2,3},{6,7},{5}")
    call SETENVQQ("OMP_PROC_BIND=")
    call SETENVQQ("KMP_AFFINITY=")
    call SETENVQQ("KMP_HW_SUBSET=")
    print *,"kmp_get_affinity_max_proc()",kmp_get_affinity_max_proc()
    !$omp parallel
    !$omp critical
    ! Print team size, thread number, and the processor count of the place with that number.
    print *, omp_get_num_threads(), omp_get_thread_num(), OMP_GET_PLACE_NUM_PROCS(omp_get_thread_num())
    print *
    !$omp end critical
    !$omp end parallel
    print *,"done"
end program bug_omp_get_place_num_procs
 kmp_get_affinity_max_proc()           8
           4           0           1

           4           1           3

           4           2           2

           4           3           1

 done

 

 

Could you check into this?

Jim Dempsey

 

4 Replies
jimdempseyatthecove
Honored Contributor III

EDIT:

Further testing... (removing the call to omp_get_place_num_procs)

The issue is not due to omp_get_place_num_procs, but rather to OMP_PLACES when used on a system with more than one processor group (Windows), and (I assume) where a place resides outside the main thread's processor group.

Jim Dempsey

Ron_Green
Moderator

Hi Jim,

 

I'm dusting off my KNL systems.  I can take a look tomorrow.  So do you think this is a bug on KNL?

jimdempseyatthecove
Honored Contributor III

Ronald,

There is no problem. I found my error. See EDIT on first post.

I do have a minor issue on my KNL using Windows 10.

I installed a retail copy of Windows 10 Pro on the KNL.

I am experiencing what appear to be significant issues with the keyboard, both USB wired and USB Bluetooth. Both respond to key presses with a significant delay and have about a 25% chance of double striking. It takes several attempts to get logged in. I've been unable to find a solution by Googling.

This behavior does not appear when I remote desktop into the system, which is what I usually do, as I have a 4K 28" monitor on the remote system.

If you get around to dusting off your KNL (and installing Windows 10), see whether you have similar issues, or whether you know of this issue. I did install the "current" drivers from the Supermicro site (K1SPE motherboard, in that special Ninja offering from years back).

I did not have this problem using CentOS 7.2 on that same system.

Using Remote Desktop, I do experience some delays; e.g., a build of an Intel IVF project in MS VS 2019 will sometimes hang for 10-20 seconds, while other times it builds right away. I am aware that the CPU is a poky 1.2GHz/core, but its pokiness should be consistent.

Merry Christmas

Jim

jimdempseyatthecove
Honored Contributor III

Ron,

IMHO there is an issue with the Windows implementation of OpenMP on manycore systems. This may be documented as a limitation, but it can be corrected, or at least I think it can be.

Windows uses the concept of Processor Groups; each group is limited to at most 64 hardware threads. For example, the KNL system has 256 HW threads, and Windows creates four Processor Groups (64 threads each).

The problem I encountered appears when using the MKL threaded library from an OpenMP threaded application.

When OMP_PLACES={...},{...},... is used to specify the main program's OpenMP places, each main OpenMP thread is affinitized to one place. For example, with 16 places of 8 threads each, each of the 16 MKL instances gets the 8 threads of the place given to its main thread.
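
As an illustration of that layout, the following sketch builds an explicit OMP_PLACES value for a given number of equally sized, contiguous places. The module and function names are hypothetical, and the layout (places packed back to back starting at processor 0) is an assumption, not something the post specifies.

```fortran
! Sketch only: builds "{0:w},{w:w},{2w:w},..." for nplaces places of width w.
! build_places is a hypothetical helper, not an OpenMP or Intel routine.
module places_string_sketch
    implicit none
contains
    function build_places(nplaces, width) result(spec)
        integer, intent(in) :: nplaces, width
        character(len=:), allocatable :: spec
        character(len=32) :: item
        integer :: i

        spec = ""
        do i = 0, nplaces - 1
            ! Each place starts where the previous one ended.
            write(item, '("{",i0,":",i0,"}")') i*width, width
            if (i > 0) spec = spec // ","
            spec = spec // trim(item)
        end do
    end function build_places
end module places_string_sketch
```

For 16 places of 8 threads, build_places(16, 8) yields a string starting with {0:8},{8:8},{16:8}, which could then be prefixed with "OMP_PLACES=" and passed to SETENVQQ as in the first post.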

The problem is that one cannot specify a place that spans more than one processor group.

OMP_PLACES={0:128},{128:128}

errors out (each place spans more than one processor group).
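
A quick way to see which places cross a group boundary is sketched below. It assumes every processor group holds exactly 64 consecutive logical processors, as on the KNL described above; other systems may be partitioned differently, and the names are hypothetical.

```fortran
! Sketch only: assumes fixed groups of 64 consecutive logical processors.
module processor_group_sketch
    implicit none
    integer, parameter :: group_size = 64   ! assumption taken from the KNL example
contains
    ! Processor group that a logical processor ID falls into (0-based).
    pure integer function group_of(proc_id) result(g)
        integer, intent(in) :: proc_id
        g = proc_id / group_size
    end function group_of

    ! True when the place {start:length} touches more than one group.
    pure logical function place_spans_groups(start, length) result(spans)
        integer, intent(in) :: start, length
        spans = group_of(start) /= group_of(start + length - 1)
    end function place_spans_groups
end module processor_group_sketch
```

Under that assumption, {0:128} covers groups 0 and 1 and {128:128} covers groups 2 and 3, so both are flagged, while a place such as {0:64} stays inside group 0.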

While MKL is capable of using multiple processor groups (e.g. with no places specified and one main thread), you (we) cannot use OMP_PLACES to specify a larger area for MKL to use.

It would be acceptable for the first-level OpenMP runtime to choose one of the processor groups that the place spans, while leaving the full place available for MKL to use.

I hope this makes sense to you.

Jim Dempsey
