Intel® Fortran Compiler

Strange OpenMP behavior with OMP_SET_NUM_THREADS

Jon_D
New Contributor II
1,393 Views
I have a large program that I am trying to speed up with OpenMP constructs. When I do not specify the number of threads with the OMP_SET_NUM_THREADS library routine, 5 threads are created (according to Resource Monitor in Win 7) and CPU usage is around 100%. However, when I explicitly set the number of threads, as many threads as I asked for are indeed created, but CPU usage hovers around 25% regardless of the number of threads. When I try to reproduce this with a simple test code, I don't see the strange behavior. Has anybody run into this problem? Is this a likely compiler bug, or do I need to set some environment variables to get the OMP_SET_NUM_THREADS routine to work properly? Any help will be greatly appreciated. Thanks, Jon
0 Kudos
11 Replies
Jon_D
New Contributor II

I forgot to mention in my original post that I am using IVF 2013.3.171 under Win 7 Enterprise.

Jon

Steven_L_Intel1

5 threads is an unusual number. Do you have 5 cores or processors on your system? By default, OpenMP uses the number of "logical processors". What model of processor is in your system, and how many? Maybe your program has too much overhead for more (I assume) threads?

Jon_D
New Contributor II

Intel Xeon E5606 with 4 cores. Intel Amplifier shows 4 threads (including the master), but Resource Monitor lists 5.

jimdempseyatthecove
Honored Contributor III

OpenMP may add a watchdog thread (depending on the version of OpenMP). Set a breakpoint inside a parallel region, then open the debugger's threads window. If one of the threads is inside a routine ...watchdog..., then this is the case and you can ignore the 5-thread issue (as 4 are worker threads).

Jim Dempsey

Roman1
New Contributor I

Can you also try adding the following line to your code:

write(*,*)  "Number of threads = ",  OMP_GET_MAX_THREADS()

Jon_D
New Contributor II

I get

Number of threads = 4.

Steven_L_Intel1

So what are you specifying in the OMP_SET_NUM_THREADS call?

Jon_D
New Contributor II

When I use OMP_SET_NUM_THREADS(3) I get 75% CPU usage as expected. When I try something like

integer :: nt

nt = OMP_GET_MAX_THREADS()

CALL OMP_SET_NUM_THREADS(nt-1)

then CPU usage goes down to 25%. If I set it to a number larger than the number of cores I have (say 12), I still get only 25% CPU usage.

Roman1
New Contributor I

I'm a bit confused.  Is this what you are seeing?

CALL OMP_SET_NUM_THREADS(4)   ! results in 100% CPU usage

CALL OMP_SET_NUM_THREADS(3)   ! results in 75% CPU usage

CALL OMP_SET_NUM_THREADS(12)  ! results in 25% CPU usage

Just before the code enters the parallel region, can you add the write statement I suggested earlier? This is to make sure that the number of threads running is what you expect.

write(*,*) "Number of threads = ", OMP_GET_MAX_THREADS()

Roman
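The check suggested above can be sketched as a small standalone program. This is only an illustration, not code from the thread: the program name report_threads is hypothetical, and it assumes the code is compiled with OpenMP enabled (for example /Qopenmp with Intel Fortran on Windows). It prints the limit reported by OMP_GET_MAX_THREADS just before the parallel region, and then the team size that was actually created, using OMP_GET_NUM_THREADS inside the region.

```fortran
program report_threads                 ! hypothetical name, for illustration only
  use omp_lib
  implicit none

  ! Upper limit on the team size for the next parallel region.
  write(*,*) "Number of threads = ", omp_get_max_threads()

  !$omp parallel
  !$omp single
  ! Size of the team that was actually created; printed by one thread only.
  write(*,*) "Team size = ", omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program report_threads
```

If the two numbers differ from what was requested, some earlier call or environment variable has probably changed the thread limit.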

Steven_L_Intel1

Since you have four cores, you are "oversubscribing" which will make execution less efficient.

Jon_D
New Contributor II

Roman, thanks for suggesting that I print the number of threads again. It allowed me to figure out what was happening and solve the problem. I have the parallelization in a module compiled as a static library, and the calls that set the number of threads were made inside a subroutine. So every time I called the subroutine, I was setting the number of threads to the current maximum number of threads less 1. After a few calls to the subroutine I had gone from 4 threads down to 1 thread. I simply needed to set the number of threads once in the main program.

Thanks for the help.

Jon
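The pitfall described above can be reproduced with a short sketch. It is not Jon's actual code: the module and routine names (solver_mod, solve_buggy, solve_fixed) are hypothetical, the max(1, ...) guard is added here only so the argument to OMP_SET_NUM_THREADS stays positive, and OpenMP must be enabled at compile time. Because OMP_GET_MAX_THREADS returns whatever the most recent OMP_SET_NUM_THREADS call set, computing "max - 1" inside a routine that is called repeatedly lowers the limit by one on every call.

```fortran
module solver_mod                      ! hypothetical module name
  use omp_lib
  implicit none
contains

  subroutine solve_buggy()
    ! BUG: omp_get_max_threads() reflects the previous omp_set_num_threads
    ! call, so the limit shrinks by one each time this routine runs.
    ! max(1, ...) is an added guard that keeps the argument valid.
    call omp_set_num_threads(max(1, omp_get_max_threads() - 1))
    write(*,*) "buggy: number of threads = ", omp_get_max_threads()
  end subroutine solve_buggy

  subroutine solve_fixed()
    ! The routine no longer touches the thread limit; it only reports it.
    write(*,*) "fixed: number of threads = ", omp_get_max_threads()
  end subroutine solve_fixed

end module solver_mod

program set_threads_once               ! hypothetical program name
  use omp_lib
  use solver_mod
  implicit none
  integer :: i, nt

  nt = omp_get_max_threads()           ! remember the starting limit

  do i = 1, 3
    call solve_buggy()                 ! limit drops on every call
  end do

  ! Fix: set the limit once, in the main program, from the saved value.
  call omp_set_num_threads(max(1, nt - 1))
  do i = 1, 3
    call solve_fixed()                 ! limit stays the same on every call
  end do
end program set_threads_once
```

On a machine where the starting limit is 4, the buggy calls should report a shrinking limit (ending at 1), which matches the 25% CPU usage seen on four cores, while the fixed calls should report the same value each time.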
