Intel® Fortran Compiler

Strange OpenMP behavior with OMP_SET_NUM_THREADS

Jon_D
New Contributor II
1,384 Views
I have a large program which I am trying to speed up with OpenMP constructs. When I do not specify the number of threads using the OMP_SET_NUM_THREADS library routine, 5 threads are created (according to Resource Monitor in Win 7) and CPU usage is around 100%. However, when I explicitly set the number of threads, as many threads as I asked for are indeed created, but CPU usage hovers around 25% regardless of the number of threads. When I try to replicate this with a simple test code, I don't see the strange behavior.

Has anybody run into this problem? Is this a likely compiler bug, or do I need to set some environment variables to get the OMP_SET_NUM_THREADS routine to work properly? Any help will be greatly appreciated.

Thanks,

Jon
0 Kudos
11 Replies
Jon_D
New Contributor II

I forgot to mention this in my original post: I am using IVF 2013.3.171 under Win 7 Enterprise.

Jon

Steven_L_Intel1
Employee

5 threads is an unusual number - do you have 5 cores or processors on your system? By default it uses the number of "logical processors". What model of processor is in your system and how many? Maybe your program has too much overhead for more (I assume) threads?

Jon_D
New Contributor II

Intel Xeon E5606 with 4 cores. Intel Amplifier shows 4 threads (including the master) but Resource Monitor lists 5.

jimdempseyatthecove
Honored Contributor III

OpenMP may add a watchdog thread (depending on the version of OpenMP). Set a breakpoint inside the parallel region, then open the debugger's Threads window. If one of the threads is inside a routine ...watchdog... then this is the case and you can ignore the 5-thread issue (as 4 are worker threads).

Jim Dempsey

Roman1
New Contributor I

Can you also try adding the following line to your code:

write(*,*)  "Number of threads = ",  OMP_GET_MAX_THREADS()
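
For context, here is a minimal sketch of where that line could sit in a complete program. The program name is made up for illustration, and it assumes the compiler's `omp_lib` module is available and OpenMP is enabled at compile time (for example /Qopenmp with Intel Fortran on Windows). The second write shows the size of the team actually running the region, which may be a more direct check than the maximum.

```fortran
program show_threads          ! hypothetical name, for illustration only
   use omp_lib                ! provides omp_get_max_threads and omp_get_num_threads
   implicit none

   ! Upper bound on the team size of the next parallel region
   write(*,*) "Number of threads = ", omp_get_max_threads()

   !$omp parallel
   !$omp single
   ! Actual size of the team executing this region (printed by one thread only)
   write(*,*) "Threads in this region = ", omp_get_num_threads()
   !$omp end single
   !$omp end parallel
end program show_threads
```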

Jon_D
New Contributor II

I get

Number of threads = 4.

Steven_L_Intel1
Employee

So what are you specifying in the OMP_SET_NUM_THREADS call?

Jon_D
New Contributor II

When I use OMP_SET_NUM_THREADS(3) I get 75% CPU usage as expected. When I try something like

integer :: nt

nt = OMP_GET_MAX_THREADS()

CALL OMP_SET_NUM_THREADS(nt-1)

then CPU usage goes down to 25%. If I set it to a number larger than the number of cores I have (say 12) I still get only 25% CPU usage.

Roman1
New Contributor I

I'm a bit confused.  Is this what you are seeing?

CALL OMP_SET_NUM_THREADS(4)    ! results in 100% CPU usage

CALL OMP_SET_NUM_THREADS(3)    ! results in 75% CPU usage

CALL OMP_SET_NUM_THREADS(12)   ! results in 25% CPU usage

Just before the code enters the parallel region, can you put the write statement I suggested earlier? This is to make sure that the number of threads running is what you expect.

write(*,*) "Number of threads = ", OMP_GET_MAX_THREADS()

Roman

Steven_L_Intel1
Employee

Since you have four cores, asking for more than four threads (such as 12) is "oversubscribing", which will make execution less efficient.
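
One way to guard against this, as a rough sketch rather than a recommendation for every program, is to cap the requested count at the number of logical processors the OpenMP runtime reports. The program name and the `requested` variable are made up for illustration; 12 is just the example value mentioned earlier in this thread.

```fortran
program cap_threads           ! hypothetical name, for illustration only
   use omp_lib                ! provides omp_get_num_procs and omp_set_num_threads
   implicit none
   integer :: requested, nt

   requested = 12             ! example value taken from this thread

   ! Never ask for more threads than logical processors, and never fewer than 1
   nt = max(1, min(requested, omp_get_num_procs()))
   call omp_set_num_threads(nt)

   write(*,*) "Requested ", requested, ", using ", omp_get_max_threads()
end program cap_threads
```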

Jon_D
New Contributor II

Roman, thanks for suggesting that I print the number of threads again. It allowed me to figure out what was happening and solve the problem. I have the parallelization in a module compiled as a static library. The calls to set the number of threads are made in a subroutine, so every time I called the subroutine I was setting the number of threads to the current maximum less 1. So in 4 calls to the subroutine I was going from 4 threads to 1 thread. I simply needed to set the number of threads once in the main program.
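
For anyone who finds this thread later, here is a simplified sketch of the pattern. It is not my actual code: the module, subroutine, and program names are made up, and the `max(1, ...)` guard is added here only so the sketch never passes 0 to OMP_SET_NUM_THREADS. The key point is that OMP_GET_MAX_THREADS returns whatever the previous OMP_SET_NUM_THREADS call set, so calling both inside a routine that runs repeatedly lowers the count each time.

```fortran
module worker_mod             ! hypothetical module standing in for the static library
   use omp_lib
   implicit none
contains
   subroutine do_work_buggy()
      integer :: nt
      ! Bug: the maximum already reflects the previous call's setting,
      ! so every call lowers the thread count by one more.
      nt = omp_get_max_threads()
      call omp_set_num_threads(max(1, nt - 1))   ! guard is an addition for this sketch
      write(*,*) "buggy: threads now = ", omp_get_max_threads()
   end subroutine do_work_buggy

   subroutine do_work_fixed()
      ! No thread-count changes here; uses whatever the main program set.
      write(*,*) "fixed: threads now = ", omp_get_max_threads()
   end subroutine do_work_fixed
end module worker_mod

program ratchet_demo          ! hypothetical name, for illustration only
   use omp_lib
   use worker_mod
   implicit none
   integer :: i, nt0

   nt0 = omp_get_max_threads()   ! remember the default before anything changes it

   do i = 1, 4
      call do_work_buggy()       ! count shrinks on each call
   end do

   ! Fix: set the count once, in the main program
   call omp_set_num_threads(max(1, nt0 - 1))
   do i = 1, 4
      call do_work_fixed()       ! count stays the same on every call
   end do
end program ratchet_demo
```

On a machine with 4 logical processors, the buggy loop should report a count that drops on each call until it reaches 1, while the fixed loop should report the same count every time.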

Thanks for the help.

Jon
