Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP thread limit OMP_THREAD_LIMIT

aurora_s
Hi,

If I set OMP_THREAD_LIMIT=8, in an application with one level of nested parallelism, then I receive this message:

OMP: Warning #96: Cannot form a team with 8 threads, using 1 instead.
OMP: Hint: Consider unsetting KMP_ALL_THREADS and OMP_THREAD_LIMIT (if either is set).

So nested parallelism is not used. My question is: if the runtime cannot form a team of 8 threads, why doesn't it use 4 or 5 (the free ones), for example? Meanwhile, there are threads at the upper level waiting to enter a critical section, doing nothing.

Thanks in advance!
icc 11.1.070
jimdempseyatthecove
From: http://publib.boulder.ibm.com/infocenter/comphelp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.aix.doc/compiler_ref/omp_thread_limit.html

Use OMP_THREAD_LIMIT to set the thread-limit-var internal control variable. thread-limit-var is used to indicate the number of OpenMP threads to be used for the whole program. The function omp_get_thread_limit can be used to retrieve this value at run time. The value for OMP_THREAD_LIMIT is a positive integer. If a value is chosen that is more than the number of threads that can be supported or is not a positive integer, the runtime will set a default value for thread-limit-var of OMP_NUM_THREADS or the number of available processors, whichever is greater. Note: if thread-limit-var is set, the default value of the nthreads-var internal control variable is equal to thread-limit-var or the number of available processors, whichever is less.

Therefore you may need to set OMP_NUM_THREADS to oversubscribe the number of threads.
Also, (not seeing your program) the error message (#96) seems to imply you are attempting to nest parallel regions with nested disabled.
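To check what the runtime actually picked up, the limits can be printed at startup. This is only a minimal sketch: the struct and function names (OmpLimits, query_omp_limits, print_omp_limits) are made up for illustration, and only the omp_get_* calls come from the OpenMP 3.0 API. The fallback branch is an assumption for builds without OpenMP.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Snapshot of the limits the OpenMP runtime reports for this process.
// (Hypothetical helper type, not part of OpenMP.)
struct OmpLimits {
    int thread_limit;       // thread-limit-var, set by OMP_THREAD_LIMIT
    int max_threads;        // team size the next parallel region would request
    int max_active_levels;  // how many nested parallel levels may be active
};

OmpLimits query_omp_limits() {
#ifdef _OPENMP
    return { omp_get_thread_limit(), omp_get_max_threads(),
             omp_get_max_active_levels() };
#else
    // Assumption: built without OpenMP, so the program is serial.
    return { 1, 1, 1 };
#endif
}

void print_omp_limits() {
    const OmpLimits l = query_omp_limits();
    std::printf("thread limit %d, max threads %d, max active levels %d\n",
                l.thread_limit, l.max_threads, l.max_active_levels);
}
```

Calling print_omp_limits() once before the first parallel region, with and without OMP_THREAD_LIMIT set, should show whether the limit is what you expect.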

Jim Dempsey

aurora_s
Hi,
Thanks for your reply.
Nested parallelism is enabled: without OMP_THREAD_LIMIT set, the program behaves correctly, and I've called omp_set_nested(true).

I want to limit the maximum number of threads in use. For example, if I have 4 physical cores and want to use only 2, setting that with omp_set_num_threads() and without OMP_THREAD_LIMIT, then in a nested parallel region OpenMP could eventually use 4 (the 2 at the top level could create another two in a nested parallel region).

With OMP_THREAD_LIMIT, the problem is that, for example, with a limit of three (and omp_set_num_threads(3)), if two threads at the top level are blocked waiting to enter a critical section and the other one tries to create a parallel region, it will get only one thread and the #96 warning is shown.
jimdempseyatthecove
>>With OMP_THREAD_LIMIT, the problem is that, for example, with a limit of three (and omp_set_num_threads(3)), if two threads at the top level are blocked waiting to enter a critical section and the other one tries to create a parallel region, it will get only one thread and the #96 warning is shown.

With two threads waiting at critical section or working elsewhere and third thread creating a parallel region you've already maxed out your thread limit of 3 (set by OMP_THREAD_LIMIT). Therefore only one thread will be used by the new parallel region.
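The arithmetic can be written down as a small helper. This is a sketch of the rule as I understand it from the OpenMP specification (threads available = thread limit minus threads busy, plus one for the thread that encounters the region), not the Intel runtime's actual code; the function name is hypothetical. It gives only an upper bound: when the full request cannot be met and dynamic adjustment is off, the result is implementation defined, which would explain a runtime falling back to 1 instead of a partial team.

```cpp
#include <algorithm>
#include <cassert>

// Upper bound on the team size of a new parallel region.
// threads_busy counts every OpenMP thread currently in use, including the
// thread that encounters the new region (it becomes the team's master).
// Hypothetical helper for illustration only.
int nested_team_upper_bound(int thread_limit, int threads_busy, int requested) {
    assert(thread_limit >= 1 && threads_busy >= 1 && requested >= 1);
    const int available = thread_limit - threads_busy + 1;
    return std::max(1, std::min(requested, available));
}
```

With a limit of 3 and all 3 threads busy, nested_team_upper_bound(3, 3, 3) gives 1, which matches the case above. Raising the limit to 5 gives 3.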

If you want only 3 hardware threads to be used by 3 software threads in parallel region level-0 and have one of those team members (software thread) create a nested parallel region with 3 threads (itself plus two additional threads) then consider:

at beginning of process (before start of OpenMP) affinitize the process to restrict it to 3 hardware threads
.or.
depending on O/S attribute the executable to restrict it to 3 logical processors

Then set OMP_THREAD_LIMIT=5 or OMP_NUM_THREADS=5 .and. in your first (level-0) parallel region limit the team to 3 threads, and do the same for the next level. Note that you are oversubscribed here. When the two threads blocked at the critical section are released, they will compete (context switch) with the two additional threads running in the nested region. Also, critical sections tend to spin briefly on a spinlock before the thread is suspended, so the two additional threads may run timesliced with those spinning at the critical section (i.e. you may have inefficiencies). Is this what you want?
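The OpenMP side of that setup might look like the sketch below. It assumes the process has already been restricted to 3 hardware threads by some external means (for example taskset on Linux) and that OMP_THREAD_LIMIT is 5 or unset; neither is done in the code. The function name nested_team_size is hypothetical. It returns the size of the inner team so you can see whether the runtime granted the 3 threads.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Runs an outer team of `outer` threads; thread 0 of that team opens a nested
// region asking for `inner` threads. Returns the inner team size obtained.
int nested_team_size(int outer, int inner) {
    assert(outer >= 1 && inner >= 1);
    int inner_team = 1;  // written by exactly one thread, read after the join
#ifdef _OPENMP
    omp_set_nested(1);             // as used in this thread; deprecated in newer OpenMP
    omp_set_max_active_levels(2);  // allow two active levels
#endif
    #pragma omp parallel num_threads(outer)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        if (tid == 0) {
            // Nested region: this thread plus up to (inner - 1) new threads.
            #pragma omp parallel num_threads(inner)
            {
                #pragma omp single
                {
#ifdef _OPENMP
                    inner_team = omp_get_num_threads();
#endif
                }
            }
        }
    }
    return inner_team;
}
```

nested_team_size(3, 3) needs up to 5 threads in total. If it comes back as 1, the thread limit (or disabled nesting) is still getting in the way.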

An alternate route is to use the OpenMP 3.0 and later task construct

create a parallel region with 3 threads
create the three "level-0" tasks
task-0 that eventually reaches critical section
task-1 that eventually reaches critical section
task-2 that eventually spawns two additional tasks and participates in 3-way method
(or spawns three tasks and does not (directly) participate in 3-way method)
end parallel region

The thread running task-0, when complete, will be available to run a task generated by task-2 (assuming the task has not already run)
The thread running task-1, when complete, will be available to run a task generated by task-2 (assuming the task has not already run)
The thread running task-2, when complete, will be available to run a task generated by task-2 (assuming the task has not already run)
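A rough sketch of the task layout, using the variant where task-2 spawns three subtasks and only waits for them. The counters and the names TaskCounts and run_task_layout are placeholders made up for illustration; the real work would go where the counters are incremented.

```cpp
// Counts what ran, so the layout can be checked. (Hypothetical type.)
struct TaskCounts {
    int critical_hits;   // tasks that went through the critical section
    int subtasks_done;   // subtasks spawned by task-2 that completed
};

TaskCounts run_task_layout() {
    int critical_hits = 0;
    int subtasks_done = 0;
    #pragma omp parallel num_threads(3)
    #pragma omp single
    {
        // task-0: eventually reaches the critical section
        #pragma omp task shared(critical_hits)
        {
            #pragma omp critical
            ++critical_hits;
        }
        // task-1: eventually reaches the critical section
        #pragma omp task shared(critical_hits)
        {
            #pragma omp critical
            ++critical_hits;
        }
        // task-2: spawns three subtasks for the 3-way method
        #pragma omp task shared(subtasks_done)
        {
            for (int i = 0; i < 3; ++i) {
                #pragma omp task shared(subtasks_done)
                {
                    #pragma omp atomic
                    ++subtasks_done;
                }
            }
            #pragma omp taskwait  // wait for the three subtasks
        }
    }   // implicit barrier: all tasks have finished here
    TaskCounts result = { critical_hits, subtasks_done };
    return result;
}
```

Only 3 threads ever exist here, so OMP_THREAD_LIMIT=3 is respected, and a thread that finishes task-0 or task-1 can pick up one of the subtasks instead of sitting idle.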

Jim Dempsey