Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Behaviour of omp_set_num_threads/omp_get_max_threads

AndrewC
New Contributor III
1,287 Views
Assuming OMP_NUM_THREADS=4: in my main thread, omp_get_max_threads() returns 4, as expected. I then call omp_set_num_threads(1), and omp_get_max_threads() now returns 1, as expected. But if I launch a separate "worker thread" and call omp_get_max_threads() inside that thread, I get 4. This seems unexpected.
6 Replies
TimP
Honored Contributor III

If you call omp_set_num_threads in a parallel region, the change would not appear outside that region until the parallel region ends. You would need to give more detail about what you expect.

AndrewC
New Contributor III
You are misunderstanding what I am doing. (This is on Windows.) Here is a skeleton of my code. Assume OMP_NUM_THREADS=4 is set in the environment, on a 4-core machine.

[cpp]
DWORD WINAPI ThreadProc(LPVOID lpThreadParameter)
{
    int n = omp_get_max_threads(); // <--- this returns 4. Not what I would expect.
    // do some omp stuff....
    return 0;
}

int main()
{
    int n = omp_get_max_threads(); // <--- this returns 4
    omp_set_num_threads(1);
    n = omp_get_max_threads();     // <--- this returns 1... as expected
    CreateThread(NULL, 0, ThreadProc, ....); // <--- start a new thread.
}
[/cpp]

I found that I had to modify my code to call omp_set_num_threads(1) again inside ThreadProc().
jimdempseyatthecove
Honored Contributor III

If I were to guess, OpenMP uses thread local storage for "num_threads". Each thread can set this to whatever they want. The thread initialization code apparently obtains the initial copy from the environment (just as the main thread does/would). Also, I assume, the thread local storage contains a one-shot flag to indicate the omp_... routines are to initialize the omp portion of the TLS for use by OpenMP.

What you need to do then is

[cpp]
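#include <omp.h>

int main()
{
    int myMaxThreads = 1; // whatever
    #pragma omp parallel
    {
        // Each thread of the team sets its own copy of the value.
        // Note: the OpenMP API has no omp_set_max_threads();
        // omp_get_max_threads() reports what omp_set_num_threads() set.
        omp_set_num_threads(myMaxThreads);
    }
...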

[/cpp]


 

AndrewC
New Contributor III

Yup, that's what I figured. Still it's a trap that's easy enough to fall into.

 

 

jimdempseyatthecove
Honored Contributor III

Another "trap" for the newbie is that omp_get_thread_num() returns the team member number for the current parallel region (not necessarily a globally unique number). If you use nested parallelism, each new team numbers its members starting from omp_get_thread_num() == 0, then 1, 2, ... for each additional team member. In other words, depending on the context, you may have several (or many) threads with the same omp_get_thread_num() value.

Jim Dempsey

Andrey_C_Intel1
Employee

I would add a couple of comments on the issue.

According to the OpenMP specification, the behavior of an OpenMP program is controlled by internal control variables (ICVs). Many of them, including nthreads-var, are task-specific (in other words, there is one copy per data environment). Changing the value in one task does not affect other OpenMP tasks.

Next, the CreateThread routine has nothing to do with OpenMP, so the OpenMP task initialized in a newly created thread knows nothing about the origin of that thread: who created it, when, or where. Thus the initial implicit task is initialized using the default set of ICVs, which was not affected by the call to omp_set_num_threads in one of the existing tasks. By contrast, when a parallel region is encountered, the new threads are created by the OpenMP implementation, and their ICVs are inherited from the parent task according to the rules described in section "2.3.4.1 How the Per-Data Environment ICVs Work" of the OpenMP 4.0 specification. That is where the difference between user-created threads and threads created by the OpenMP implementation comes from.

To summarize - the behavior of the Intel compiler is perfectly legal in this case.

Regards,
Andrey
