I have a program where at one point I call a function.
Inside this function I have OpenMP parallelization AND I have MKL/BLAS calls. I know that the BLAS are never called inside an OpenMP parallel block.
I would like to specify the same number of threads to be used in both parallel blocks in this function. So I call omp_set_num_threads and then mkl_set_num_threads_local, each with that thread count.
The first call goes through with no problem. But the second crashes. This only happens in the Release version of the code. In the Debug version there is no crash.
Can anybody tell me whether there is a problem with my calls above and what I should be doing differently?
A quick update: in order to move forward I removed the omp_set_num_threads call. To my surprise, the code still crashes. I'm printing the value of the single argument to the console and the value being passed is indeed 2.
Thanks for the quick response. Could you send me a link to the documentation so I can get the proper context for your suggestion?
This is the sequence of my parallel calls:
(1) Serial code -> (2) Begin OpenMP Parallel code -> (3) Run parallel BLAS -> (4) End OpenMP Parallel code begun in (2) -> (5) Begin OpenMP Parallel code -> (NO BLAS) -> End OpenMP Parallel code -> (6) Run parallel BLAS -> (7) Serial code
I am given a number of threads to use (call it N). Step (2) has a "pragma omp for" loop with num_threads(m). I know that the threaded MKL BLAS on my machine loses steam rather quickly, so I limit the BLAS to never more than 4 threads. For everything else I parallelize, I use the variable "m" above. If the BLAS are using 2 threads, then "m" is set such that N = 2*m.
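To make the thread budget concrete, here is a minimal sketch of the arithmetic. The helper name split_threads and the rule "pick the largest BLAS thread count up to 4 that divides N" are assumptions for illustration only; my real code may choose the BLAS count differently.

```c
#include <assert.h>

/* Hypothetical helper (not from my real code): splits a budget of n threads
   into a BLAS thread count, capped at 4, and an OpenMP thread count m,
   such that n == blas_threads * omp_threads. */
typedef struct {
    int blas_threads;  /* threads given to the threaded BLAS, 1..4 */
    int omp_threads;   /* the "m" used in num_threads(m) */
} thread_split;

static thread_split split_threads(int n) {
    assert(n >= 1);
    thread_split s = {1, n};
    /* Assumed rule: take the largest divisor of n that is at most 4. */
    for (int b = 4; b >= 1; --b) {
        if (n % b == 0) {
            s.blas_threads = b;
            s.omp_threads = n / b;
            break;
        }
    }
    return s;
}
```

With N = 2 this gives 2 BLAS threads and m = 1, which matches the case where the value passed to the MKL call is 2.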
Step (5) is in a separate library I have no control over. But I'm told that the BLAS are never called inside an OpenMP parallel region. So my thought was to set both omp_set_num_threads AND mkl_set_num_threads_local. It crashed. I thought the problem was setting both, so I removed the omp_set_num_threads call, but the problem persists. Step (5) is NOT called within a parallel OpenMP block.
So, if I understand you correctly, I should be calling mkl_set_num_threads_local right before step (4). Is this correct?
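For reference, this is roughly the pattern I understand is being suggested, as a sketch only. It relies on mkl_set_num_threads_local returning the previous thread-local setting, so the value can be restored after the BLAS step. The HAVE_MKL flag, the run_blas_step wrapper, and the count_step placeholder are hypothetical names, and the stand-in under #else exists only so the sketch builds without MKL.

```c
#include <assert.h>

#ifdef HAVE_MKL               /* hypothetical build flag */
#include <mkl.h>
#else
/* Stand-in so the sketch compiles without MKL. It only mimics the
   "return the previous thread-local value" behaviour of the real call. */
static int fake_local_threads = 0;
static int mkl_set_num_threads_local(int nt) {
    int prev = fake_local_threads;
    fake_local_threads = nt;
    return prev;
}
#endif

typedef void (*blas_step_fn)(void *ctx);

/* Placeholder for the real BLAS work, e.g. a cblas_dgemm call. */
static void count_step(void *ctx) {
    ++*(int *)ctx;
}

/* Runs one BLAS step with nt MKL threads on the calling thread, then
   restores whatever thread-local setting was active before. Returns
   that previous setting (0 means none was set). */
static int run_blas_step(int nt, blas_step_fn step, void *ctx) {
    assert(nt >= 1);
    int prev = mkl_set_num_threads_local(nt);
    step(ctx);
    mkl_set_num_threads_local(prev);
    return prev;
}
```

The point of the wrapper is that the setting is scoped to the BLAS step, so the library in step (5) never sees a thread count I left behind.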
I did some reading here:
I guess I understand your concern with resetting the number of threads. But it appears to me that the worst case scenario is that I end up using a different number of threads than expected.
However, my problem is that the call to mkl_set_num_threads actually leads to a crash in the code. I did print the argument, and the value is indeed 2. So I think I'm setting a proper value.
Is there any way to find out what may be causing the crash in this function?
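One thing I can do on my side is log the threading state immediately before the call that crashes. The sketch below is only an illustration: describe_thread_state is a hypothetical helper, and without OpenMP enabled it falls back to fixed values.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: writes a one-line summary of the threading state
   into buf. Returns the number of characters snprintf wanted to write. */
static int describe_thread_state(char *buf, size_t len, int requested) {
    assert(buf != NULL && len > 0);
#ifdef _OPENMP
    int in_parallel = omp_in_parallel();   /* nonzero inside a parallel region */
    int max_threads = omp_get_max_threads();
#else
    int in_parallel = 0;                   /* fallback when built without OpenMP */
    int max_threads = 1;
#endif
    return snprintf(buf, len,
                    "requested=%d omp_in_parallel=%d omp_max_threads=%d",
                    requested, in_parallel, max_threads);
}
```

Printing the resulting line to stderr and calling fflush(stderr) before the MKL call should ensure the message is not lost if the process dies on the next statement. If omp_in_parallel reports nonzero, the assumption that the BLAS are never called inside a parallel region does not hold at that point.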
PS: To make matters worse, the entire code is run in MPI. Fortunately, the problem occurs even if I run the executable without using mpiexec - i.e., the MPI size is 1.