Suppose I create an OpenMP parallel region with, say, 2 threads, and somewhere in that region I call an MKL routine, say DGEMM. Is it possible to force this DGEMM call to use exactly my 2 threads? (Note: I want DGEMM to use more than 1 thread, but I don't want it to create threads of its own.) Are there directives or settings to do this?
I suspect that I can't, but I would be quite happy if I could.
If not, can TBB do this? If so, how much effort is it to switch from OpenMP to TBB?
So, Paresh, you want to set the number of threads once and use those threads for everything, rather than setting the number of threads before each ?gemm call, as described here: https://software.intel.com/en-us/mkl-linux-developer-guide-changing-the-number-of-openmp-threads-at-...?
Hi Pamela, Yes, I want to set the number of threads once. The full code is quite messy, so below is an abstraction of what I have now. I can't remove the "omp single" below: if I do, every thread seems to run the same sgemm one after the other, which is wasteful. Ideally, I would like sgemm to use my number_of_threads threads and not create its own. I think I tried omp_set_nested( false ), but I no longer remember much about that effort. I will try that again, but meanwhile I am posting this ...
number_of_threads = omp_get_max_threads() / 2; // HALF THE MAX NUMBER OF THREADS POSSIBLE
mkl_set_num_threads( number_of_threads ); // USE THE SAME NUMBER INSIDE SGEMM below
#pragma omp parallel num_threads( number_of_threads )
{
    ( .. parallel region ... )
    begin loop over pieces of data until "done"
        #pragma omp single
        {
            sgemm( multiply the pieces of data ); // MKL will create NEW number_of_threads threads internally, so while sgemm is running,
                                                  // 2 * number_of_threads = omp_get_max_threads() threads will exist on the machine
        } // end single ( wait until my number_of_threads threads arrive here )
        ( .. continue parallel region with number_of_threads ... )
    end loop over pieces of data ( all threads will loop, as we will flush "done" or "not done" )
} // end parallel region
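One possible way to keep the multiply on the existing team, sketched here under assumptions: drop the "omp single" and let each thread multiply its own block of rows with a single-threaded GEMM, so no extra threads are needed. In the sketch below, gemm_serial is a hypothetical stand-in for a sequential sgemm call (it is not an MKL function), gemm_on_team and team_gemm_selfcheck are illustrative names, and row-major storage is assumed.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a single-threaded sgemm call (row-major):
   C[rows x n] = A[rows x k] * B[k x n]. Not an MKL routine. */
static void gemm_serial(int rows, int n, int k,
                        const float *A, const float *B, float *C)
{
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            C[(size_t)i * n + j] = acc;
        }
    }
}

/* Called by EVERY thread of an existing team (no "omp single").
   Each thread computes its own contiguous block of rows of C. */
static void gemm_on_team(int m, int n, int k,
                         const float *A, const float *B, float *C)
{
    int tid = 0, nthreads = 1;
#ifdef _OPENMP
    tid = omp_get_thread_num();
    nthreads = omp_get_num_threads();
#endif
    assert(m >= 0 && n > 0 && k > 0);

    /* Rows [lo, hi) belong to this thread. */
    int lo = (int)((long long)m * tid / nthreads);
    int hi = (int)((long long)m * (tid + 1) / nthreads);
    if (hi > lo)
        gemm_serial(hi - lo, n, k, A + (size_t)lo * k, B, C + (size_t)lo * n);

#ifdef _OPENMP
    /* Wait until the whole team has finished its rows of C. */
    #pragma omp barrier
#endif
}

/* Returns 1 if the team result matches a plain serial multiply, else 0. */
int team_gemm_selfcheck(void)
{
    enum { M = 5, N = 3, K = 4 };
    float A[M * K], B[K * N], C_team[M * N], C_ref[M * N];

    for (int i = 0; i < M * K; ++i) A[i] = (float)(i % 7);
    for (int i = 0; i < K * N; ++i) B[i] = (float)(i % 5) - 2.0f;
    for (int i = 0; i < M * N; ++i) C_team[i] = -1.0f;

    gemm_serial(M, N, K, A, B, C_ref);

    /* The pragma is ignored when compiled without OpenMP support. */
    #pragma omp parallel num_threads(2)
    {
        gemm_on_team(M, N, K, A, B, C_team);
    }

    for (int i = 0; i < M * N; ++i)
        if (C_team[i] != C_ref[i])
            return 0;
    return 1;
}
```

In the real code, gemm_serial would be replaced by an sgemm call that runs single-threaded for the calling thread. Whether that performs as well as a threaded sgemm depends on the matrix shapes, and this sketch does not measure it.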