User-created threads and MKL internal threads

Murthy__Paresh · 05-04-2019

Hi all,

Suppose I create an OpenMP parallel region with, say, 2 threads, and somewhere in that region I call an MKL routine, say DGEMM. Is it possible to force this DGEMM call to use exactly my 2 threads? (Note: I want DGEMM to use more than 1 thread, but I don't want it to create threads of its own.) Are there directives or settings to do this?

I suspect that I can't, but I would be quite happy if I could.

If not, can TBB do this? If so, how much effort is it to switch from OpenMP to TBB?

Thanks

Paresh

Pamela_H_Intel · 05-15-2019

So, Paresh, you want to set the number of threads once and use those threads for everything, rather than setting the number of threads before each ?gemm call as described here: https://software.intel.com/en-us/mkl-linux-developer-guide-changing-the-number-of-openmp-threads-at-run-time ?

Pamela

Murthy__Paresh · 05-16-2019

Hi Pamela, yes, I want to set the number of threads once. The full code is quite messy, so below is an abstraction of what I have now. I can't remove the "omp single" below: if I do, every thread seems to run the same sgemm one after the other, which is wasteful. Ideally, I would like sgemm to use my number_of_threads and not create its own. I think I tried omp_set_nested( false ), but I no longer remember much about that effort. I will try that again, but meanwhile I am posting this ...

number_of_threads = omp_get_max_threads() / 2;   // HALF THE MAX NUMBER OF THREADS POSSIBLE
mkl_set_num_threads( number_of_threads );        // USE THE SAME NUMBER INSIDE SGEMM below

#pragma omp parallel num_threads( number_of_threads )
{
    ( ... parallel region ... )

    begin loop over pieces of data until "done"

        #pragma omp single
        {
            sgemm( multiply the pieces of data );  // MKL will create NEW number_of_threads internally. So, while sgemm is running
                                                   // omp_get_max_threads() = 2 * number_of_threads will exist on the machine
        } // end single ( wait till my (number_of_threads) threads arrive here )

        ( ... continue parallel region with number_of_threads ... )

    end loop over pieces of data ( all threads will loop as we will flush "done" or "not done" )

} // end parallel region
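
For comparison, here is a minimal sketch of the behaviour asked for above: the multiply is shared by the threads that already exist in the parallel region, and nothing inside it creates more. It is only an illustration under stated assumptions, not how MKL works internally. The names "gemm_rows", "team_gemm" and "multiply_with_team" are hypothetical, the matrices are assumed to be row-major, and "gemm_rows" is a naive stand-in for a sequential BLAS call. Instead of one thread entering "omp single" and calling a threaded sgemm, every thread of the team calls the sequential routine on its own slab of rows of C.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a sequential BLAS call: computes rows
   [row_begin, row_end) of C = A * B, where A is m x k, B is k x n and
   C is m x n, all stored row-major. */
void gemm_rows(const float *A, const float *B, float *C,
               int row_begin, int row_end, int k, int n)
{
    for (int i = row_begin; i < row_end; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            C[(size_t)i * n + j] = sum;
        }
    }
}

/* Must be called by EVERY thread of the enclosing parallel region (or from
   serial code). Each caller multiplies a contiguous slab of rows, so the
   work is shared by the existing team and no new threads are created. */
void team_gemm(const float *A, const float *B, float *C, int m, int k, int n)
{
    int tid = 0, team = 1;
#ifdef _OPENMP
    tid = omp_get_thread_num();
    team = omp_get_num_threads();
#endif
    assert(team >= 1 && tid >= 0 && tid < team);

    int chunk = (m + team - 1) / team;   /* rows per thread, rounded up */
    int begin = tid * chunk;
    int end = begin + chunk;
    if (begin > m) begin = m;            /* surplus threads get no rows */
    if (end > m) end = m;

    gemm_rows(A, B, C, begin, end, k, n);

#ifdef _OPENMP
    /* Plays the role of the implicit barrier at the end of "omp single":
       nobody continues until all of C has been written. */
    #pragma omp barrier
#endif
}

/* Hypothetical driver mirroring the structure of the abstraction above. */
void multiply_with_team(const float *A, const float *B, float *C,
                        int m, int k, int n, int number_of_threads)
{
    (void)number_of_threads;             /* unused when built without OpenMP */
    #pragma omp parallel num_threads(number_of_threads)
    {
        team_gemm(A, B, C, m, k, n);
    }
}
```

In the abstraction above, the call to team_gemm would replace the whole "omp single" block. Whether this beats a threaded library call depends on the matrix shapes, since each thread sees only a slab of the problem.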

--------------------------