Intel® oneAPI Math Kernel Library

User-created threads and MKL internal threads


Hi all,

Suppose I create an OpenMP parallel region with, say, 2 threads, and somewhere in that region I have a call to MKL, say DGEMM. Is it possible to force this DGEMM call to use exactly my 2 threads? (Note: I want DGEMM to use more than 1 thread, but I don't want it to create threads of its own.) Are there directives/settings to do this?

I suspect that I can't, but I would be quite happy if I could.

If not, can TBB do this? If so, how much effort is it to switch from OpenMP to TBB?




So, Paresh - you want to set the number of threads once and use those threads for everything, rather than setting the number of threads before each ?gemm call, like:
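(A sketch of that per-call pattern; the sgemm arguments are elided placeholders:)

```c
mkl_set_num_threads( 2 );   /* set MKL's thread count for the next call */
sgemm( ... );               /* each ?gemm call picks up the current setting */
mkl_set_num_threads( 4 );   /* change it again before the next call */
sgemm( ... );
```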



Hi Pamela, yes, I want to set the number of threads once. The full code is quite messy, so below is an abstraction of what I have now. I can't remove the "omp single" below -- if I do, then every thread seems to do the same sgemm one after the other, and that is wasteful. Ideally, I would like sgemm to use my number_of_threads and not create its own. I thought I tried using omp_set_nested( false ), but I no longer remember much about that effort. I will try that again, but meanwhile I post this ...



number_of_threads = omp_get_max_threads() / 2;   // HALF THE MAX NUMBER OF THREADS POSSIBLE

mkl_set_num_threads( number_of_threads );           // USE THE SAME NUMBER INSIDE SGEMM below

#pragma omp parallel num_threads( number_of_threads )


     (.. parallel region ... )

     begin loop over pieces of data until "done"

             #pragma omp single


                             sgemm( multiply the pieces of data ); // MKL will create NEW number_of_threads internally.  So, while sgemm is running

                                                                                    // omp_get_max_threads() = 2 * number_of_threads  will exist on the machine

            } // end single ( wait till my (number_of_threads) threads arrive here )

           ( .. continue parallel region with number_of_threads ... )

     end loop over pieces of data (all threads will loop as we will flush "done" or "not done" )

} // end parallel region