Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Question about MKL Warning in Documentation

Dishaw__Jim
Beginner
492 Views
The MKL documentation, on page 11-42 (8.0.2; page 11-43 in 8.1), states:
1. It is not recommended to simultaneously parallelize your program and employ the Intel MKL internal threading because this will slow down performance. Note that in case 3 above, DFT computation is automatically initiated in a single-threading mode.
Does this warning apply only when using DFT, or does it apply to MKL in general? The context of the warning seems to be limited to how one uses DFT. If one is using the BLAS and LAPACK routines, is there any similar restriction?
3 Replies
TimP
Honored Contributor III
I suppose this refers to threading outside OpenMP in such a way as to use all available cores/CPUs, then setting OMP_NUM_THREADS so as to induce MKL BLAS to initiate additional threads. As Intel OpenMP doesn't support nested parallelism, and the total number of threads is controlled by OMP_NUM_THREADS, one would think this warning doesn't apply to an Intel OpenMP threaded application.
Dishaw__Jim
Beginner
Would it be safe to conclude that there is no conflict when a program uses the MKL BLAS and LAPACK routines together with the OpenMP Fortran compiler directives?
Intel_C_Intel
Employee

This is an interesting and important issue. The problem has nothing specific to do with the MKL FFT but is a general issue. It doesn't even have anything specific to do with MKL, but rather is an OpenMP issue. Here's the deal.

When you compile software with OpenMP directives or pragmas, the compiler adds a good deal of threading code and inserts many calls to functions supplied by a runtime library (RTL). In the case of MKL, that RTL is libguide, the threading library used by the Intel compilers. Beyond the function calls, the RTL creates buffers and keeps accounting information, such as how many threads it is managing, and it can report whether you are currently in a threaded region.

If you compile your program with, say, the PGI compiler and thread it, that compiler does everything described in the previous paragraph, and the program needs its own RTL to support threading.

Where does the problem arise? If you call MKL from a threaded region, MKL cannot know that it is in a threaded region and will thread according to the number of threads specified in the environment variable OMP_NUM_THREADS, leading to an oversubscription of threads. In addition, which RTL will supply the OpenMP function calls both in MKL and in your program?
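The thread arithmetic behind that oversubscription can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the numbers (8 application threads, OMP_NUM_THREADS=8, 8 processors) are hypothetical rather than taken from a real run.

```python
def total_threads(outer_threads: int, omp_num_threads: int) -> int:
    """Threads alive if every application thread calls a library that
    starts omp_num_threads threads of its own (the worst case)."""
    return outer_threads * omp_num_threads


def oversubscription_factor(outer_threads: int, omp_num_threads: int,
                            processors: int) -> float:
    """How many runnable threads compete for each processor."""
    return total_threads(outer_threads, omp_num_threads) / processors


# Hypothetical setup: 8 application threads, OMP_NUM_THREADS=8, 8 processors.
print(total_threads(8, 8))               # 64
print(oversubscription_factor(8, 8, 8))  # 8.0
```

A factor of 1.0 means one thread per processor; anything well above that means the threads are competing for processors.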

The oversubscription of resources can have an enormous impact on performance. We have seen a case where performance dropped by more than a factor of 10 when each of 4 threads created 4 threads of its own on a 4-processor system.
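One common way to stay out of that situation is to give the library a thread budget that, multiplied by the number of calling threads, does not exceed the processor count, and to pass it through OMP_NUM_THREADS. The Python sketch below only computes such a budget and builds an environment for a child process; the helper names are hypothetical, and it assumes the threaded library honors OMP_NUM_THREADS when it starts.

```python
import os


def library_thread_budget(processors: int, caller_threads: int) -> int:
    """Threads each caller may let the library use so that
    caller_threads * budget stays within the processor count."""
    if processors < 1 or caller_threads < 1:
        raise ValueError("counts must be positive")
    return max(1, processors // caller_threads)


def child_environment(processors: int, caller_threads: int, base_env=None) -> dict:
    """Copy an environment and set OMP_NUM_THREADS to the computed budget."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(library_thread_budget(processors, caller_threads))
    return env


# The case described above: 4 calling threads on a 4-processor system.
env = child_environment(processors=4, caller_threads=4, base_env={})
print(env["OMP_NUM_THREADS"])  # 1
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching the threaded program, so the library runs single-threaded inside each of the caller's threads.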

--Bruce
