I have a problem with nested parallelism using OpenMP and the MKL FFT.
The part of my code that calls MKL inside an OpenMP parallel region looks like this:
#pragma omp parallel
#pragma omp single
//call of the fft
The problem is that when m > 1, the FFT call runs with only one thread, regardless of the value of n.
I will be grateful for any help.
How about setting the MKL_DYNAMIC environment variable to FALSE, or calling mkl_set_dynamic(0)? Either one tells MKL to use the suggested number of OpenMP threads whenever the algorithms permit, regardless of OpenMP overhead and data locality.
Basically, the Intel MKL-specific threading controls take precedence over their OpenMP equivalents. However, when MKL is called from inside a nested OpenMP parallel region, we recommend using one MKL thread for better performance, and that is why MKL runs in sequential mode there by default.
You can still control this behavior, as the MKL developer guide describes: https://software.intel.com/en-us/node/528550
If your application uses OpenMP* threading, you may need to provide additional settings:
- Set the environment variable OMP_NESTED=TRUE, or alternatively call omp_set_nested(1), to enable OpenMP nested parallelism.
- Set the environment variable MKL_DYNAMIC=FALSE, or alternatively call mkl_set_dynamic(0), to prevent Intel MKL from dynamically reducing the number of OpenMP threads in nested parallel regions.
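To make the effect of the nesting setting visible, here is a minimal sketch in plain OpenMP. It does not call MKL: a nested parallel region stands in for the threaded FFT call, and nested_team_size is a hypothetical helper written for this post, not an MKL or OpenMP function. It uses omp_set_max_active_levels, which replaces omp_set_nested(1) in OpenMP 5.0 and later, where omp_set_nested is deprecated. The comment marks where the MKL calls mentioned above would go in a real program.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch still builds when OpenMP is not enabled. */
static void omp_set_max_active_levels(int levels) { (void)levels; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: reports the team size seen by a nested parallel
   region that requests `inner` threads and is entered from a `single`
   block inside an outer region of `outer` threads. */
static int nested_team_size(int outer, int inner, int levels)
{
    int team = 0;
    assert(outer >= 1 && inner >= 1 && levels >= 1);

    /* levels = 1 keeps nested regions serial;
       levels = 2 allows the region inside `single` to run in parallel. */
    omp_set_max_active_levels(levels);

    /* In a program that uses MKL, mkl_set_dynamic(0) and
       mkl_set_num_threads(inner) would also be called here. */

    #pragma omp parallel num_threads(outer)
    #pragma omp single
    {
        /* Stand-in for the threaded FFT call. */
        #pragma omp parallel num_threads(inner)
        {
            #pragma omp master
            team = omp_get_num_threads();
        }
    }
    return team;
}
```

With an OpenMP-enabled build, nested_team_size(2, 4, 1) will typically report a team of 1, which matches the symptom in the question, while nested_team_size(2, 4, 2) can report up to 4, depending on the runtime and the threads available.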
Here are some tips on nested parallelism from the MKL user guide, for your reference:
To avoid simultaneous activities of multiple threading RTLs, link the program against the Intel MKL threading library that matches the compiler you use (see Linking Examples on how to do this). If this is not possible, use Intel MKL in the sequential mode. To do this, you should link with the appropriate threading library: libmkl_sequential.a or libmkl_sequential.so (see Appendix C: Directory Structure in Detail).