Intel® oneAPI Math Kernel Library

BUG: Using gelss with mkl_dynamic enabled does not always restore the MKL thread count on return

JonasDeGreef
Beginner

While investigating a performance regression that appeared after we updated MKL from version
2022.0-Product Build 20211112
to version
2023.1-Product Build 2023030,
we discovered the following bug:

When ~gelss is called inside an OpenMP parallel region with mkl_dynamic enabled, MKL is expected to adjust its thread count dynamically so that it fills the machine's physical cores without oversubscribing them. For example, on a 24-core machine, an OpenMP loop with 12 threads should let mkl_dynamic lower mkl_get_max_threads() to 2. However, for certain matrix sizes, ~gelss returns with mkl_get_max_threads() still set to 2. Because mkl_dynamic is on, all subsequent MKL calls stay stuck at 2 threads, causing a severe performance loss.
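As a rough illustration of the expected behavior, the sketch below models the split described above: the physical cores are divided evenly among the active OpenMP threads, with at least one MKL thread each. This is a simplified assumption for illustration only, not MKL's actual mkl_dynamic logic, and the module and function names are hypothetical.

```fortran
! Simplified model (an assumption, NOT MKL's internal algorithm) of the
! thread split the post expects mkl_dynamic to produce.
module mkl_dynamic_model
  implicit none
  private
  public :: expected_mkl_threads
contains
  ! Hypothetical helper: divide the physical cores evenly among the active
  ! OpenMP threads, never going below one MKL thread per OpenMP thread.
  pure integer function expected_mkl_threads(physical_cores, omp_threads) result(n)
    integer, intent(in) :: physical_cores, omp_threads
    n = max(1, physical_cores / max(1, omp_threads))
  end function expected_mkl_threads
end module mkl_dynamic_model
```

With the numbers from the post, expected_mkl_threads(24, 12) gives 2 inside the 12-thread region, and expected_mkl_threads(24, 1) gives 24 once the region has ended. The reported bug is that MKL keeps the first value after ~gelss returns instead of going back to the second.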

My guess is that the new MKL version added a code path in gelss that returns early and skips the step where mkl_dynamic restores the thread count from 2 back to 24. The bug does not reproduce for every matrix size, which also suggests that a specific decision path in gelss is responsible, probably an optimization that triggers only for certain matrices.

I have attached a reproducer that shows the issue with both a failing and a working matrix size. In case it matters, the random seed size was 2 on my machine, which is needed to reproduce the exact matrices, but in my tests the matrix size appeared to be the main factor.

JonasDeGreef
Beginner

For anyone hitting the same bug: once mkl_dynamic is stuck in this state, mkl_set_num_threads on its own has no effect and cannot raise the thread count again. The following "repair", applied after the parallel region that calls ~gelss, lets your program continue with proper MKL multithreading:

 

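! Declarations (X is the desired number of MKL threads)
integer :: dummy   ! receives the previous thread-local setting; not used

! Issue: ~gelss called inside an OpenMP parallel region
!$omp parallel
call zgelss(...)
!$omp end parallel

! Band-aid fix: reset the thread-local MKL setting on every OpenMP thread,
! then set the global MKL thread count again.
! dummy is private to avoid a data race between threads.
!$omp parallel private(dummy)
dummy = mkl_set_num_threads_local(0)   ! 0 = clear the thread-local setting
!$omp end parallel
call mkl_set_num_threads(X)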

 

According to the documentation of mkl_set_num_threads_local, calling it with 0 clears the calling OpenMP thread's thread-local MKL thread setting. That appears to be enough to make mkl_set_num_threads effective again, restoring the thread count to the higher value X for the rest of the program.

Edit: Clarified the code to emphasize that the issue occurs when ~gelss is called inside a parallel region, not when ~gelss is called on its own.

VarshaS_Intel
Moderator

Hi,

 

Thanks for posting in Intel Communities.

 

When we tried to compile the code you provided, we ran into linking issues (see linking.txt) and could not run it. Could you please help us resolve that error so that we can reproduce the behavior on our end?

 

Thanks & Regards,

Varsha

 

 

JonasDeGreef
Beginner

I set up my environment by running:

"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 vs2019 --config="config.txt"

with a config.txt containing:

intelpython=exclude
compiler=2023.1.0
mkl=2023.1.0
mpi=2021.9.0

This produces the following output on my machine:

:: NOTICE: Exclude flag found for "intelpython" component.
   The "intelpython" env\vars.bat script will not be processed by "setvars.bat".
:: initializing oneAPI environment...
   Initializing Visual Studio command-line environment...
   Visual Studio version 16.11.7 environment configured.
   "c:\apps\MVS16117\"
   Visual Studio command-line environment initialized for: 'x64'
:  advisor -- latest
:  compiler -- 2023.1.0
:  inspector -- latest
:  mkl -- 2023.1.0
:  mpi -- 2021.9.0
:  vtune -- latest
:: oneAPI environment initialized ::

I assume you are missing the Visual Studio component.

In that command prompt, after setting up the environment, running the .bat file reproduces the issue on my side.

VarshaS_Intel
Moderator

Hi,

 

Thanks for your reply.

 

We are working on your issue internally. We will get back to you soon.

 

Thanks & Regards,

Varsha

 

VarshaS_Intel
Moderator

Hi,


Could you share your machine details, as well as the values you get for mkl_get_max_threads() and omp_get_max_threads() after calling proofOfConceptPR8690852(), so that we can understand the issue better on our end?


Thanks & Regards,

Varsha



JonasDeGreef
Beginner

I can reproduce this on multiple machines. In fact, I have yet to find a machine where it does not reproduce.

  • One is a Windows machine [screenshot of its system specifications].
  • Two others are 12-core VMs whose specifications I am not allowed to share; one runs Linux and the other runs Windows.

As for proofOfConceptPR8690852, its behavior depends on how it is called, as illustrated in:

  • fails.f90
    • Matrix size 10 fails the mkl_get_max_threads().eq.omp_get_max_threads() check
    • mkl_get_max_threads() is 1 and omp_get_max_threads() is 24 on the Windows machine above
  • worksfine.f90
    • Matrix size 200 passes the mkl_get_max_threads().eq.omp_get_max_threads() check
    • mkl_get_max_threads() is 24 and omp_get_max_threads() is 24 on the Windows machine above

Perhaps it only reproduces for certain combinations of matrix size and thread count? Or only with a specific instruction set (AVX/...)?

VarshaS_Intel
Moderator

Hi,


Thanks for the reply.


Yes, the issue seems to be specific to the LAPACK routine dgelss up to Intel MKL version 2023.1. The dgemm routine does not trigger this behavior.


Could you please try the latest version, Intel MKL 2023.2, and let us know whether you observe the same behavior?


Thanks & Regards,

Varsha


JonasDeGreef
Beginner

Yes, this also reproduces with MKL 2023.2:

[Screenshot of the reproduction output with MKL 2023.2]

 

I'm sorry, but I have the impression that you are still not actively reproducing this on your end, and I am receiving rather generic troubleshooting suggestions.

I quickly asked a group of coworkers, each with a machine of a different generation, to recompile and run the reproducer, and every single one of them reproduced the issue. So far, 100% of the machines we have tried reproduce it. Reproducing it also takes only two command lines (one of which simply runs the .bat file I provided), so I really struggle to understand where the disconnect in our communication comes from.

VarshaS_Intel
Moderator

Hi,


Sorry for the inconvenience caused to you.


We were able to reproduce the issue. We are working on it internally and will get back to you soon with an update.


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


Thanks for your patience, and apologies for the delayed response.


As discussed internally, we regret to say that we were unable to reproduce this issue: it occurs only in Intel MKL version 2023.1 and was resolved in the latest version, Intel MKL 2023.2.


Could you please let us know if you have any other queries?


Thanks & Regards,

Varsha


JonasDeGreef
Beginner

Thank you for the investigation. We will check the upcoming releases for a fix and remove our workaround once the issue is resolved.

 

Kind Regards,

Jonas

VarshaS_Intel
Moderator

Hi,


Thanks for your reply.


Sure. If you run into any other issues, please feel free to create a new thread.


Have a Good Day!


Thanks & Regards,

Varsha

