Intel® oneAPI Math Kernel Library

MKL library scans available cores, disregards existing cpu affinity

EddyF
Beginner
265 Views

Hi, this is about the same issue as this earlier thread: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-library-scans-available-cores-to...

 

Since the earlier thread never reached a resolution, I'm opening a new thread (as suggested in that thread).

 

Here is what I'm running to reproduce the problem:

conda create -n mkl -c intel mkl-service
conda activate mkl
taskset -c 2,4,5 strace -e trace=sched_setaffinity python -c 'import mkl; mkl.get_num_stripes()'

Output:

sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [4])            = 0
sched_setaffinity(0, 8, [5])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [0])            = 0
sched_setaffinity(0, 8, [1])            = 0
sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
+++ exited with 0 +++

As we can see there are two phases happening here:

  1. First it scans through cpus 2, 4, and 5
  2. Later, it scans through cpus 0, 1, and 2

It seems that if the initial affinity set contains N cpus, then the second phase above always scans through cpus 0 through N-1, regardless of which cpus are actually in the affinity set. This looks like very strange and patently buggy behavior.
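To make the suspected pattern concrete, here is a minimal Python model of the two scans. It is only an illustration of the behavior observed in the trace, not MKL's actual code (which is not public); the function names are hypothetical.

```python
def scan_by_cpu_id(affinity):
    """CPUs visited if the scan iterates over the actual CPU IDs in the mask."""
    return sorted(affinity)


def scan_by_index(affinity):
    """CPUs visited if the scan iterates over 0..N-1, where N is the mask size.

    This matches what the second phase of the trace appears to do.
    """
    return list(range(len(affinity)))


mask = {2, 4, 5}  # the affinity set given to taskset in the reproduction
print("phase 1 (expected):", scan_by_cpu_id(mask))
print("phase 2 (observed):", scan_by_index(mask))
```

If this model is right, the two scans only coincide when the affinity set happens to be exactly cpus 0 through N-1, which would explain why the problem is easy to miss on an unrestricted machine.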

Using gdb, I was able to figure out that all of these sched_setaffinity calls happen inside a function called mkl_serv_get_num_stripes. Furthermore, the "first phase" (where we scan through the correct cpus) happens inside a sub-call to omp_get_num_procs; the "second phase" (which is buggy) happens inside mkl_serv_get_num_stripes itself.

What can be done to fix this? 

RahulV_intel
Moderator
205 Views

Hi,


Thanks for posting on the MKL forum. I could reproduce your output with oneAPI 2021.4, but I need to check with the team internally whether MKL is causing this behavior.


Regards,

Rahul



Ruqiu_C_Intel
Employee
170 Views

Hi,

As the earlier thread mentioned, the issue might be in Python itself or OpenMP rather than MKL.

You can test without MKL involved; the issue still exists. For more details, please refer to the earlier thread: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-library-scans-available-cores-to...


Thanks,

Ruqiu



EddyF
Beginner
161 Views

Hi, thanks for the response! Unfortunately I'm quite sure that the issue is coming from inside of MKL. I have a simpler reproducible example than the earlier thread, which makes it more clear that MKL is the problem. 

This is how I create my python environment and install MKL:

conda create -n mkl -c intel mkl-service
conda activate mkl

As you can see, I'm only installing the mkl-service package (along with its dependencies), from the Intel conda channel to make sure it's the latest official version.

If I run the following then the issue appears:

Input:

taskset -c 2,4,5 strace -e trace=sched_setaffinity python -c 'import mkl; mkl.get_num_stripes()'

Output:

sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [4])            = 0
sched_setaffinity(0, 8, [5])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [0])            = 0
sched_setaffinity(0, 8, [1])            = 0
sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
+++ exited with 0 +++

However, if I skip calling mkl.get_num_stripes(), then the issue does not appear:

Input:

taskset -c 2,4,5 strace -e trace=sched_setaffinity python -c 'import mkl'

Output:

+++ exited with 0 +++

This shows that the sched_setaffinity calls are happening inside of mkl.get_num_stripes().
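A trace like the one above can also be checked mechanically. The sketch below parses strace output and lists every sched_setaffinity call whose mask falls outside the set passed to taskset. It assumes the bracketed, comma-separated mask format shown in this thread (other strace versions may print masks differently), and the helper names are hypothetical.

```python
import re

# Matches lines such as: sched_setaffinity(0, 8, [2, 4, 5])      = 0
CALL_RE = re.compile(r"sched_setaffinity\(\d+, \d+, \[([\d, ]*)\]\)\s*=\s*-?\d+")

SAMPLE = """\
sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [4])            = 0
sched_setaffinity(0, 8, [5])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [0])            = 0
sched_setaffinity(0, 8, [1])            = 0
sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
+++ exited with 0 +++
"""


def parse_masks(strace_output):
    """Return the CPU set requested by each sched_setaffinity call, in order."""
    masks = []
    for line in strace_output.splitlines():
        match = CALL_RE.search(line)
        if match:
            cpus = {int(c) for c in match.group(1).split(",") if c.strip()}
            masks.append(cpus)
    return masks


def outside_allowed(masks, allowed):
    """Return the masks that include at least one CPU not in `allowed`."""
    return [sorted(m) for m in masks if not m <= allowed]


print(outside_allowed(parse_masks(SAMPLE), {2, 4, 5}))
```

Note that the single-cpu call for [2] in the second phase is not flagged, because cpu 2 happens to be in the allowed set; only the calls for cpus 0 and 1 fall outside it.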

As I was saying at the start of this thread, I have already used gdb to investigate further, and I figured out exactly where sched_setaffinity was being called from:

  1. The first few calls (scanning through 2, 4, 5) happen inside of an OpenMP function called omp_get_num_procs, which is called from an MKL function called mkl_serv_get_num_stripes
  2. The remaining calls (scanning through 0, 1, 2, which is erroneous) happen directly inside of mkl_serv_get_num_stripes

I have also spent some time stepping through the execution of mkl_serv_get_num_stripes in gdb (one assembly instruction at a time) and I could see exactly where it was triggering the erroneous sched_setaffinity syscalls. Are you familiar with this function? I guess it is an internal MKL function and its source code is not publicly available. It would be great if someone within Intel who has access to the source code and knows how it is built could have a closer look and confirm if these observations make sense.

Ruqiu_C_Intel
Employee
116 Views

Hi Eddy,

Thanks for the information! We are investigating it internally and will let you know once there is any update.


Ruqiu_C_Intel
Employee
46 Views

Hi,


Have you tried setting MKL_DYNAMIC=false? Its default value is true, which lets oneMKL adjust the number of threads to get the best performance. Switching off MKL_DYNAMIC lets the user set whatever number of threads they want. For more details, please check the MKL documentation here:

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-windows-developer-guide/top/man...


Regards,

Ruqiu


EddyF
Beginner
36 Views

I just tried it but it doesn't seem to affect anything and the end result is the same. Using the same minimal reproducible example from my earlier posts:

Input:

MKL_DYNAMIC=false taskset -c 2,4,5 strace -e trace=sched_setaffinity python -c 'import mkl; mkl.get_num_stripes()'

Output:

sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [4])            = 0
sched_setaffinity(0, 8, [5])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
sched_setaffinity(0, 8, [0])            = 0
sched_setaffinity(0, 8, [1])            = 0
sched_setaffinity(0, 8, [2])            = 0
sched_setaffinity(0, 8, [2, 4, 5])      = 0
+++ exited with 0 +++

I also tried MKL_DYNAMIC=FALSE and the result is exactly the same. @Ruqiu_C_Intel is it unexpected that this didn't work? Does it work for you? Perhaps there are some additional environment variables that I need to set? I have tried many combinations and have not yet been able to find anything that works for skipping the faulty sched_setaffinity logic.
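For anyone who wants to repeat this kind of search systematically, the following sketch enumerates combinations of environment variables and runs the same reproduction command under each one. The choice of variables and values is only an assumption about what might be worth trying; nothing in this thread indicates that any of them changes the behavior. The helper names are hypothetical, and `run_repro` needs taskset, strace and the mkl conda environment to be available.

```python
import itertools
import os
import subprocess

# The reproduction command used throughout this thread.
REPRO = ["taskset", "-c", "2,4,5", "strace", "-e", "trace=sched_setaffinity",
         "python", "-c", "import mkl; mkl.get_num_stripes()"]

# Candidate settings to try; None means "leave the variable unset".
CANDIDATES = {
    "MKL_DYNAMIC": [None, "FALSE"],
    "MKL_NUM_THREADS": [None, "3"],
    "OMP_NUM_THREADS": [None, "3"],
}


def env_combinations(candidates):
    """Yield one dict of overrides per combination of candidate values."""
    names = sorted(candidates)
    for values in itertools.product(*(candidates[n] for n in names)):
        yield {n: v for n, v in zip(names, values) if v is not None}


def run_repro(overrides):
    """Run the reproduction with extra environment variables.

    Returns the strace trace, which strace writes to stderr.
    """
    env = {**os.environ, **overrides}
    result = subprocess.run(REPRO, env=env, capture_output=True, text=True)
    return result.stderr


combos = list(env_combinations(CANDIDATES))
print(len(combos), "combinations to try")
```

Calling `run_repro(combo)` for each entry of `combos` and comparing the traces shows whether any setting removes the calls for cpus 0 and 1.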
