I'm running the following with numpy/mkl 2019. The environment is:
"KMP_AFFINITY=verbose" MKL_NUM_THREADS=1 MKL_DOMAIN_NUM_THREADS="MKL_BLAS=1" MKL_DYNAMIC=FALSE OMP_DYNAMIC=FALSE OMP_NUM_THREADS=1 MKL_VERBOSE=1
strace -e trace=sched_setaffinity taskset -c 2-3 $(which python) -c 'import numpy as np; a = np.random.normal(size=(1000,1000)); np.dot(a, a)'
sched_setaffinity(0, 16, {c, 0}) = 0
Numpy + Intel(R) MKL: THREADING LAYER: (null)
Numpy + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime
Numpy + Intel(R) MKL: preloading libiomp5.so runtime
sched_setaffinity(0, 16, 0) = -1 EFAULT (Bad address)
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
sched_setaffinity(0, 16, {4, 0}) = 0
sched_setaffinity(0, 16, {8, 0}) = 0
sched_setaffinity(0, 16, {c, 0}) = 0
OMP: Info #209: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {2,3}
OMP: Info #156: KMP_AFFINITY: 2 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 1 threads/core (2 total cores)
OMP: Info #249: KMP_AFFINITY: pid 3521664 tid 3521664 thread 0 bound to OS proc set {2,3}
sched_setaffinity(0, 16, {c, 0}) = 0
sched_setaffinity(0, 16, {c, 0}) = 0
sched_setaffinity(0, 16, {1, 0}) = 0
sched_setaffinity(0, 16, {2, 0}) = 0
sched_setaffinity(0, 16, {c, 0}) = 0
MKL_VERBOSE Intel(R) MKL 2019.0 Update 5 Product build 20190808 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 3.00GHz lp64 intel_thread
MKL_VERBOSE DGEMM(N,N,1000,1000,1000,0x7ffd61ee7270,0x7faf55733010,1000,0x7faf55733010,1000,0x7ffd61ee7278,0x7faf54f91010,1000) 92.50ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:1
+++ exited with 0 +++
The problem is that MKL does not seem to be respecting the original taskset (only cores 2 and 3). After declaring that it does, it also probes cores 0 and 1 (with masks 1 and 2, respectively).
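To make the masks in the trace easier to read, here is a minimal sketch that decodes them into CPU indices. It assumes strace prints each mask as hexadecimal words, lowest word first, with 64 bits per word; that is an assumption about this strace build, not something the log states. The helper name `decode_affinity_mask` is hypothetical.

```python
def decode_affinity_mask(words, bits_per_word=64):
    """Return the CPU indices set in a mask given as hex words, lowest word first.

    Assumes each word is a hexadecimal string as shown in the strace output,
    e.g. {c, 0} is passed as ["c", "0"].
    """
    cpus = []
    for word_index, word in enumerate(words):
        value = int(word, 16)
        for bit in range(bits_per_word):
            if value & (1 << bit):
                cpus.append(word_index * bits_per_word + bit)
    return cpus


if __name__ == "__main__":
    # Masks taken from the trace above.
    for mask in (["c", "0"], ["4", "0"], ["8", "0"], ["1", "0"], ["2", "0"]):
        print("{" + ", ".join(mask) + "}", "->", decode_affinity_mask(mask))
```

Under that assumption, `{c, 0}` decodes to CPUs 2 and 3 (the taskset range), while `{1, 0}` and `{2, 0}` decode to CPU 0 and CPU 1, which are outside it.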
Is this a big deal? I can imagine that in most circumstances it is not. But what if a realtime process were running on core 0 or 1? If the calling process has regular priority, then once it pins itself to that core it never gets scheduled, so it hangs.
Please advise.
Thank you,
Vlad
Hi,
Thanks for reporting your issue. We will try this at our end and get back to you with an update.
Regards,
Rahul
Hi Vlad,
The issue might be in Python itself or in OpenMP. MKL has no threading mechanism of its own; it relies on OpenMP. MKL reported that it was given only 1 thread (MKL_BLAS=1), so it would use only 1 thread. Also, the MKL verbose information is printed before the MKL kernels are executed.
If you remove np.dot and keep everything else, the calls sched_setaffinity(0, 16, [0]) = 0 and sched_setaffinity(0, 16, [1]) = 0 still appear, so we can confirm that the issue is not related to MKL. You can try this on your side.
Here are my logs:
# strace -e trace=sched_setaffinity taskset -c 2-3 $(which python) -c "import numpy as np; a = np.random.normal(size=(1000,1000)); "
sched_setaffinity(0, 16, [2, 3]) = 0
mkl-service + Intel(R) MKL: THREADING LAYER: (null)
mkl-service + Intel(R) MKL: setting Intel(R) MKL to use INTEL OpenMP runtime
mkl-service + Intel(R) MKL: preloading libiomp5.so runtime
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 2,3
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
sched_setaffinity(0, 16, [2]) = 0
sched_setaffinity(0, 16, [3]) = 0
sched_setaffinity(0, 16, [2, 3]) = 0
OMP: Info #157: KMP_AFFINITY: 2 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 2 cores/socket x 1 thread/core (2 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 2 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 3 thread 0
OMP: Info #254: KMP_AFFINITY: pid 76728 tid 76728 thread 0 bound to OS proc set 2,3
sched_setaffinity(0, 16, [2, 3]) = 0
sched_setaffinity(0, 16, [2, 3]) = 0
sched_setaffinity(0, 16, [0]) = 0
sched_setaffinity(0, 16, [1]) = 0
sched_setaffinity(0, 16, [2, 3]) = 0
MKL_VERBOSE Intel(R) MKL 2021.0 Update 2 Product build 20210312 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost), Lnx 2.10GHz lp64 intel_thread
MKL_VERBOSE SDOT(2,0x556a2f4621b0,1,0x556a2f4621b0,1) 1.86ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
+++ exited with 0 +++
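The check described above (looking for sched_setaffinity calls that pin the process to CPUs outside the taskset range) can be automated. The following is a rough sketch, assuming the strace output uses the bracketed list format shown in this log, e.g. `[2, 3]`; the function name `cpus_outside` and the regular expression are illustrative, not part of any tool.

```python
import re

# Matches successful calls in the list format, e.g.
# "sched_setaffinity(0, 16, [2, 3]) = 0"
_CALL = re.compile(r"sched_setaffinity\(\d+, \d+, \[([\d, ]*)\]\)\s*=\s*0")


def cpus_outside(lines, allowed):
    """Return (line, extra_cpus) pairs for calls that target CPUs not in `allowed`."""
    allowed = set(allowed)
    hits = []
    for line in lines:
        match = _CALL.search(line)
        if not match:
            continue  # not a successful sched_setaffinity call in this format
        cpus = {int(tok) for tok in match.group(1).split(",") if tok.strip()}
        extra = sorted(cpus - allowed)
        if extra:
            hits.append((line.strip(), extra))
    return hits


if __name__ == "__main__":
    # Lines copied from the log above.
    sample = [
        "sched_setaffinity(0, 16, [2, 3]) = 0",
        "sched_setaffinity(0, 16, [2, 3]) = 0",
        "sched_setaffinity(0, 16, [0]) = 0",
        "sched_setaffinity(0, 16, [1]) = 0",
        "sched_setaffinity(0, 16, [2, 3]) = 0",
    ]
    for line, extra in cpus_outside(sample, allowed={2, 3}):
        print(extra, "<-", line)
```

Running it on the strace output with and without np.dot gives a quick way to compare whether the out-of-range calls depend on the MKL kernel call.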
Thanks,
Ruqiu
Since we didn't hear back from you, we are closing this thread for now. If you require any additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
