I installed Intel's Python distribution on my i9 7980XE running Windows 10 because I was curious to see how it performed compared to Python 3.7 with pip-installed numpy, particularly with dot products.
I first uninstalled Python 3.7 and then installed Intel's Python. I then ran the script
import numpy as np
A = np.random.rand(30000,30000)
B = np.dot(A, A)
and found that it runs far slower than under Python 3.7. This seems to be the result of Intel's Python not multi-threading the dot product because ASUS CAM shows my CPU only reaching about 6% utilization, whereas when run under Python 3.7, my CPU runs at 100% during the entire dot product calculation.
How do I get Intel's Python to multi-thread the dot product?
Thanks
Hi,
Could you please provide the output of conda list --explicit, to make sure that the Intel Python environment has not been inadvertently altered?
Also, please inspect np.__config__.show() to make sure that NumPy has been configured to use MKL.
If so, please set the MKL_VERBOSE=1 environment variable, execute your script, and provide the output. The output will show the MKL version, as well as the number of threads used, the function executed, and the dimensions.
Thank you,
Oleksandr
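One way to be sure MKL_VERBOSE is already in place when MKL loads is to set it in the environment of a fresh interpreter rather than inside the script itself. The following is only a minimal sketch: the helper name run_with_mkl_verbose is hypothetical and not part of any Intel tool, and the only thing taken from this thread is the MKL_VERBOSE variable.

```python
import os
import subprocess
import sys


def run_with_mkl_verbose(args):
    """Run `python <args...>` in a child process with MKL_VERBOSE=1 set.

    Hypothetical helper for illustration; returns the child's combined
    stdout and stderr as text.
    """
    env = dict(os.environ)      # copy, so the parent environment is untouched
    env["MKL_VERBOSE"] = "1"    # present before the child imports anything
    result = subprocess.run(
        [sys.executable] + list(args),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


# Example call (the script name is a placeholder):
# print(run_with_mkl_verbose(["my_benchmark.py"]))
```

On Windows the same effect can be had by running set MKL_VERBOSE=1 in the command prompt before starting Python.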
E:\>c:\IntelPython3\Scripts\conda.exe list --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: win-64
@EXPLICIT
https://conda.anaconda.org/intel/win-64/asn1crypto-0.24.0-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/backcall-0.1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/backports-1.0-py36_intel_6.tar.bz2
https://conda.anaconda.org/intel/win-64/bleach-2.1.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/bzip2-1.0.6-vc14_intel_14.tar.bz2
https://conda.anaconda.org/intel/win-64/certifi-2018.1.18-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/cffi-1.11.5-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/chardet-3.0.4-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/colorama-0.3.9-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/conda-4.3.31-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/conda-env-2.6.0-0.tar.bz2
https://conda.anaconda.org/intel/win-64/cryptography-2.2.2-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/cycler-0.10.0-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/cython-0.28.2-py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/daal-2018.0.3.20180405-0.tar.bz2
https://conda.anaconda.org/intel/win-64/decorator-4.3.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/entrypoints-0.2.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/freetype-2.9-vc14_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/get_terminal_size-1.0.0-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/hdf5-1.10.1-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/html5lib-1.0.1-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/icc_rt-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/idna-2.6-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/impi_rt-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/intelpython-2018.0.3-0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipykernel-4.6.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipyparallel-6.0.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipython-6.3.1-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/ipython_genutils-0.2.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipywidgets-7.0.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jedi-0.12.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jinja2-2.9.6-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jsonschema-2.6.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter-1.0.0-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter_client-5.1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter_console-5.1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter_core-4.4.0-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/kiwisolver-1.0.1-py36_1.tar.bz2
https://conda.anaconda.org/intel/win-64/libpng-1.6.34-vc14_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/llvmlite-0.23.0-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/markupsafe-1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/matplotlib-2.2.2-np114py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/menuinst-1.4.1-py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/mistune-0.7.4-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/mkl-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mkl_fft-1.0.2-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mkl_random-1.0.1-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mpi4py-3.0.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mpmath-1.0.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/nbconvert-5.2.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/nbformat-4.4.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/nose-1.3.7-py36_intel_16.tar.bz2
https://conda.anaconda.org/intel/win-64/notebook-5.2.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/numba-0.38.0-np114py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/numexpr-2.6.4-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/numpy-1.14.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/openmp-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/openssl-1.0.2o-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pandas-0.22.0-np114py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/pandocfilters-1.4.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/parso-0.2.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/path.py-11.0.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pickleshare-0.7.4-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/pip-9.0.3-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/prompt_toolkit-1.0.15-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pycosat-0.6.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pycparser-2.18-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pydaal-2018.0.3.20180405-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pygments-2.2.0-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/pyopenssl-17.5.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pyparsing-2.2.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pysocks-1.6.7-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pytables-3.4.2-np114py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/python-3.6.3-intel_12.tar.bz2
https://conda.anaconda.org/intel/win-64/python-dateutil-2.6.0-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/pytz-2018.4-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pyyaml-3.12-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/pyzmq-16.0.2-py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/requests-2.18.4-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ruamel_yaml-0.11.14-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/scikit-learn-0.19.1-np114py36_intel_29.tar.bz2
https://conda.anaconda.org/intel/win-64/scipy-1.0.1-np114py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/setuptools-39.0.1-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/simplegeneric-0.8.1-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/six-1.11.0-py36_2.tar.bz2
https://conda.anaconda.org/intel/win-64/sqlite-3.23.1-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/sympy-1.1.1-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/tbb-2018.0.4-vc14_0.tar.bz2
https://conda.anaconda.org/intel/win-64/tbb4py-2018.0.4-py36_vc14_0.tar.bz2
https://conda.anaconda.org/intel/win-64/tcl-8.6.4-vc14_intel_19.tar.bz2
https://conda.anaconda.org/intel/win-64/testpath-0.3.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/tk-8.6.4-vc14_intel_26.tar.bz2
https://conda.anaconda.org/intel/win-64/tornado-4.5.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/traitlets-4.3.2-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/urllib3-1.22-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/vc-14.0-2.tar.bz2
https://conda.anaconda.org/intel/win-64/vs2015_runtime-14.0.25420-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/wcwidth-0.1.7-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/webencodings-0.5.1-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/wheel-0.31.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/widgetsnbextension-3.2.0-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/win_inet_pton-1.0.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/win_unicode_console-0.5-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/wincertstore-0.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/xz-5.2.3-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/zlib-1.2.11-vc14_intel_3.tar.bz2
>>> np.__config__.show()
mkl_info:
libraries = ['mkl_rt']
library_dirs = ['.\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['.\\Library\\include']
blas_mkl_info:
libraries = ['mkl_rt']
library_dirs = ['.\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['.\\Library\\include']
blas_opt_info:
libraries = ['mkl_rt']
library_dirs = ['.\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['.\\Library\\include']
lapack_mkl_info:
libraries = ['mkl_rt']
library_dirs = ['.\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['.\\Library\\include']
lapack_opt_info:
libraries = ['mkl_rt']
library_dirs = ['.\\Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['.\\Library\\include']
I inserted
import os
os.environ["MKL_VERBOSE"] = "1"
at the beginning of my script but don't get any feedback when I run it. So it seems MKL is not being used.
OK, thank you for these responses. Here is what I get:
(idp3) C:\Users\me\devel>ipython
Python 3.6.2 |Intel Corporation| (default, Aug 15 2017, 11:34:02) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import os
In [2]: os.environ['MKL_VERBOSE']="1"
In [3]: import numpy as np
In [4]: x = np.random.randn(1000)
In [5]: np.dot(x,x)
MKL_VERBOSE Intel(R) MKL 2018.0 Product build 20170720 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.60GHz cdecl intel_thread NMICDev:0
MKL_VERBOSE DDOT(1000,000001D258374D20,1,000001D258374D20,1) 1.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2 WDiv:HOST:+0.000
Out[5]: 928.24284740935025
In [6]:
Could you please try the following, to verify that MKL does not fail to load:
# coding: utf-8
import ctypes
mkl = ctypes.cdll.LoadLibrary('mkl_rt.dll')
# See https://software.intel.com/en-us/mkl-developer-reference-c-mkl-get-version-string
fn = mkl.mkl_get_version_string
fn.argtypes = (ctypes.POINTER(ctypes.c_char), ctypes.c_int)
p = ctypes.create_string_buffer(198)
fn(p, ctypes.sizeof(p))
# Output version string
print(p.value.decode('ascii'))
That code prints
Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications
Thank you for trying that, and sorry for the trouble. The result seems to confirm that the MKL library can be loaded just fine, but for some reason it is not being called from numpy.dot.
I concocted a small script that calls DDOT manually, and then through numpy.dot:
import ctypes, os
os.environ['MKL_VERBOSE']='1'
mkl = ctypes.cdll.LoadLibrary('mkl_rt.dll')
import numpy as np
x = np.random.randn(100)
ddot = mkl.cblas_ddot
ddot.argtypes = (ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_int)
ddot.restype = ctypes.c_double
print("DDOT result = {}".format(ddot(x.shape[0], x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1, x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1)))
print("np.dot result = {}".format(np.dot(x,x)))
When I execute it on my dated Win-64 installation, I see the following output:
MKL_VERBOSE Intel(R) MKL 2018.0 Product build 20170720 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.60GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DDOT(100,00000175A0048180,1,00000175A0048180,1) 3.32ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2 WDiv:HOST:+0.000
DDOT result = 90.116370539625
MKL_VERBOSE DDOT(100,00000175A0048180,1,00000175A0048180,1) 3.76us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2 WDiv:HOST:+0.000
np.dot result = 90.116370539625
The exact value of the result can vary from run to run as I neglected to fix the seed.
If the second MKL_VERBOSE line is not echoed to the terminal, that would confirm that numpy.dot is not using BLAS and is falling back on generic sequential loops.
Please run that script.
I would also suggest creating an alternative environment with a minimal IDP installation as follows:
conda create -n idp_core -c intel intelpython3_core ipython
Then activate that environment and check whether the problem persists.
Thank you,
Oleksandr
I get this output:
MKL_VERBOSE Intel(R) MKL 2018.0 Update 3 Product build 20180406 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Win 2.60GHz lp64 intel_thread
MKL_VERBOSE DDOT(100,000001524BB5B7A0,1,000001524BB5B7A0,1) 710.75us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:18
DDOT result = 96.64738197234854
MKL_VERBOSE DDOT(100,000001524BB5B7A0,1,000001524BB5B7A0,1) 686ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:18
np.dot result = 96.64738197234854
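For anyone comparing runs, the call lines above carry the function name, the elapsed time, and the thread count (NThr). A small parser along the following lines can pull those out. This is a sketch only: the line layout is assumed from the output pasted in this thread, and the name parse_mkl_verbose_line is made up for illustration.

```python
import re

# Matches call lines shaped like the ones pasted in this thread, e.g.
# MKL_VERBOSE DDOT(100,<addr>,1,<addr>,1) 710.75us CNR:OFF ... NThr:18
_CALL_LINE = re.compile(
    r"^MKL_VERBOSE\s+(?P<func>\w+)\((?P<args>[^)]*)\)\s+"
    r"(?P<time>[\d.]+)(?P<unit>ns|us|ms|s)\b.*?\bNThr:(?P<nthr>\d+)"
)

_UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_mkl_verbose_line(line):
    """Return (function, elapsed_seconds, threads), or None for other lines.

    Hypothetical helper; the banner line with the MKL version does not
    match the pattern and yields None.
    """
    m = _CALL_LINE.match(line.strip())
    if m is None:
        return None
    seconds = float(m.group("time")) * _UNIT_TO_SECONDS[m.group("unit")]
    return m.group("func"), seconds, int(m.group("nthr"))
```

Applied to the two DDOT lines above, it reports 18 threads for both calls, which says only how many threads MKL had available for those calls, not how many a large matrix product would actually keep busy.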
I've set up the core environment as you suggested, but I still only get single-threaded performance for my original matrix multiplication.
We have established that numpy.dot actually does call into the MKL BLAS function.
Could you then please append your benchmark to the end of the script:
import ctypes, os
os.environ['MKL_VERBOSE']='1'
mkl = ctypes.cdll.LoadLibrary('mkl_rt.dll')
import numpy as np
x = np.random.randn(100)
ddot = mkl.cblas_ddot
ddot.argtypes = (ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_int)
ddot.restype = ctypes.c_double
print("DDOT result = {}".format(ddot(x.shape[0], x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1, x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1)))
print("np.dot result = {}".format(np.dot(x,x)))
n = 30*1000
A = np.random.randn(n, n)
import timeit
t0 = timeit.default_timer()
B = np.dot(A, A)
t1 = timeit.default_timer()
print("Time {}".format(t1-t0))
You should be able to see all threads being used.
Sampling 9*10**8 Gaussian variates may take a long time with numpy.random. You can try replacing np.random.randn with numpy.random_intel.randn (present in IDP), in which case the faster MKL sampling functions will be used.
I have a new problem now: As soon as the matrix A is done populating, my PC crashes. No BSOD, just a chirp from my motherboard and then power off.
My CPU is overclocked (all cores at 4.2GHz), but it's been stable for months. That doesn't mean it will be stable on everything, but I've never seen an unstable overclock not produce a BSOD in Windows 10 with Skylake.
I've ruled out an unstable overclock: With all BIOS settings restored to defaults, I get exactly the same black screen of death when the script begins the dot product.
Running out of RAM is not a problem: I have 128GB.
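As a rough check on the memory claim, the footprint of the two dense float64 matrices in the benchmark can be worked out directly. This is a back-of-the-envelope sketch: it counts only A and the result B, and assumes no extra temporary buffers inside NumPy or MKL, which may not hold exactly.

```python
def dense_matrix_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols array (itemsize=8 for float64)."""
    return rows * cols * itemsize


n = 30000
per_matrix = dense_matrix_bytes(n, n)   # 30000 * 30000 * 8 = 7,200,000,000 bytes
total = 2 * per_matrix                  # A plus the result B

print("per matrix: {:.2f} GiB".format(per_matrix / 2**30))
print("A and B:    {:.2f} GiB".format(total / 2**30))
```

Both matrices together come to 14.4 billion bytes, well under 128GB, which is consistent with RAM not being the limiting factor here.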
Thank you for trying it out and reporting the subsequent issue.
I think it is worthwhile to check whether setting os.environ['MKL_ENABLE_ARCHITECTURE'] = 'AVX2' at the top of the script helps with the crash. This will instruct MKL to use the AVX2 instruction set instead of AVX-512.
It would also be useful to try setting os.environ['MKL_THREADING_LAYER'] = 'TBB' and see if that helps with the crash. This will replace the use of Intel OpenMP with Intel TBB.
If it doesn't, I'd suggest starting with a smaller matrix size, say 1k by 1k, and gradually increasing the size to identify the size that triggers the crash, so that this issue can be taken to the MKL engineering team.
Thank you,
Oleksandr
I tried all three of your suggestions at the same time (os.environ['MKL_ENABLE_ARCHITECTURE'] = 'AVX2' , os.environ['MKL_THREADING_LAYER'] = 'TBB' , and a 1000x1000 matrix) and it still crashes my PC.
My mistake, I should have checked more carefully. My suggestion should have been
# https://software.intel.com/en-us/mkl-linux-developer-guide-instruction-set-specific-dispatching-on-intel-architectures
os.environ['MKL_ENABLE_INSTRUCTIONS'] = 'AVX2'
Regardless of whether this helps, please try lowering the matrix size by binary search until the machine stops crashing. It would help the MKL developers identify the problem behind the issues you are experiencing.
Thank you,
Oleksandr
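The binary search over matrix sizes could be organised roughly as follows. This is a sketch under stated assumptions: the names find_smallest_failing and fails are hypothetical, failure is assumed to be monotone in n (if size n fails, every larger size fails too), and because the failure here is a hard power-off, each size is written to a log file and flushed to disk before it is tried, so the last line of the log shows which size was running when the machine went down.

```python
import os


def find_smallest_failing(lo, hi, fails, log_path=None):
    """Bisect for the smallest n in (lo, hi] for which fails(n) is true.

    Assumes fails(lo) is false, fails(hi) is true, and that failure is
    monotone in n. `fails` is a caller-supplied callable.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if log_path is not None:
            # Record the size before trying it and force it to disk, so the
            # entry survives even if the machine powers off during the trial.
            with open(log_path, "a") as f:
                f.write("trying n={}\n".format(mid))
                f.flush()
                os.fsync(f.fileno())
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

With a crash that takes down the whole machine, fails(mid) never gets the chance to return true. In that case the search has to be restarted by hand after each reboot, using the last logged size as the new hi and the largest size that completed as the new lo.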
Sorry, I've already uninstalled Intel's Python Distribution. I can't afford to play Russian roulette with my hard drives any more.
I can tell you that my system has not had any problems with AVX512 instructions in other software.
Thank you for taking the time. I will try to reproduce the issue in house,
Oleksandr
