Geary__Robert
Beginner
1,356 Views

numpy.dot not multi-threading

I installed Intel's Python distribution on my i9-7980XE running Windows 10 because I was curious how it would perform compared with Python 3.7 and pip-installed numpy, particularly for dot products.

I first uninstalled Python 3.7 and then installed Intel's Python.  I then ran the script

import numpy as np
A = np.random.rand(30000,30000)
B = np.dot(A, A)
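Putting a number on "far slower" makes the two installations easier to compare. The benchmark can be timed and converted to a throughput figure. A dense n x n by n x n product costs roughly 2n³ floating-point operations (n³ multiplies and n³ adds), so for n = 30000 that is about 5.4×10¹³ operations. This is an illustrative sketch only. The helper names are hypothetical, and the suggested size is deliberately smaller than the original 30000 so a run finishes quickly.

```python
import time

import numpy as np


def matmul_gflops(n, seconds):
    """Approximate GFLOP/s for an (n x n) @ (n x n) product, counting ~2*n**3 flops."""
    return 2.0 * n ** 3 / seconds / 1e9


def time_dot(n, seed=0):
    """Time a single np.dot of a random n-by-n float64 matrix with itself."""
    rng = np.random.RandomState(seed)
    A = rng.rand(n, n)
    t0 = time.perf_counter()  # only the product is timed, not the sampling
    np.dot(A, A)
    return time.perf_counter() - t0
```

For example, `s = time_dot(2000); print(matmul_gflops(2000, s))`. Running the same n under both installations gives a like-for-like comparison.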

and found that it runs far slower than under Python 3.7. This seems to be because Intel's Python is not multi-threading the dot product: ASUS CAM shows my CPU reaching only about 6% utilization, whereas under Python 3.7 my CPU runs at 100% for the entire dot product calculation.
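Here is a rough sanity check on the 6% figure. The i9-7980XE has 18 cores and 36 hardware threads, so an overall utilization percentage converts to an equivalent number of busy logical CPUs as shown below. This is back-of-the-envelope arithmetic, assuming the monitoring tool averages over all 36 logical CPUs. The helper name is hypothetical.

```python
import os


def busy_logical_cpus(utilization_percent, logical_cpus):
    """Convert an overall CPU-utilization percentage into an equivalent count of busy logical CPUs."""
    return utilization_percent / 100.0 * logical_cpus


# 6% of 36 logical CPUs is roughly 2 busy CPUs, far from full use of the chip.
print(busy_logical_cpus(6, 36))
# os.cpu_count() reports the number of logical CPUs on the machine running this.
print(os.cpu_count())
```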

How do I get Intel's Python to multi-thread the dot product?

Thanks

1,356 Views

Hi, 

Could you please provide the output of conda list --explicit, to make sure that the Intel Python environment has not been inadvertently altered?

Also, please inspect np.__config__.show() to make sure that NumPy has been configured to use MKL. 

If so, please set MKL_VERBOSE=1 environment variable, execute your script, and provide the output. The output will show MKL version, as well as the number of threads used, function executed, and dimensions.
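Here is a minimal sketch of both checks from inside Python. One ordering detail matters: MKL_VERBOSE should be in the environment before MKL initializes, so it is set before numpy is imported. This mirrors what the IPython session later in the thread does.

```python
import os

# Set before importing NumPy so MKL sees it when it initializes.
os.environ["MKL_VERBOSE"] = "1"

import numpy as np

# Look for mkl_rt under blas_opt_info / lapack_opt_info in this output.
np.show_config()

# With an MKL-backed NumPy, an "MKL_VERBOSE DDOT(...)" line (including NThr:<threads>)
# should be printed before the result.
x = np.random.rand(1000)
print(np.dot(x, x))
```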

Thank you,
Oleksandr

Geary__Robert

E:\>c:\IntelPython3\Scripts\conda.exe list --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: win-64
@EXPLICIT
https://conda.anaconda.org/intel/win-64/asn1crypto-0.24.0-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/backcall-0.1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/backports-1.0-py36_intel_6.tar.bz2
https://conda.anaconda.org/intel/win-64/bleach-2.1.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/bzip2-1.0.6-vc14_intel_14.tar.bz2
https://conda.anaconda.org/intel/win-64/certifi-2018.1.18-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/cffi-1.11.5-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/chardet-3.0.4-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/colorama-0.3.9-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/conda-4.3.31-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/conda-env-2.6.0-0.tar.bz2
https://conda.anaconda.org/intel/win-64/cryptography-2.2.2-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/cycler-0.10.0-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/cython-0.28.2-py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/daal-2018.0.3.20180405-0.tar.bz2
https://conda.anaconda.org/intel/win-64/decorator-4.3.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/entrypoints-0.2.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/freetype-2.9-vc14_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/get_terminal_size-1.0.0-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/hdf5-1.10.1-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/html5lib-1.0.1-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/icc_rt-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/idna-2.6-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/impi_rt-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/intelpython-2018.0.3-0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipykernel-4.6.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipyparallel-6.0.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipython-6.3.1-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/ipython_genutils-0.2.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ipywidgets-7.0.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jedi-0.12.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jinja2-2.9.6-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jsonschema-2.6.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter-1.0.0-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter_client-5.1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter_console-5.1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/jupyter_core-4.4.0-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/kiwisolver-1.0.1-py36_1.tar.bz2
https://conda.anaconda.org/intel/win-64/libpng-1.6.34-vc14_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/llvmlite-0.23.0-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/markupsafe-1.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/matplotlib-2.2.2-np114py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/menuinst-1.4.1-py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/mistune-0.7.4-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/mkl-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mkl_fft-1.0.2-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mkl_random-1.0.1-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mpi4py-3.0.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/mpmath-1.0.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/nbconvert-5.2.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/nbformat-4.4.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/nose-1.3.7-py36_intel_16.tar.bz2
https://conda.anaconda.org/intel/win-64/notebook-5.2.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/numba-0.38.0-np114py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/numexpr-2.6.4-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/numpy-1.14.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/openmp-2018.0.3-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/openssl-1.0.2o-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pandas-0.22.0-np114py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/pandocfilters-1.4.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/parso-0.2.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/path.py-11.0.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pickleshare-0.7.4-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/pip-9.0.3-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/prompt_toolkit-1.0.15-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pycosat-0.6.3-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pycparser-2.18-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pydaal-2018.0.3.20180405-np114py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pygments-2.2.0-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/pyopenssl-17.5.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pyparsing-2.2.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pysocks-1.6.7-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pytables-3.4.2-np114py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/python-3.6.3-intel_12.tar.bz2
https://conda.anaconda.org/intel/win-64/python-dateutil-2.6.0-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/pytz-2018.4-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/pyyaml-3.12-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/pyzmq-16.0.2-py36_intel_4.tar.bz2
https://conda.anaconda.org/intel/win-64/requests-2.18.4-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/ruamel_yaml-0.11.14-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/scikit-learn-0.19.1-np114py36_intel_29.tar.bz2
https://conda.anaconda.org/intel/win-64/scipy-1.0.1-np114py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/setuptools-39.0.1-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/simplegeneric-0.8.1-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/six-1.11.0-py36_2.tar.bz2
https://conda.anaconda.org/intel/win-64/sqlite-3.23.1-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/sympy-1.1.1-py36_intel_3.tar.bz2
https://conda.anaconda.org/intel/win-64/tbb-2018.0.4-vc14_0.tar.bz2
https://conda.anaconda.org/intel/win-64/tbb4py-2018.0.4-py36_vc14_0.tar.bz2
https://conda.anaconda.org/intel/win-64/tcl-8.6.4-vc14_intel_19.tar.bz2
https://conda.anaconda.org/intel/win-64/testpath-0.3.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/tk-8.6.4-vc14_intel_26.tar.bz2
https://conda.anaconda.org/intel/win-64/tornado-4.5.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/traitlets-4.3.2-py36_intel_1.tar.bz2
https://conda.anaconda.org/intel/win-64/urllib3-1.22-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/vc-14.0-2.tar.bz2
https://conda.anaconda.org/intel/win-64/vs2015_runtime-14.0.25420-intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/wcwidth-0.1.7-py36_intel_5.tar.bz2
https://conda.anaconda.org/intel/win-64/webencodings-0.5.1-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/wheel-0.31.0-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/widgetsnbextension-3.2.0-py36_0.tar.bz2
https://conda.anaconda.org/intel/win-64/win_inet_pton-1.0.1-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/win_unicode_console-0.5-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/wincertstore-0.2-py36_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/xz-5.2.3-vc14_intel_0.tar.bz2
https://conda.anaconda.org/intel/win-64/zlib-1.2.11-vc14_intel_3.tar.bz2

Geary__Robert

>>> np.__config__.show()
mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['.\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['.\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.\\Library\\include']
blas_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['.\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.\\Library\\include']
lapack_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['.\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['.\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.\\Library\\include']

Geary__Robert

I inserted

import os
os.environ["MKL_VERBOSE"] = "1"

at the beginning of my script but don't get any feedback when I run it.  So it seems MKL is not being used.
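Ordering problems can cause this, for example an IDE or an earlier import having already loaded MKL. One way to rule them out is to set the variable in the environment of a fresh interpreter, so it is present before any code runs. Below is a sketch using only the standard library. `run_with_env` and the script name are hypothetical.

```python
import os
import subprocess
import sys


def run_with_env(script_path, extra_env):
    """Run a Python script in a fresh interpreter with extra environment variables.

    stdout and stderr are merged so MKL_VERBOSE lines appear alongside the script's own output.
    """
    env = dict(os.environ)
    env.update(extra_env)  # environment values must be strings, e.g. "1"
    return subprocess.run(
        [sys.executable, script_path],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output to str (works on Python 3.6)
    )
```

Usage: `print(run_with_env("bench.py", {"MKL_VERBOSE": "1"}).stdout)`. From a Windows command prompt, the equivalent is `set MKL_VERBOSE=1` followed by `python bench.py` in the same window.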

1,356 Views

OK, thank you for these responses. Here is what I get:

(idp3) C:\Users\me\devel>ipython
Python 3.6.2 |Intel Corporation| (default, Aug 15 2017, 11:34:02) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import os

In [2]: os.environ['MKL_VERBOSE']="1"

In [3]: import numpy as np

In [4]: x = np.random.randn(1000)

In [5]: np.dot(x,x)
MKL_VERBOSE Intel(R) MKL 2018.0 Product build 20170720 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.60GHz cdecl intel_thread NMICDev:0
MKL_VERBOSE DDOT(1000,000001D258374D20,1,000001D258374D20,1) 1.93ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:2 WDiv:HOST:+0.000
Out[5]: 928.24284740935025

In [6]:

Could you please try the following to verify that MKL loads successfully:

# coding: utf-8
import ctypes
mkl = ctypes.cdll.LoadLibrary('mkl_rt.dll')

# See https://software.intel.com/en-us/mkl-developer-reference-c-mkl-get-version-string
fn = mkl.mkl_get_version_string
fn.argtypes = (ctypes.POINTER(ctypes.c_char), ctypes.c_int)
p = ctypes.create_string_buffer(198)
fn(p, ctypes.sizeof(p))

# Output version string
print(p.value.decode('ascii'))
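Since the question is about threading, the same ctypes approach can also ask MKL how many threads it plans to use. The sketch assumes mkl_rt exports the service function `mkl_get_max_threads`, which takes no arguments and returns an int. It returns None when the library cannot be loaded, for example with a NumPy that is not linked against MKL. The list of library file names is also an assumption, meant to cover common platforms.

```python
import ctypes


def mkl_max_threads(candidates=("mkl_rt.dll", "mkl_rt", "libmkl_rt.so", "libmkl_rt.dylib")):
    """Return MKL's maximum thread count, or None if mkl_rt (or the symbol) is unavailable."""
    for name in candidates:
        try:
            mkl = ctypes.CDLL(name)
        except OSError:
            continue  # not found under this name; try the next one
        try:
            fn = mkl.mkl_get_max_threads
        except AttributeError:
            return None  # library loaded but does not export the expected symbol
        fn.argtypes = []
        fn.restype = ctypes.c_int
        return fn()
    return None


print("MKL max threads:", mkl_max_threads())
```

A value of 1 would point to a thread limit rather than to NumPy. MKL_NUM_THREADS=1 or OMP_NUM_THREADS=1 set in the environment would produce this.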

Geary__Robert

That code prints

Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications

1,356 Views

Thank you for trying that, and sorry for the trouble. The output seems to confirm that the MKL library can be loaded just fine, but for some reason it is not being called from numpy.dot.

I concocted a small script that calls DDOT manually, and then through numpy.dot:

import ctypes, os
os.environ['MKL_VERBOSE']='1'
mkl = ctypes.cdll.LoadLibrary('mkl_rt.dll')
import numpy as np
x = np.random.randn(100)
ddot = mkl.cblas_ddot
ddot.argtypes = (ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_int)
ddot.restype = ctypes.c_double
print("DDOT result = {}".format(ddot(x.shape[0], x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1, x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1)))
print("np.dot result = {}".format(np.dot(x,x)))

When I execute it on my dated Win-64 installation, I see the following output:

MKL_VERBOSE Intel(R) MKL 2018.0 Product build 20170720 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.60GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DDOT(100,00000175A0048180,1,00000175A0048180,1) 3.32ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:2 WDiv:HOST:+0.000
DDOT result = 90.116370539625
MKL_VERBOSE DDOT(100,00000175A0048180,1,00000175A0048180,1) 3.76us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:2 WDiv:HOST:+0.000
np.dot result = 90.116370539625

The exact value of the result can vary from run to run as I neglected to fix the seed. 

If the second MKL_VERBOSE line is not echoed to the terminal, that would confirm that numpy.dot is not using BLAS and is falling back on generic sequential loops.
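Longer logs mix MKL calls with other output, for instance the DGEMM call from the full benchmark. A small parser makes the check mechanical: count the MKL_VERBOSE call lines and read the NThr field, which is the number of threads MKL used for that call. This sketch assumes the line layout shown in the output above. Other MKL versions may format the lines differently, and the helper name is hypothetical.

```python
import re

# Matches call lines such as "MKL_VERBOSE DDOT(100,...) 3.32ms ... NThr:2 ...";
# the version banner line has no NThr field, so it is skipped.
_CALL = re.compile(r"MKL_VERBOSE\s+(\w+)\(.*?NThr:(\d+)")


def mkl_calls(log_text):
    """Return (function, threads) pairs for each MKL_VERBOSE call line in log_text."""
    calls = []
    for line in log_text.splitlines():
        m = _CALL.search(line)
        if m:
            calls.append((m.group(1), int(m.group(2))))
    return calls
```

With the script above, two DDOT entries mean both the ctypes call and np.dot reached MKL. A single entry means np.dot did not.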

Please run that script. 

I would also suggest creating an alternative environment with a minimal IDP installation, as follows:

conda create -n idp_core -c intel intelpython3_core ipython

Then activate that environment and check whether the problem persists.

Thank you,
Oleksandr

Geary__Robert

I get this output:

MKL_VERBOSE Intel(R) MKL 2018.0 Update 3 Product build 20180406 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Win 2.60GHz lp64 intel_thread
MKL_VERBOSE DDOT(100,000001524BB5B7A0,1,000001524BB5B7A0,1) 710.75us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:18
DDOT result = 96.64738197234854
MKL_VERBOSE DDOT(100,000001524BB5B7A0,1,000001524BB5B7A0,1) 686ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:18
np.dot result = 96.64738197234854

Geary__Robert

I've set up the core environment as you suggested, but I still only get single-threaded performance for my original matrix multiplication.

1,356 Views

We have established that numpy.dot does actually call into the MKL BLAS functions.

Could you then please append your benchmark to the end of the script:

import ctypes, os
os.environ['MKL_VERBOSE']='1'
mkl = ctypes.cdll.LoadLibrary('mkl_rt.dll')
import numpy as np
x = np.random.randn(100)
ddot = mkl.cblas_ddot
ddot.argtypes = (ctypes.c_int, ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_int)
ddot.restype = ctypes.c_double
print("DDOT result = {}".format(ddot(x.shape[0], x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1, x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1)))
print("np.dot result = {}".format(np.dot(x,x)))

n = 30*1000
A = np.random.randn(n, n)

import timeit
t0 = timeit.default_timer() 
B = np.dot(A, A)
t1 = timeit.default_timer()
print("Time {}".format(t1-t0))

You should be able to see all threads being used.

Sampling 9*10**8 Gaussian variates may take a long time with numpy.random. You can try replacing the np.random.randn sampling with numpy.random_intel.randn (present in IDP), in which case faster MKL sampling functions will be used.
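Since 30000 x 30000 is 9×10⁸ samples, the sampling step alone is substantial. Below is a sketch of the suggested swap that still runs on a stock NumPy. It prefers `numpy.random_intel`, which ships with IDP per the post above, and falls back to `numpy.random` otherwise. Only `randn` is used, as in the post. The helper name is hypothetical.

```python
import numpy as np

try:
    # Present in the Intel Distribution for Python, per the thread; absent from stock NumPy.
    import numpy.random_intel as rnd
except ImportError:
    rnd = np.random


def gaussian_matrix(n):
    """Sample an n-by-n matrix of standard normal variates."""
    return rnd.randn(n, n)


print("sampling with", rnd.__name__)
```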

Geary__Robert

I have a new problem now: As soon as the matrix A is done populating, my PC crashes.  No BSOD, just a chirp from my motherboard and then power off.

My CPU is overclocked (all cores at 4.2GHz), but it's been stable for months.  That doesn't mean it will be stable on everything, but I've never seen an unstable overclock not produce a BSOD in Windows 10 with Skylake.

Geary__Robert

I've ruled out an unstable overclock: With all BIOS settings restored to defaults, I get exactly the same black screen of death when the script begins the dot product.

Running out of RAM is not a problem: I have 128GB.
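The arithmetic behind that is simple. A 30000 x 30000 float64 array holds 9×10⁸ elements of 8 bytes each, about 7.2 GB (6.7 GiB). A and B together therefore need roughly 13.4 GiB, not counting any temporary workspace MKL might allocate. A quick check, with a hypothetical helper name:

```python
def dense_matrix_bytes(n, itemsize=8):
    """Bytes needed for an n-by-n array; itemsize=8 corresponds to float64."""
    return n * n * itemsize


GIB = 1024 ** 3
per_matrix = dense_matrix_bytes(30000)
print("one matrix: {:.1f} GiB".format(per_matrix / GIB))
print("A and B:    {:.1f} GiB".format(2 * per_matrix / GIB))
```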

1,356 Views

Thanks for trying it out and reporting the subsequent issue.

I think it is worthwhile to check whether setting os.environ['MKL_ENABLE_ARCHITECTURE'] = 'AVX2' at the top of the script helps with the crash. This will instruct MKL to use the AVX2 instruction set instead of AVX-512.

It's also useful to try setting os.environ['MKL_THREADING_LAYER'] = 'TBB' and see if that helps with the crash. This will replace Intel OpenMP with Intel TBB.

If it doesn't, I'd suggest starting with a smaller matrix size, say 1k by 1k, and gradually increasing it to identify the size that triggers the crash, so that this issue can be taken to the MKL engineering team.

Thank you,
Oleksandr

Geary__Robert

I tried all three of your suggestions at the same time (os.environ['MKL_ENABLE_ARCHITECTURE'] = 'AVX2', os.environ['MKL_THREADING_LAYER'] = 'TBB', and a 1000x1000 matrix), and it still crashes my PC.

1,356 Views

My mistake; I should have checked more carefully. My suggestion should have been:

# https://software.intel.com/en-us/mkl-linux-developer-guide-instruction-set-specific-dispatching-on-i...
os.environ['MKL_ENABLE_INSTRUCTIONS'] = 'AVX2'
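The sketch below combines the corrected variable with the threading-layer switch suggested earlier. MKL reads both variables, not NumPy, so they are set before numpy is imported. MKL_THREADING_LAYER=TBB assumes the tbb package is installed, which it is in the conda list above.

```python
import os

# Read by MKL when it initializes, so set these before NumPy (and therefore MKL) is imported.
os.environ["MKL_ENABLE_INSTRUCTIONS"] = "AVX2"  # cap MKL's code path at AVX2 instead of AVX-512
os.environ["MKL_THREADING_LAYER"] = "TBB"       # use Intel TBB instead of Intel OpenMP
os.environ["MKL_VERBOSE"] = "1"                 # echo each MKL call, including NThr

import numpy as np

n = 1000  # start small, as suggested in the thread
A = np.random.randn(n, n)
B = np.dot(A, A)
print(B.shape)
```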

Regardless of whether this helps, please try lowering the matrix size by binary search until the machine stops crashing. That would help the MKL developers identify the problem behind the issues you are experiencing.
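A failing run takes the whole machine down, so the search cannot be a simple loop inside one process. The sketch below is a crash-tolerant alternative. Each trial appends its size to a log and forces it to disk before the risky call, then appends "ok" afterwards. After a reboot, the last size without "ok" is the one that crashed. The file name and helpers are hypothetical. The search also assumes that if size n crashes, larger sizes crash too.

```python
import os
import sys

LOG_PATH = "dot_sizes.log"  # hypothetical; any file on a local disk


def log_line(path, text):
    """Append a line and force it to disk so it survives a hard power-off."""
    with open(path, "a") as f:
        f.write(text + "\n")
        f.flush()
        os.fsync(f.fileno())


def next_size(largest_ok, smallest_bad):
    """Midpoint of the current bracket, or None once it cannot be narrowed further."""
    if smallest_bad - largest_ok <= 1:
        return None
    return (largest_ok + smallest_bad) // 2


def trial(n, path=LOG_PATH):
    """Run one n-by-n np.dot, logging the size before and 'ok' after."""
    import numpy as np

    log_line(path, "{} start".format(n))
    A = np.random.rand(n, n)
    np.dot(A, A)
    log_line(path, "{} ok".format(n))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python trial.py 500
    trial(int(sys.argv[1]))
```

Start from a bracket such as largest_ok = 0 and smallest_bad = 1000, the size reported above to crash. `next_size(0, 1000)` gives 500. Run that size, update the bracket from the log, and repeat until `next_size` returns None.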

Thank you,
Oleksandr

Geary__Robert

Sorry, I've already uninstalled Intel's Python Distribution.  I can't afford to play Russian roulette with my hard drives any more.

I can tell you that my system has not had any problems with AVX512 instructions in other software.

1,356 Views

Thank you for taking the time. I will try to reproduce the issue in-house.

Oleksandr
