I recently received the Intel Distribution for Python (IDP) from Intel and installed it on an Intel-based cluster. I used it to see how it accelerates the test cases (which I wrote) presented in:
and saw no gain relative to Python in Anaconda or to IDP derived from Anaconda. I am quite disappointed, since I was expecting speedups with IDP.
Could you let me know whether I need to do something specific (for instance, a particular installation procedure) in order to obtain better results with IDP?
Thank you in advance for your assistance.
IDP uses Intel MKL optimizations to accelerate the NumPy and SciPy libraries. Do your test cases use either of these libraries? If yes, could you attach a sample code?
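A quick way to check whether a given environment is actually MKL-accelerated (a sketch of mine, not part of the original reply; the exact output format varies by NumPy version) is to print NumPy's build configuration and time a BLAS-heavy operation:

```python
import time

import numpy as np

# Show which BLAS/LAPACK libraries this NumPy build links against;
# an MKL-backed build mentions "mkl" in these sections.
np.show_config()

# Time a matrix multiply, which is dispatched to the underlying BLAS.
n = 1000
a = np.random.randn(n, n)
b = np.random.randn(n, n)

t0 = time.time()
c = a @ b
print('matmul of %dx%d took %.3f s' % (n, n, time.time() - t0))
```

On the same hardware, the matmul timing typically differs by a large factor between MKL and reference-BLAS builds.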
Thank you for responding to my request.
To simplify the discussion, I am interested in the two test cases presented in:
When I run them, IDP (obtained from Intel), Anaconda, and IDP derived from Anaconda all give the same elapsed times.
I also tried Gauss-Legendre quadrature (see the code below) and did not see any difference.
Thank you for your assistance.
import sys

import numpy as np

f = lambda x: np.exp(x)

# Quadrature order taken from the command line, e.g. `python gauss.py 10`
order = int(sys.argv[1])
a = -3.0
b = 3.0

# Gauss-Legendre nodes and weights (default interval is [-1, 1])
x, w = np.polynomial.legendre.leggauss(order)
# Translate x values from the interval [-1, 1] to [a, b]
t = 0.5*(x + 1)*(b - a) + a
gauss = np.sum(w * f(t)) * 0.5*(b - a)
print(gauss)
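As a sanity check (my addition, not part of the original post), the quadrature result can be compared with the exact integral of exp(x) over [-3, 3], which is e^3 - e^-3:

```python
import numpy as np

f = lambda x: np.exp(x)

a, b = -3.0, 3.0
order = 10  # a modest order is already very accurate for exp(x)

# Gauss-Legendre nodes/weights on [-1, 1], translated to [a, b]
x, w = np.polynomial.legendre.leggauss(order)
t = 0.5*(x + 1)*(b - a) + a
gauss = np.sum(w * f(t)) * 0.5*(b - a)

exact = np.exp(b) - np.exp(a)
print(gauss, exact, abs(gauss - exact))
```

With order 10 the two values agree to roughly machine precision, so any timing differences between distributions are not caused by accuracy differences.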
I am sorry that I provided the wrong link in my previous message. Here is the right one:
Intel is striving to enable as many Python developers/users as possible to utilize Intel hardware to its fullest.
Intel (R) Distribution for Python* was created to make fast delivery of these optimizations to the community possible, but the ultimate goal was to accomplish even wider adoption through upstreaming and partnership with Python distributors.
Anaconda recently adopted our patches (see https://github.com/AnacondaRecipes/numpy-feedstock/tree/master/recipe), and thus the performance of NumPy-based Python code run in Intel Distribution for Python* and run in default Anaconda is comparable.
Consider three conda environments:
conda create -n idp -c intel ipython numpy scipy python=3 --yes
conda create -n anac5 ipython numpy scipy python=3 --yes
conda create -n anac5-nomkl ipython nomkl numpy scipy python=3 --yes
I use the following snippet for performance comparison:
import datetime as dt
import sys

import numpy as np

dim = 2000
x = np.random.randn(dim, dim) + 1j * np.random.randn(dim, dim)

if len(sys.argv) < 2:
    print('Usage:')
    print('  ./fft.py N')
    print('Please specify the number of iterations.')
    sys.exit()

N = int(sys.argv[1])

begTime = dt.datetime.now()
for __ in range(N):
    y = np.fft.fft2(x)
endTime = dt.datetime.now()

diffTime = endTime - begTime
print('Time for 2D FFT calculations (', N, '):', diffTime.total_seconds(), 's')
With the following results:
(anac5) [20:01:02 skl-ubuntu perfQ]$ python fft.py 100
Time for 2D FFT calculations ( 100 ): 0.558279 s
(anac5) [20:01:05 skl-ubuntu perfQ]$ . activate idp
(idp) [07:11:36 skl-ubuntu perfQ]$ python fft.py 100
Time for 2D FFT calculations ( 100 ): 0.56407 s
(idp) [07:11:39 skl-ubuntu perfQ]$ python fft.py 100
Time for 2D FFT calculations ( 100 ): 0.482773 s
(idp) [07:11:48 skl-ubuntu perfQ]$ . activate anac5-nomkl
(anac5-nomkl) [07:11:58 skl-ubuntu perfQ]$ python fft.py 100
Time for 2D FFT calculations ( 100 ): 21.026044 s
(anac5-nomkl) [07:12:22 skl-ubuntu perfQ]$ . activate bare
(bare) [07:12:41 skl-ubuntu perfQ]$ python fft.py 100
Time for 2D FFT calculations ( 100 ): 21.188223 s
Here the environment bare uses Anaconda's CPython interpreter with pip-installed numpy and scipy. As you can see, Anaconda's nomkl build of NumPy performs on par with the NumPy distributed through PyPI, while MKL-optimized NumPy performs on par with IDP.
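For timings that are more robust than a single wall-clock measurement, the same comparison can be sketched with `timeit` (my variation on the snippet above, not part of the original reply; the array size is reduced so it runs quickly):

```python
import timeit

import numpy as np

dim = 512  # smaller than the original 2000 so the sketch finishes fast
x = np.random.randn(dim, dim) + 1j * np.random.randn(dim, dim)

# Best-of-several timing smooths out scheduler and warm-up noise.
best = min(timeit.repeat(lambda: np.fft.fft2(x), repeat=3, number=10))
print('best time for 10 x fft2 on a %dx%d array: %.4f s' % (dim, dim, best))
```

Taking the minimum of several repeats is the usual way to suppress interference from other processes when comparing builds.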
Thank you for your response. You clearly answered my question. I now understand that IDP, Anaconda, and IDP derived from Anaconda should deliver comparable performance.
After the latest update I can't run the TBB module or this simple test:
import time

import dask.array as da

t0 = time.time()
x = da.random.random((10000, 10000), chunks=(4096, 4096))
x.dot(x.T).sum().compute()
print(time.time() - t0)
Edited: Solved by Todd (Intel), thanks