Intel® Distribution for Python*
Support and discussions for achieving faster Python* applications and core computational packages.

## Using Numba in the Intel Python environment takes longer and gives a memory error

Beginner

I am using Anaconda with Python 3.6. First, I have the code below to create a 3D matrix. It evaluates a phase term over the matrix, where the phase of each element is determined by that element's location in the matrix.

```
g = lambda p,m,n: np.exp(2j*np.pi*(kz*p+kx*m+ky*n)).astype(np.complex64)
shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
shift = np.roll(np.roll(np.roll(shift,nz,0),nx,1),ny,2)
```

`kz`, `kx`, and `ky` are known floating-point parameters; `nz`, `nx`, and `ny` set the size of the matrix.
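As a small illustration of what this snippet computes, the sketch below runs the same lambda on a tiny grid. The sizes and `k` values are made up for the example and are not the ones used in this thread. It also checks that rolling each axis by half of its (even) length gives the same result as `np.fft.fftshift`.

```python
import numpy as np

# Hypothetical small values, chosen only to keep the example tiny.
kz, kx, ky = 0.0, -0.25, -0.25
nz, nx, ny = 2, 3, 4

g = lambda p, m, n: np.exp(2j * np.pi * (kz * p + kx * m + ky * n)).astype(np.complex64)
shift = np.fromfunction(g, (2 * nz, 2 * nx, 2 * ny))

# Each element depends only on its own indices (p, m, n).
p, m, n = 1, 2, 3
expected = np.exp(2j * np.pi * (kz * p + kx * m + ky * n))

# Roll every axis by half its length, as in the original snippet.
rolled = np.roll(np.roll(np.roll(shift, nz, 0), nx, 1), ny, 2)

print(shift.shape, shift.dtype)
print(np.isclose(shift[p, m, n], expected))
print(np.array_equal(rolled, np.fft.fftshift(shift)))
```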

Then I have a faster version that does the same thing, based on `numba.jit`.

```
import numpy as np
from numba import jit

x = np.arange(2*nx)
y = np.arange(2*ny)
z = np.arange(2*nz)
zv, yv, xv = np.meshgrid(z, y, x)

@jit(nopython=True, parallel=True)
def f(z, x, y):
    # kx and ky are read as globals; the kz*z term is left out of this version
    a = np.exp(2j*np.pi*(kx*x+ky*y)).astype(np.complex64)
    return a

a = f(zv, xv, yv)
a = a.swapaxes(0, 1)
a = np.roll(np.roll(np.roll(a, nz, 0), nx, 1), ny, 2)
```

This one is faster than the first, but I am looking for an even faster one.

I installed the Intel Distribution for Python (IDP), but neither of the two snippets above works anymore. In the IDP environment, the first one fails with a `MemoryError`, and the second one runs for a very long time until the kernel finally dies.

Can anyone point out why?

5 Replies

Oleksandr P. (Intel), Employee

Could you please indicate the amount of memory on the machine where you experience the issue in question, as well as ball-park values for nx, ny, nz.

Essentially, we need to be able to reproduce the issue with a reasonable effort to provide an answer.

Please also indicate which version of Intel Distribution for Python you are using. You can find this out by including the output of `conda list intelpython`.

Thank you

Beginner

```
kx = -0.21445783132530122
ky = -0.21445783132530122
kz = 0

nx = 512
ny = 512
nz = 390
```

I am using Anaconda3-5.0.0-Windows-x86_64 and Intel w_python3_p_2018.0.018, both of which are the latest versions available online.

The computer has an Intel Core i7-7700HQ quad-core CPU and 16 GB of RAM.

Employee

OK, the array `shift` holds 1024*1024*780 single-precision complex numbers, which amounts to about 6GB of memory, so you should be able to fit it in, but any code that creates a lot of intermediate arrays may raise a `MemoryError` exception.

`np.fromfunction` passes three arrays of doubles, `p`, `m`, and `n`, to the function `g`. Each has shape `(2*nz, 2*nx, 2*ny)` and therefore takes another 6GB of memory.

Computing `kz*p + kx*m + ky*n` verbatim creates an intermediate array for each sub-operation, the multiplication by `2j*np.pi` creates another, and the call to `np.exp` creates yet another intermediate array of complex doubles, which is then cast into a newly allocated array of complex singles.
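The sizes involved can be checked with a little arithmetic. The sketch below is only an estimate based on element counts and item sizes, using the `nx`, `ny`, `nz` values given earlier in the thread; the helper `gib` and the `sizes` dictionary are names introduced here for illustration. It does not measure what NumPy actually allocates.

```python
import numpy as np

nx, ny, nz = 512, 512, 390           # values given earlier in the thread
shape = (2 * nz, 2 * nx, 2 * ny)     # (780, 1024, 1024)
n_elements = shape[0] * shape[1] * shape[2]

def gib(dtype):
    """Size in GiB of one array of `shape` with the given dtype."""
    return n_elements * np.dtype(dtype).itemsize / 2**30

sizes = {
    "result (complex64)": gib(np.complex64),
    "one index array (float64)": gib(np.float64),
    "three index arrays (float64)": 3 * gib(np.float64),
    "np.exp output (complex128)": gib(np.complex128),
}
for name, size in sizes.items():
    print(f"{name:30s} {size:6.2f} GiB")
```

By this estimate the three index arrays alone come to roughly 18 GiB, which is more than the 16 GB installed on the original poster's machine.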

Using Anaconda 5, and Intel Distribution for Python, on Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz with 64GB of RAM, I get

```
# for Anaconda 5
In: %time shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
CPU times: user 1min 14s, sys: 31.6 s, total: 1min 46s
Wall time: 1min 48s

# for IDP 2018.0.0
In: %time shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
CPU times: user 2min, sys: 26.1 s, total: 2min 26s
Wall time: 1min 31s
```

However, writing those steps individually, I am able to achieve better performance and use less memory:

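```
def g2(p, m, n):
    # Accumulate the phase in `ph`, reusing `tmp` as the only scratch buffer.
    tmp = kz*p
    ph = tmp.copy()
    np.copyto(tmp, m)
    tmp *= kx
    ph += tmp
    np.copyto(tmp, n)
    tmp *= ky
    ph += tmp
    ph *= 2*np.pi
    # Fill the real and imaginary parts of the single-precision result.
    np.cos(ph, out=tmp)
    r = np.empty(tmp.shape, np.complex64)
    r.real[:] = tmp
    np.sin(ph, out=tmp)
    del ph
    r.imag[:] = tmp
    return r
```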

Now, running

```
# Anaconda 5
In: %time shift2 = np.fromfunction(g2,(2*nz,2*nx,2*ny))
CPU times: user 54.2 s, sys: 18.5 s, total: 1min 12s
Wall time: 1min 12s

# IDP 2018.0.0
In: %time shift2 = np.fromfunction(g2,(2*nz,2*nx,2*ny))
CPU times: user 1min 24s, sys: 17.6 s, total: 1min 42s
Wall time: 16.3 s
```
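Part of the memory saving in `g2` comes from reusing buffers through in-place operators and the `out=` argument, rather than letting each expression allocate a fresh array. A minimal sketch on a small, made-up array shows the difference; the variable names are chosen here for illustration only.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 8)       # small stand-in for a large array
buf = np.empty_like(x)
buf_address = buf.ctypes.data      # address of buf's data buffer

np.copyto(buf, x)                  # reuse buf instead of calling x.copy()
buf *= 2 * np.pi                   # in-place multiply, no new array
np.cos(buf, out=buf)               # the ufunc writes into the existing buffer
reused = buf.ctypes.data == buf_address

# The one-line expression allocates a temporary for 2*pi*x and a new result.
fresh = np.cos(2 * np.pi * x)
separate = not np.shares_memory(fresh, buf)

print(reused, separate, np.allclose(buf, fresh))
```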

Similarly, to keep intermediate arrays to a minimum, perform each `np.roll` call on a separate line.
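As a sketch of keeping intermediates small from start to finish, the example below fills the array slab by slab using broadcasting and then applies one `np.roll` per statement. The helper name `build_shift_blocked`, its `block` parameter, and the small sizes are hypothetical and not taken from the thread; the result is compared against the original one-shot construction from the question.

```python
import numpy as np

def build_shift_blocked(kz, kx, ky, nz, nx, ny, block=64):
    """Hypothetical helper: fill the complex64 result slab by slab along axis 0."""
    shape = (2 * nz, 2 * nx, 2 * ny)
    out = np.empty(shape, dtype=np.complex64)
    # 1-D index vectors shaped for broadcasting, instead of full 3-D index arrays.
    m = np.arange(shape[1], dtype=np.float64).reshape(1, -1, 1)
    n = np.arange(shape[2], dtype=np.float64).reshape(1, 1, -1)
    plane = kx * m + ky * n                     # shape (1, 2*nx, 2*ny)
    for start in range(0, shape[0], block):
        stop = min(start + block, shape[0])
        p = np.arange(start, stop, dtype=np.float64).reshape(-1, 1, 1)
        phase = 2 * np.pi * (kz * p + plane)    # temporaries are one slab in size
        out[start:stop] = np.exp(1j * phase)    # cast to complex64 on assignment
    return out

# Small, made-up sizes so the check runs quickly.
kz, kx, ky = 0.0, -0.21445783132530122, -0.21445783132530122
nz, nx, ny = 3, 4, 5

shift = build_shift_blocked(kz, kx, ky, nz, nx, ny, block=2)
# One roll per statement: the previous array can be freed once it is reassigned.
shift = np.roll(shift, nz, axis=0)
shift = np.roll(shift, nx, axis=1)
shift = np.roll(shift, ny, axis=2)

# Reference: the one-shot construction from the question.
g = lambda p, m, n: np.exp(2j * np.pi * (kz * p + kx * m + ky * n)).astype(np.complex64)
reference = np.fromfunction(g, (2 * nz, 2 * nx, 2 * ny))
reference = np.roll(np.roll(np.roll(reference, nz, 0), nx, 1), ny, 2)

print(np.allclose(shift, reference, atol=1e-6))
```

With the sizes from this thread, a smaller `block` lowers the peak size of the temporaries at the cost of more loop iterations; a suitable value would have to be found by experiment.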

Furthermore, it is probably best to create the `shift` array in blocks and combine the results.

Employee

I used a smaller machine and got MemoryError even for your numpy code:

```
In : import numpy as np
...: (kx, ky, kz, nx, ny, nz) = (-0.21445783132530122, -0.21445783132530122, 0, 512, 512, 390)
...:

In : g = lambda p,m,n: np.exp(2j*np.pi*(kz*p+kx*m+ky*n)).astype(np.complex64)

In : %time shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<timed exec> in <module>()

/home/miniconda3/envs/intel3/lib/python3.6/site-packages/numpy/core/numeric.py in fromfunction(function, shape, **kwargs)
2130     dtype = kwargs.pop('dtype', float)
2131     args = indices(shape, dtype=dtype)
-> 2132     return function(*args, **kwargs)
2133
2134

<ipython-input-2-89b938ec44bd> in <lambda>(p, m, n)
----> 1 g = lambda p,m,n: np.exp(2j*np.pi*(kz*p+kx*m+ky*n)).astype(np.complex64)

MemoryError:
```

Then I used Dask to split the array into chunks:

```In : import dask 