I am using Anaconda with Python 3.6. First, I have the code below to create a 3D matrix. It evaluates a phase term over the matrix, where the phase at each element is determined by its position in the matrix.

    g = lambda p,m,n: np.exp(2j*np.pi*(kz*p+kx*m+ky*n)).astype(np.complex64)
    shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
    shift = np.roll(np.roll(np.roll(shift,nz,0),nx,1),ny,2)

`kz`, `kx`, `ky` are known float parameters. `nz`, `nx`, `ny` set the size of the matrix.

Then I have a faster version, based on `numba.jit`, that does the same thing.

    x = np.arange(2*nx)
    y = np.arange(2*ny)
    z = np.arange(2*nz)
    zv, yv, xv = np.meshgrid(z,y, x)

    @jit(nopython=True, parallel=True)
    def f(z,x,y):
        a = np.exp(2j*np.pi*(kx*x+ky*y)).astype(np.complex64)
        return a

    a = f(zv,xv,yv)
    a = a.swapaxes(0,1)
    a = np.roll(np.roll(np.roll(a,nz,0),nx,1),ny,2)

This one is faster than the first, but I am looking for an even faster one.

I then installed the Intel Distribution for Python (IDP), but neither of the two snippets above works there anymore. In the IDP environment, the first one fails with a `MemoryError`, and the second runs for a very long time until the kernel finally dies.

Can anyone point out why?

Could you please indicate the amount of memory on the machine where you experience the issue, as well as ballpark values for `nx`, `ny`, `nz`?

Essentially, we need to be able to reproduce the issue with a reasonable effort to provide an answer.

Please also indicate which version of Intel Distribution for Python you are using. You can find this out by including the output of `conda list intelpython`.

Thank you

    kx = -0.21445783132530122
    ky = -0.21445783132530122
    kz = 0
    nx = 512
    ny = 512
    nz = 390

I am using **Anaconda3-5.0.0-Windows-x86_64** and **Intel w_python3_p_2018.0.018**, both the latest versions available online.

The computer has an Intel Core i7-7700HQ quad-core CPU and 16 GB of RAM.

OK, the array `shift` has shape `(2*nz, 2*nx, 2*ny)`, i.e. 780*1024*1024 single-precision complex numbers, which amounts to about 6GB of memory, so you should be able to fit it in, but any code that creates many intermediate arrays may raise a `MemoryError` exception.

`np.fromfunction` passes three arrays of doubles, `p`, `m`, and `n`, to the function `g`. Each has shape `(2*nz, 2*nx, 2*ny)` and therefore takes about 6GB of memory, so the index arrays alone need roughly 18GB, more than the 16GB of RAM on your machine.

Computing `kz*p + kx*m + ky*n` verbatim creates an intermediate array for each sub-operation, the multiplication by `2j*np.pi` generates a further intermediate array, and the call to `np.exp` creates yet another intermediate array of complex doubles, which is finally cast into a newly allocated array of complex singles.
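To make the memory arithmetic above concrete, here is a minimal sketch that only computes sizes and allocates nothing. The helper name `gib` is my own, and the figures are lower bounds for the named arrays only; the additional temporaries come on top.

```python
import numpy as np

# Sizes from this thread
nx, ny, nz = 512, 512, 390
shape = (2 * nz, 2 * nx, 2 * ny)

# Plain Python integers, so there is no overflow on any platform
n_elements = shape[0] * shape[1] * shape[2]

def gib(dtype):
    """Size in GiB of one array of `shape` with the given dtype (hypothetical helper)."""
    return n_elements * np.dtype(dtype).itemsize / 2**30

result_gib = gib(np.complex64)    # the final `shift` array
index_gib = 3 * gib(np.float64)   # p, m, n created by np.fromfunction
exp_gib = gib(np.complex128)      # output of np.exp before the cast

print(f"result:        {result_gib:.2f} GiB")
print(f"index arrays:  {index_gib:.2f} GiB")
print(f"np.exp output: {exp_gib:.2f} GiB")
```

Running this kind of estimate before the real computation shows quickly whether a given shape can fit on a given machine.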

Using Anaconda 5, and Intel Distribution for Python, on Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz with 64GB of RAM, I get

    # for Anaconda 5
    In[3]: %time shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
    CPU times: user 1min 14s, sys: 31.6 s, total: 1min 46s
    Wall time: 1min 48s

    # for IDP 2018.0.0
    In[3]: %time shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
    CPU times: user 2min, sys: 26.1 s, total: 2min 26s
    Wall time: 1min 31s

However, writing those steps individually, I am able to achieve better performance and use less memory:

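    def g2(p,m,n):
        # tmp is a reusable scratch buffer, ph accumulates the phase
        tmp = kz*p
        ph = tmp.copy()
        np.copyto(tmp, m)
        tmp *= kx
        ph += tmp
        np.copyto(tmp, n)
        tmp *= ky
        ph += tmp
        ph *= 2*np.pi
        # exp(1j*ph) = cos(ph) + 1j*sin(ph), written directly into the single-precision result
        np.cos(ph, out=tmp)
        r = np.empty(tmp.shape, np.complex64)
        r.real[:] = tmp
        np.sin(ph, out=tmp)
        del ph
        r.imag[:] = tmp
        return r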

Now, running

    # Anaconda 5
    In[4]: %time shift2 = np.fromfunction(g2,(2*nz,2*nx,2*ny))
    CPU times: user 54.2 s, sys: 18.5 s, total: 1min 12s
    Wall time: 1min 12s

    # IDP 2018.0.0
    In[4]: %time shift2 = np.fromfunction(g2,(2*nz,2*nx,2*ny))
    CPU times: user 1min 24s, sys: 17.6 s, total: 1min 42s
    Wall time: 16.3 s

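Similarly, to be mindful of intermediate arrays, you should perform each `np.roll` as a separate statement and rebind `shift` each time, so that the previous array can be freed before the next one is allocated.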
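As a small illustration of the `np.roll` point, the sketch below compares the nested call from the question with the step-by-step form on a toy array. It also shows two alternatives that are my own suggestion rather than something timed in this thread: `np.roll` with tuple arguments (NumPy 1.12 or later), and `np.fft.fftshift`, which gives the same result here only because every axis has even length `2*n` and is shifted by `n`.

```python
import numpy as np

# Toy sizes; the real ones in this thread are far larger
nz, nx, ny = 2, 3, 4
shape = (2 * nz, 2 * nx, 2 * ny)
a = np.arange(shape[0] * shape[1] * shape[2]).reshape(shape)

# Nested form from the question
nested = np.roll(np.roll(np.roll(a, nz, 0), nx, 1), ny, 2)

# One statement per axis, rebinding the name so the previous copy can be freed
stepwise = np.roll(a, nz, axis=0)
stepwise = np.roll(stepwise, nx, axis=1)
stepwise = np.roll(stepwise, ny, axis=2)

# A single call with tuple shifts and axes
single_call = np.roll(a, (nz, nx, ny), axis=(0, 1, 2))

# For even axis lengths, shifting by half the length is what fftshift does
shifted = np.fft.fftshift(a)

print(np.array_equal(nested, stepwise),
      np.array_equal(nested, single_call),
      np.array_equal(nested, shifted))
```

All four variants still return a full-size copy, so for the large array the gain is in the number of simultaneous copies, not in avoiding a copy altogether.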

Furthermore, it is probably best to create the `shift` array in blocks and combine the results.
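A minimal sketch of the block-wise idea, under my own assumptions: the function name `phase_blocks` and the `block` parameter are hypothetical, and the blocks are slabs along the first axis. It preallocates the `complex64` result once and uses broadcasting, so the only temporaries are the size of one slab instead of the full array. This has not been timed on the machines mentioned above.

```python
import numpy as np

def phase_blocks(kz, kx, ky, nz, nx, ny, block=16):
    """Fill the (2*nz, 2*nx, 2*ny) complex64 phase array one slab of planes at a time."""
    out = np.empty((2 * nz, 2 * nx, 2 * ny), dtype=np.complex64)
    # 1-D index vectors shaped for broadcasting, instead of full 3-D index arrays
    m = np.arange(2 * nx, dtype=np.float64)[None, :, None]
    n = np.arange(2 * ny, dtype=np.float64)[None, None, :]
    plane = kx * m + ky * n  # shape (1, 2*nx, 2*ny), shared by every slab
    for start in range(0, 2 * nz, block):
        stop = min(start + block, 2 * nz)
        p = np.arange(start, stop, dtype=np.float64)[:, None, None]
        # The temporaries here have only (stop - start) planes;
        # the assignment casts complex128 down to complex64
        out[start:stop] = np.exp(2j * np.pi * (kz * p + plane))
    return out

# Small check against the np.fromfunction version from the question
kz, kx, ky = 0.1, -0.21445783132530122, 0.3
nz, nx, ny = 3, 4, 5
g = lambda p, m, n: np.exp(2j * np.pi * (kz * p + kx * m + ky * n)).astype(np.complex64)
reference = np.fromfunction(g, (2 * nz, 2 * nx, 2 * ny))
blocked = phase_blocks(kz, kx, ky, nz, nx, ny, block=4)
print(np.allclose(blocked, reference, atol=1e-5))
```

The `block` size trades peak memory against loop overhead; the value 16 is only a placeholder.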

I used a smaller machine and got a `MemoryError` even for your NumPy code:

    In [1]: import numpy as np
       ...: (kx, ky, kz, nx, ny, nz) = (-0.21445783132530122, -0.21445783132530122, 0, 512, 512, 390)
       ...:

    In [2]: g = lambda p,m,n: np.exp(2j*np.pi*(kz*p+kx*m+ky*n)).astype(np.complex64)

    In [3]: %time shift = np.fromfunction(g,(2*nz,2*nx,2*ny))
    ---------------------------------------------------------------------------
    MemoryError                               Traceback (most recent call last)
    <timed exec> in <module>()

    /home/miniconda3/envs/intel3/lib/python3.6/site-packages/numpy/core/numeric.py in fromfunction(function, shape, **kwargs)
       2130     dtype = kwargs.pop('dtype', float)
       2131     args = indices(shape, dtype=dtype)
    -> 2132     return function(*args, **kwargs)
       2133
       2134

    <ipython-input-2-89b938ec44bd> in <lambda>(p, m, n)
    ----> 1 g = lambda p,m,n: np.exp(2j*np.pi*(kz*p+kx*m+ky*n)).astype(np.complex64)

    MemoryError:

Then I used Dask to split the array into chunks:

    In [10]: import dask
        ...: import dask.array as da

    In [11]: %time shift = da.fromfunction(g, shape=(2*nz,2*nx,2*ny), dtype=np.double, chunks=(32,32,32)).compute(get=dask.local.get_sync)
    CPU times: user 1min 7s, sys: 24.1 s, total: 1min 31s
    Wall time: 1min 34s

It works!

As for Numba being slower in IDP, we could not reproduce that on a machine with sufficient memory for the computation.
