Intel® oneAPI Math Kernel Library

oneMKL FFT convolution routine does not work for large arrays

Kohn-Sham
New Contributor I

Hello.

 

I'm experimenting with vsl?ConvExec1D and have a problem with array sizes over 2^25: the routine returns error -2800.

It works for array sizes below 2^25 (though I have not yet verified that the computed values are correct).

 

I checked setvars for ILP64, the MKL_ILP64 flag, and the libraries; the same setup works for the DftiComputeForward routine with a 2^32-size array.

I also tried this code in an Ubuntu environment, but it does not work there either.

 

My code is

 

#include <iostream>
#include "mkl.h"

int main(){
    // 2^29 elements per input, computed with an integer shift instead of pow()
    MKL_INT64 size{(MKL_INT64)1<<29};
    MKL_INT64 xshape{size},yshape{size};
    MKL_INT64 zshape{size*2-1};   // length of the full linear convolution
    MKL_INT64 xstride=1,ystride=1,zstride=1;
    MKL_INT64 i;
    int status;
    VSLConvTaskPtr task;

    MKL_Complex16 *x{(MKL_Complex16 *)mkl_malloc(xshape*sizeof(MKL_Complex16),64)};
    MKL_Complex16 *y{(MKL_Complex16 *)mkl_malloc(yshape*sizeof(MKL_Complex16),64)};
    MKL_Complex16 *z{(MKL_Complex16 *)mkl_malloc(zshape*sizeof(MKL_Complex16),64)};

    // the three arrays need about 32 GiB in total, so check the allocations
    if(!x||!y||!z){
        std::cerr<<"mkl_malloc failed"<<std::endl;
        return 1;
    }

    // generate test data
    for(i=0;i<xshape;i++){
        x[i].real=1;
        x[i].imag=0;
    }
    for(i=0;i<yshape;i++){
        y[i].real=0;
        y[i].imag=1;
    }

    // create the FFT-mode task; execute only if creation succeeded
    status=vslzConvNewTask1D(&task,VSL_CONV_MODE_FFT,xshape,yshape,zshape);
    if(status==VSL_STATUS_OK){
        status=vslzConvExec1D(task,x,xstride,y,ystride,z,zstride);
        vslConvDeleteTask(&task);   // release the task after execution
    }
    std::cout<<status<<std::endl;

    mkl_free(x);
    mkl_free(y);
    mkl_free(z);
    return 0;
}

 

Thank you for reading.

Gennady_F_Intel
Moderator

We confirmed the behavior, which looks like a limitation. We will document this limitation in one of the future releases.

--Gennady


Ruqiu_C_Intel
Moderator

Hello Kohn-Sham,

Thanks for your patience. We have reproduced the issue with array sizes over 2^25 and are working on a fix. In the meantime, it would be helpful if you could tell us what large sizes your application typically uses. Do you work with sizes over 2^30?


Regards,

Ruqiu


Kohn-Sham
New Contributor I

Hello, Ruqiu.

I'm happy to hear the update.

As for the array size, I don't know exactly, but I'm going to use the entire DRAM (around 2^32 may be the maximum size for running the convolution routine on the latest client CPUs such as the 14900K).

In detail, the goal is an electrodynamics simulation. In this simulation the data will be a few hundred gigabytes, so it would be nice if the convolution routine could use the entire DRAM of a client PC.

I don't know how much memory the convolution routine reserves for its internal buffers, so I cannot estimate the array size. But I have some information related to the array size.

1. Data type is MKL_Complex16.

2. DRAM capacity is 128 GB.

3. The array to be processed is larger than the DRAM capacity, so it is divided into multiple smaller arrays and the convolution is computed for each of them. The processed data are saved to disk. After the convolution of every small array is done, the small arrays are joined into one entire array.
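
The split-and-join scheme in point 3 can be sketched as an overlap-add convolution. This is only an illustration under assumptions, not the actual application code: the names `conv_direct` and `conv_overlap_add` are hypothetical, `std::complex<double>` stands in for MKL_Complex16, the direct O(n*m) loop stands in for a per-block call to a library convolution routine, and everything stays in memory instead of being written to disk.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Direct O(n*m) linear convolution. Hypothetical stand-in for a per-block
// call to a library routine.
std::vector<cplx> conv_direct(const std::vector<cplx>& x,
                              const std::vector<cplx>& y) {
    if (x.empty() || y.empty()) return {};
    std::vector<cplx> z(x.size() + y.size() - 1, cplx{0.0, 0.0});
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j)
            z[i + j] += x[i] * y[j];
    return z;
}

// Overlap-add: split x into blocks of at most `block` samples, convolve each
// block with y, and add each partial result into z at the block's offset.
// Neighbouring partial results overlap by y.size() - 1 samples, which is why
// they are summed rather than simply concatenated.
std::vector<cplx> conv_overlap_add(const std::vector<cplx>& x,
                                   const std::vector<cplx>& y,
                                   std::size_t block) {
    if (x.empty() || y.empty() || block == 0) return {};
    std::vector<cplx> z(x.size() + y.size() - 1, cplx{0.0, 0.0});
    for (std::size_t start = 0; start < x.size(); start += block) {
        const std::size_t len = std::min(block, x.size() - start);
        const std::vector<cplx> xb(x.begin() + start, x.begin() + start + len);
        const std::vector<cplx> part = conv_direct(xb, y);
        for (std::size_t k = 0; k < part.size(); ++k)
            z[start + k] += part[k];
    }
    return z;
}
```

With this layout each block only needs `block + y.size() - 1` output samples at a time, so the block size can be chosen to keep every individual convolution below whatever size limit the routine has.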

Ruqiu_C_Intel
Moderator

Hello Kohn-Sham,


Thanks for the information.

Looking at the reproducer, 3 arrays are allocated for a convolution of size N: 2 of size N and 1 of size 2*N - 1, so the input/output arrays alone take a total of (4*N-1)*sizeof(MKL_Complex16) bytes. For N = 2^30, that is about 16*2^32 bytes = 64 GiB. For N = 2^31 the same arrays would take about 128 GiB, which already equals the DRAM capacity. Additional memory is probably needed for the application to run and internally for the convolution computation, so using N = 2^31 would exceed DRAM capacity. Based on this information, we will work on a patch to support sizes up to 2^30, but we will not prioritize support for sizes beyond 2^30 at this time.
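
As a quick check of the arithmetic above, here is a small sketch. The helper names `conv_io_bytes` and `to_gib` are made up for illustration, and the default element size of 16 bytes assumes MKL_Complex16 is a pair of doubles; the internal workspace of the convolution routine is not included.

```cpp
#include <cstdint>

// Bytes taken by the input/output arrays of a 1D convolution of two
// length-n inputs: n + n + (2n - 1) = 4n - 1 elements.
// elem_size defaults to 16, the size of one double-precision complex value.
constexpr std::uint64_t conv_io_bytes(std::uint64_t n,
                                      std::uint64_t elem_size = 16) {
    return n == 0 ? 0 : (4 * n - 1) * elem_size;
}

// Convert a byte count to GiB (2^30 bytes).
constexpr double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1073741824.0;
}
```

For example, `to_gib(conv_io_bytes(std::uint64_t{1} << 30))` comes out just under 64, and the same call with `<< 31` comes out just under 128, which matches the 128 GB DRAM figure before any extra working memory is counted.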


We appreciate your understanding and are always happy to assist you with any additional questions or concerns you may have.


Best Regards,

Ruqiu

