Solved: oneMKL FFT convolution routine does not work for large arrays

Kohn-Sham · 01-24-2024

Hello.

I'm experimenting with vsl?ConvExec1D and have a problem with array sizes over 2^25: the routine returns error -2800.

It works for arrays smaller than 2^25, although I have not yet verified that the computed values are correct.

I checked setvars for ILP64, the MKL_ILP64 flag, and the libraries; the same setup works for the DftiComputeForward routine with an array of size 2^32.

I also tried this code in an Ubuntu environment, but it does not work there either.

My code is:

#include <iostream>
#include "mkl.h"

int main(){
    // input length 2^29; the full linear convolution has 2*size-1 elements
    MKL_INT64 size{(MKL_INT64)1<<29};
    MKL_INT64 xshape{size},yshape{size};
    MKL_INT64 zshape{size*2-1};
    MKL_INT64 xstride=1,ystride=1,zstride=1;
    MKL_INT64 i;
    int status;
    VSLConvTaskPtr task;

    // 64-byte aligned buffers for the two inputs and the output
    MKL_Complex16 *x{(MKL_Complex16 *)mkl_malloc(xshape*sizeof(MKL_Complex16),64)};
    MKL_Complex16 *y{(MKL_Complex16 *)mkl_malloc(yshape*sizeof(MKL_Complex16),64)};
    MKL_Complex16 *z{(MKL_Complex16 *)mkl_malloc(zshape*sizeof(MKL_Complex16),64)};
    if(!x||!y||!z){
        std::cerr<<"allocation failed"<<std::endl;
        return 1;
    }

    // generate test data: x = 1, y = i
    for(i=0;i<xshape;i++){
        x[i].real=1;
        x[i].imag=0;
    }
    for(i=0;i<yshape;i++){
        y[i].real=0;
        y[i].imag=1;
    }

    // create the FFT-mode task, run it, and print both status codes
    status=vslzConvNewTask1D(&task,VSL_CONV_MODE_FFT,xshape,yshape,zshape);
    std::cout<<"new task: "<<status<<std::endl;
    status=vslzConvExec1D(task,x,xstride,y,ystride,z,zstride);
    std::cout<<"exec: "<<status<<std::endl;

    // release the task and the buffers
    vslConvDeleteTask(&task);
    mkl_free(x);
    mkl_free(y);
    mkl_free(z);
    return 0;
}
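
For sizes where the routine succeeds, the output can be checked against a closed form. With the test data above (every x[i] = 1, every y[j] = i, both of length n), each overlapping pair contributes 1 * i, so z[k] should be i times the number of overlapping pairs, min(k+1, 2n-1-k). The helper below is a minimal sketch of that check; the name expected_z is hypothetical and not part of oneMKL.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstdint>

// Expected z[k] for x[i] = 1 and y[j] = i (imaginary unit), both of length n.
// Each overlapping pair contributes 1 * i, so the result is purely imaginary
// and equals the number of index pairs (i, j) with i + j == k.
std::complex<double> expected_z(std::int64_t k, std::int64_t n) {
    assert(n >= 1 && k >= 0 && k <= 2 * n - 2);
    const std::int64_t overlap = std::min(k + 1, 2 * n - 1 - k);
    return {0.0, static_cast<double>(overlap)};
}
```

Because the FFT mode accumulates rounding error, a comparison of z[k].real and z[k].imag against expected_z(k, n) should use a tolerance rather than exact equality.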

Thank you for reading.

Gennady_F_Intel · 02-23-2024

We confirmed the behavior, which looks like a limitation. We will document this limitation in one of the future releases.

--Gennady

Ruqiu_C_Intel · 09-09-2024

Hello Kohn-Sham,

Thanks for your patience. We have reproduced the issue with array sizes over 2^25 and are working on a fix. In the meantime, it would be helpful if you could tell us what typical large sizes your application uses. And do you work on sizes over 2^30?

Regards,

Ruqiu

Kohn-Sham · 09-19-2024

Hello, Ruqiu.

I'm happy to hear the update.

As for the array size, I don't know exactly, but I plan to use the entire DRAM (around 2^32 may be the maximum for running the convolution routine on a recent client CPU such as the 14900K).

In detail, the goal is an electrodynamics simulation. The data will be a few hundred gigabytes in size, so it would be nice if the convolution routine could use the entire DRAM of a client PC.

I don't know how much memory the convolution routine reserves for its internal buffers, so I cannot estimate the array size. However, here is some information related to the array size:

1. The data type is MKL_Complex16.

2. The DRAM capacity is 128 GB.

3. The array to be processed is larger than the DRAM capacity, so it is divided into multiple smaller arrays and the convolution is computed for each one. The processed data are saved to disk. After the convolution of every small array is done, the results are joined into one complete array.
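
One common way to do the splitting described in item 3 is overlap-add: convolve each block with the other operand and add the partial results at the block's offset, since neighbouring partial results overlap by the kernel length minus one. The sketch below is an assumption about how this could look, not the poster's actual code. It uses std::complex<double> and a direct O(n*m) loop as a stand-in for the per-block library call, and the names convolve_direct and convolve_overlap_add are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Full linear convolution of a and b (length a.size() + b.size() - 1).
// Direct loop used here as a stand-in for a per-block library call.
std::vector<cplx> convolve_direct(const std::vector<cplx>& a,
                                  const std::vector<cplx>& b) {
    if (a.empty() || b.empty()) return {};
    std::vector<cplx> out(a.size() + b.size() - 1);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[i + j] += a[i] * b[j];
    return out;
}

// Overlap-add: split x into blocks of at most `block` elements, convolve
// each block with h, and accumulate the partial result at the block offset.
std::vector<cplx> convolve_overlap_add(const std::vector<cplx>& x,
                                       const std::vector<cplx>& h,
                                       std::size_t block) {
    assert(block > 0);
    if (x.empty() || h.empty()) return {};
    std::vector<cplx> out(x.size() + h.size() - 1);
    for (std::size_t start = 0; start < x.size(); start += block) {
        const std::size_t len = std::min(block, x.size() - start);
        const auto first = x.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<cplx> chunk(first, first + static_cast<std::ptrdiff_t>(len));
        const std::vector<cplx> part = convolve_direct(chunk, h);
        // Tails of neighbouring blocks overlap, so add rather than assign.
        for (std::size_t k = 0; k < part.size(); ++k)
            out[start + k] += part[k];
    }
    return out;
}
```

This sketch splits only x. When both operands are too large, the same idea applies to pairs of blocks, with each partial result added at the sum of the two block offsets.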

Ruqiu_C_Intel · 10-11-2024

Hello Kohn-Sham,

Thanks for the information.

Looking at the reproducer, three arrays are allocated for a convolution of size N: two of size N and one of size 2*N - 1. The input/output arrays alone therefore take a total of (4*N-1)*sizeof(MKL_Complex16) bytes. For N = 2^30, that is about 16*2^32 bytes = 64 GiB. Additional memory is probably needed for the application to run and internally for the convolution computation, so using N = 2^31 would exceed the DRAM capacity. Based on this information, we will work on a patch to support sizes up to 2^30, but we will not prioritize support for sizes beyond 2^30 at this time.
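
The arithmetic above can be reproduced with a short helper. This is only a sketch of the input/output footprint: it assumes 16 bytes per element (the size of two doubles, as in MKL_Complex16) and does not account for any internal work buffers. The names conv_io_bytes and to_gib are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per double-precision complex element (two doubles).
constexpr std::uint64_t kComplex16Bytes = 16;

// Bytes for two inputs of n elements each plus the full linear-convolution
// output of 2*n - 1 elements. Internal work buffers are not included.
std::uint64_t conv_io_bytes(std::uint64_t n) {
    assert(n >= 1);
    return (4 * n - 1) * kComplex16Bytes;
}

// Convert a byte count to GiB (2^30 bytes).
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / static_cast<double>(1ULL << 30);
}
```

For N = 2^30 this gives 2^36 - 16 bytes, just under 64 GiB. For N = 2^31 it gives just under 128 GiB, which already fills a 128 GB machine before any working memory is counted.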

We appreciate your understanding and are always happy to assist you with any additional questions or concerns you may have.

Best Regards,

Ruqiu

Ruqiu_C_Intel · 12-04-2024

The fix supporting sizes up to 2^30 will be available in a oneMKL feature release.