Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Unable to perform 2D FFT with NX=NY=128

HarshM
Beginner
1,515 Views

Hi,

 

I am trying to perform 2D FFT of an image. However, My code gives segmentation fault when I give dimensions more than 32x32. I intend to perform FFT on 128x128.

 

Also, If you can help me finding stacktrace, It would be great. When I compile it with -g option and see stacktrace in lldb:

 

This is what I get:

```

➜ cmake-build-debug lldb IntelTest
(lldb) target create "IntelTest"
Current executable set to '/Users/harshmathur/CourseworkRepo/IntelTest/cmake-build-debug/IntelTest' (x86_64).
(lldb) run
Process 13947 launched: '/Users/harshmathur/CourseworkRepo/IntelTest/cmake-build-debug/IntelTest' (x86_64)
DftiGetValue DFTI_PACKED_FORMAT : 57
Process 13947 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x111844100)
frame #0: 0x0000000110568a4c libmkl_avx2.2.dylib`mkl_dft_avx2_ipps_cFFTfwd_64_64fc + 2428
libmkl_avx2.2.dylib`mkl_dft_avx2_ipps_cFFTfwd_64_64fc:
-> 0x110568a4c <+2428>: vmovupd %ymm2, (%rsi)
0x110568a50 <+2432>: vmovupd %ymm7, 0x200(%rsi)
0x110568a58 <+2440>: vaddpd %ymm13, %ymm6, %ymm2
0x110568a5d <+2445>: vsubpd %ymm13, %ymm6, %ymm6

```

 

Please help.

 

The code is here, If you change NN to 32, the code will work, it crashes for NN > 32:

 

```

#include <iostream>
#include <chrono>
#include <mkl.h>
#define NN 128
#define NPIXFFT NN * (1 + NN / 2)
using namespace std;

typedef struct {
double re;
double im;
} mkl_double_complex;


int getFFTWPlans(DFTI_DESCRIPTOR_HANDLE *descHandle);


int main() {
int i;
int alignment=256;
MKL_LONG status;
DFTI_DESCRIPTOR_HANDLE descHandle;
getFFTWPlans(&descHandle);
double *image = (double*)mkl_malloc(sizeof(double) * NN * NN, alignment);
double *recoveredImage = (double*)mkl_malloc(sizeof(double) * NN * NN, alignment);
mkl_double_complex *imageFT = (mkl_double_complex*) mkl_malloc(
NPIXFFT * sizeof(mkl_double_complex),alignment);

for (i=0; i < NN * NN; i++) {
image[i] = i * 2 + 1;
}


while (1) {

status = DftiComputeForward(descHandle, image, imageFT);
if (status != 0) {
cout <<"DftiComputeForward Failed: " << status << endl;
break;
}

status = DftiComputeBackward(descHandle, imageFT, recoveredImage);
if (status != 0) {
cout <<"DftiComputeBackward Failed: " << status << endl;
break;
}

}
return 0;
}

int getFFTWPlans(DFTI_DESCRIPTOR_HANDLE *descHandle){

MKL_LONG lengths[2];
lengths[0] = NN;
lengths[1] = NN;
MKL_LONG status = DftiCreateDescriptor(descHandle, DFTI_DOUBLE, DFTI_REAL, 2, lengths);

if (status != 0) {
cout << "DftiCreateDescriptor failed : " << status << endl;
return -1;
}
status = DftiSetValue(*descHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
if (status != 0) {
cout << "DftiSetValue DFTI_PLACEMENT failed : " << status << endl;
return -2;
}

status = DftiSetValue(*descHandle, DFTI_THREAD_LIMIT, 1);
if (status != 0) {
cout << "DftiSetValue DFTI_THREAD_LIMIT failed : " << status << endl;
return -2;
}

MKL_LONG format = 0;
status = DftiSetValue(*descHandle, DFTI_PACKED_FORMAT, DFTI_CCE_FORMAT);
if (status != 0) {
cout << "DftiSetValue DFTI_PACKED_FORMAT failed : " << status << endl;
return -3;
}

status = DftiGetValue(*descHandle, DFTI_PACKED_FORMAT, &format);
if (status != 0) {
cout << "DftiGetValue DFTI_PACKED_FORMAT failed : " << status << endl;
return -4;
}

cout << "DftiGetValue DFTI_PACKED_FORMAT : " << format << endl;

status = DftiCommitDescriptor(*descHandle);
if (status != 0) {
cout << "DftiCommitDescriptor failed : " << status << endl;
return -5;
}

return status;
}

```

0 Kudos
11 Replies
ShanmukhS_Intel
Moderator
1,440 Views

Hi Harsh,


Thanks for posting in Intel Communities.


Could you please share with us the MKL version, CMake file being used and steps to reproduce as it helps reproduce your issue at our environment.


Best Regards,

Shanmukh.SS


0 Kudos
HarshM
Beginner
1,411 Views

Hi Shanmukh,

 

Thank you for the response. But by trial and error, I figured out the issue. I am performing 2D FFT, however, my 2D image is stored in row-major order 1D array. Thus, After giving the input and output strides as (0,1 128) and DFTI_CONJUGATE_EVEN_STORAGE = DFTI_COMPLEX_COMPLEX, The code started working. However, It would be great if instead of Segmentation fault, maybe, a warning to the user is issued if the user makes such a mistake.

 

Cmakelists.txt File:

 

```

cmake_minimum_required(VERSION 3.14)
project(TipTiltAO)

set(CMAKE_BUILD_TYPE RELEASE)
set(CMAKE_CXX_STANDARD 17)

remove_definitions(-DUNICODE -D_UNICODE)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Ofast -march=native")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Ofast -march=native")

set(FFTWLINK C:/PROGRA~2/Intel/oneAPI/mkl/2023.1.0/lib/intel64)
set(INCLUDE_FFTW_DIR C:/PROGRA~2/Intel/oneAPI/mkl/2023.1.0/include)
set(FFTWLINKLIB ${FFTWLINK}/mkl_intel_lp64.lib ${FFTWLINK}/mkl_core.lib ${FFTWLINK}/mkl_sequential.lib)

add_executable(MKLTest mkltest.cpp)
target_include_directories(MKLTest PRIVATE ${INCLUDE_FFTW_DIR})
target_link_libraries(MKLTest ${FFTWLINKLIB})

```

 

The Full Working Example:

```

#include <iostream>
#include <fstream>
#include <mkl.h>
#include <complex>
#include <chrono>
#define NN 256
#define NPIXFFT NN * (1 + (NN / 2))
using namespace std;

typedef struct {
double re;
double im;
} mkl_double_complex;


int getFFTWPlans(DFTI_DESCRIPTOR_HANDLE *descHandle);
int printFFT(mkl_double_complex *imageFT);


int main() {

chrono::high_resolution_clock::time_point t0, t1;
chrono::duration<double> dt;

int i, flag=0;
MKL_LONG status, count;
DFTI_DESCRIPTOR_HANDLE descHandle;
getFFTWPlans(&descHandle);

double *image = (double*)malloc(sizeof(double) * NN * NN);
double *recoveredImage = (double*)malloc(sizeof(double) * NN * NN);
mkl_double_complex *imageFT = (mkl_double_complex*) mkl_malloc(
NPIXFFT * sizeof(mkl_double_complex),64);

// ifstream currentImage("F:\\tiptilt\\20230421_121442\\CurrentImage_286.dat", ios::binary);

double TEMP;

for (int i=0; i<NN * NN; i++){
// currentImage.read((char*) &TEMP, sizeof(double));
image[i] = i * i + i * 2 + 1;
}

double x;
count = 0;
while (1) {
if (count == 0) {
t0 = chrono::high_resolution_clock::now();
}
status = DftiComputeForward(descHandle, image, imageFT);
if (status != 0) {
cout <<"DftiComputeForward Failed: " << status << endl;
break;
}
// if (flag == 0) {
// printFFT(imageFT);
// flag = 1;
// }
for (i=0; i<NPIXFFT; i++){
x = imageFT[i].re;
}
status = DftiComputeBackward(descHandle, imageFT, recoveredImage);
if (status != 0) {
cout <<"DftiComputeBackward Failed: " << status << endl;
break;
}
count += 1;
if (count == 3000) {
t1 = chrono::high_resolution_clock::now();
dt = chrono::duration_cast<chrono::duration<double>>(t1 - t0);
cout<< "Time (ms): "<<dt.count() * 1000 / (count - 1) <<"\r";
cout.flush();
}
}
return 0;
}

int getFFTWPlans(DFTI_DESCRIPTOR_HANDLE *descHandle){

MKL_LONG lengths[2];
lengths[0] = NN;
lengths[1] = NN;
MKL_LONG status = DftiCreateDescriptor(descHandle, DFTI_DOUBLE, DFTI_REAL, 2, lengths);

if (status != 0) {
cout << "DftiCreateDescriptor failed : " << status << endl;
return -1;
}
status = DftiSetValue(*descHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
if (status != 0) {
cout << "DftiSetValue DFTI_PLACEMENT failed : " << status << endl;
return -2;
}

status = DftiSetValue(*descHandle, DFTI_THREAD_LIMIT, 1);
if (status != 0) {
cout << "DftiSetValue DFTI_THREAD_LIMIT failed : " << status << endl;
return -3;
}

status = DftiSetValue(*descHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
if (status != 0) {
cout << "DftiSetValue DFTI_CONJUGATE_EVEN_STORAGE failed : " << status << endl;
return -4;
}

status = DftiSetValue(*descHandle, DFTI_PACKED_FORMAT, DFTI_CCE_FORMAT);
if (status != 0) {
cout << "DftiSetValue DFTI_PACKED_FORMAT failed : " << status << endl;
return -5;
}

MKL_LONG strides[2];
strides[0] = NN;
strides[1] = 1;

status = DftiSetValue(*descHandle, DFTI_INPUT_STRIDES, strides);
if (status != 0) {
cout << "DftiSetValue DFTI_INPUT_STRIDES failed : " << status << endl;
return -6;
}

status = DftiSetValue(*descHandle, DFTI_OUTPUT_STRIDES, strides);
if (status != 0) {
cout << "DftiSetValue DFTI_OUTPUT_STRIDES failed : " << status << endl;
return -7;
}

MKL_LONG format;
status = DftiGetValue(*descHandle, DFTI_PACKED_FORMAT, &format);
if (status != 0) {
cout << "DftiGetValue DFTI_PACKED_FORMAT failed : " << status << endl;
return -8;
}
cout << "DftiGetValue DFTI_PACKED_FORMAT : " << format << endl;

status = DftiCommitDescriptor(*descHandle);
if (status != 0) {
cout << "DftiCommitDescriptor failed : " << status << endl;
return -9;
}

return status;
}

```

0 Kudos
ShanmukhS_Intel
Moderator
1,348 Views

Hi Harsh,

 

Glad to know that you were able to resolve the issue.

However, It would be great if instead of Segmentation fault, maybe, a warning to the user is issued if the user makes such a mistake.

>>We would like to inform you that there was a memory access violation and as a result, it is resulting in a segmentation fault. We will convey the suggestion to the corresponding team. 

 

Best Regards,

Shanmukh.SS

 

0 Kudos
HarshM
Beginner
1,340 Views

Thanks,

 

 

A bug in the code, although the code works (it won't crash), the FFT result is wrong. For correct fft result the strides should be:

 

MKL_LONG strides[3];
strides[0] = 0;
strides[1] = 1;
strides[2] = NN;

 

0 Kudos
ShanmukhS_Intel
Moderator
1,250 Views

Hi Harsh,

 

Thanks for sharing the details, As mentioned, we'll work on the code with the below-mentioned values.

MKL_LONG strides[3];

strides[0] = 0;

strides[1] = 1;

strides[2] = NN;

 

Best Regards,

Shanmukh.SS

 

0 Kudos
ShanmukhS_Intel
Moderator
1,186 Views

Hi Harsh,

 

We would like to inform you that the output strides should be set differently than the input strides and that is most likely what is causing the writes past allocation bounds in your case. 

 

For forward transforms, set input strides to {0, 128, 1} and output strides to {0, 65, 1}. For backward transforms, set input strides to {0, 65, 1} and output strides to {0, 128, 1}. (65 = 128/2 + 1) 

 

You could refer to this link as a reference.

 

As we could see it is an incorrect usage, for the descriptor to be used as configured, it must use an allocation of 128*128 double-precision complex values in the backward domain (currently using only 128*65 such values).

 

Accessing memory outside of an allocation (which we do due in the current case due to incorrect setting of strides) is undefined behavior in C and C++. It was given a SIGSEGV ('seg fault') by the operating system, as this is not necessary in this undefined behavior case; the output could have resulted in silent data corruption if practically used.

 

We would want to inform you that, there could not be a 'warning' replacement possible on our side, as only the user knows their actual allocation size (which they are providing incorrectly to us).

 

Best Regards,

Shanmukh.SS

 

0 Kudos
HarshM
Beginner
1,180 Views
would that mean there must be two descriptiors, one for forward and one for backward transform?
0 Kudos
ShanmukhS_Intel
Moderator
1,125 Views

Hi Harsh,

 

Please find the below link for more information regarding the input and output strides for your reference, it seems the hyperlink was missed earlier.

https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2023-1/dfti-input-strides-dfti-output-strides.html

 

You could use one descriptor for forward and backward transform.

 

Best Regards,

Shanmukh.SS

 

0 Kudos
ShanmukhS_Intel
Moderator
975 Views

Hi Harsh,


A gentle reminder:

Could you please let us know if you need any other information or let us know if we could close this thread on our end?


Best Regards,

Shanmukh.SS


0 Kudos
HarshM
Beginner
965 Views
Yes. I have been able to run the FFT code after setting the correct strides and using only one descriptor for both forward and backward transform. This thread is resolved.

I have another thread on how to optimise for performance. I would request if you can resolve that soon as well.

Thank you.

Regards,
Harsh
0 Kudos
ShanmukhS_Intel
Moderator
895 Views

Hi Harsh,

 

It’s great to know that the issue has been resolved, in case you run into any other issues please feel free to create a new thread.

 

I have another thread on how to optimise for performance. I would request if you can resolve that soon as well.

>> We are working on this issue internally. We will get back to you soon with an update on the other thread URL.

 

Best Regards,

Shanmukh.SS

 

0 Kudos
Reply