Optimize 1D FFT performance (Intel® oneAPI Math Kernel Library)

Bo_Q_ — Fri, 18 Jul 2014 14:45:27 GMT

Hi,

I am trying to apply a 1D FFT to a 3D matrix along a single direction. Below is the code I am currently using. It has a nested loop that runs over the other two dimensions. It works, but I am wondering whether there is any way to speed it up. The size of the FFT is typically under 1024 points.

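! One double-precision complex 1D descriptor of length nFFT
status = DftiCreateDescriptor(hFFT,DFTI_DOUBLE,DFTI_COMPLEX,1,nFFT)
! Real and imaginary parts are stored in separate arrays
status = DftiSetValue(hFFT,DFTI_COMPLEX_STORAGE,DFTI_REAL_REAL)
status = DftiCommitDescriptor(hFFT)

! In-place forward transform of each column along the first dimension
do j = 1,nz
    do i = 1,ny
        status = DftiComputeForward(hFFT,datarel(:,i,j),dataimg(:,i,j))
    end do
end do

status = DftiFreeDescriptor(hFFT)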

Thanks!

Ying_H_Intel — Mon, 21 Jul 2014 07:31:12 GMT

The nested loop looks OK to me. As you can see in the FFT section at https://software.intel.com/en-us/node/433474#FFT:

"For the list of FFT transforms that can be threaded, see Threaded FFT Problems."

A single 1024-point 1D complex FFT is not multithreaded. So if you are working on a multi-core machine, you can parallelize the batch of 1024-point 1D FFTs yourself, by any method you like, for example as described in the MKL User's Guide:

Examples of Using Multi-Threading for FFT Computation => Using Parallel Mode with a Common Descriptor

https://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft
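
As a rough illustration of the "common descriptor" idea applied to the loop in the question, here is a minimal sketch. It assumes an OpenMP-enabled Fortran compiler and the MKL_DFTI module; the module name fft_dim1_sketch, the routine name fft_dim1_omp, and the nfail counter are hypothetical names introduced only for this example. It also assumes an MKL version in which a committed descriptor can be shared by several threads as-is; some older releases documented an additional DFTI_NUMBER_OF_USER_THREADS setting for this mode, so please check the documentation for your version.

```fortran
! Sketch only: fft_dim1_sketch and fft_dim1_omp are hypothetical names,
! not part of MKL. Build with OpenMP enabled and link against MKL.
module fft_dim1_sketch
    use MKL_DFTI
    implicit none
contains
    subroutine fft_dim1_omp(datarel, dataimg, nFFT, ny, nz, status, nfail)
        integer, intent(in)    :: nFFT, ny, nz
        real(8), intent(inout) :: datarel(nFFT, ny, nz), dataimg(nFFT, ny, nz)
        integer, intent(out)   :: status   ! status of the descriptor setup
        integer, intent(out)   :: nfail    ! number of transforms that failed
        type(DFTI_DESCRIPTOR), pointer :: hFFT
        integer :: i, j, st

        nfail = 0

        ! Create and commit ONE descriptor before the parallel region.
        status = DftiCreateDescriptor(hFFT, DFTI_DOUBLE, DFTI_COMPLEX, 1, nFFT)
        if (status /= DFTI_NO_ERROR) return
        status = DftiSetValue(hFFT, DFTI_COMPLEX_STORAGE, DFTI_REAL_REAL)
        if (status == DFTI_NO_ERROR) status = DftiCommitDescriptor(hFFT)

        if (status == DFTI_NO_ERROR) then
            ! Each (i,j) column is an independent in-place transform, so the
            ! two outer loops can be distributed across threads. All threads
            ! use the same committed descriptor (assumed safe, see above).
            !$omp parallel do collapse(2) default(shared) private(st) &
            !$omp& reduction(+:nfail)
            do j = 1, nz
                do i = 1, ny
                    st = DftiComputeForward(hFFT, datarel(:, i, j), dataimg(:, i, j))
                    if (st /= DFTI_NO_ERROR) nfail = nfail + 1
                end do
            end do
            !$omp end parallel do
        end if

        ! Release the descriptor; its return value is ignored in this sketch.
        st = DftiFreeDescriptor(hFFT)
    end subroutine fft_dim1_omp
end module fft_dim1_sketch
```

A call would look like call fft_dim1_omp(datarel, dataimg, nFFT, ny, nz, status, nfail). Whether this is actually faster than the serial loop depends on ny*nz, the FFT length, and the number of cores, so it is worth timing both versions on your data.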

Best Regards,

Ying