Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

optimize 1D FFT performance

Bo_Q_
Beginner

Hi,

I am trying to apply a 1D FFT to a 3D array along a single dimension. Below is the code I am currently using. It uses a nested loop to iterate over the other two dimensions. It works, but I am wondering whether there is any way to speed it up. The FFT size is typically under 1024 points.

status = DftiCreateDescriptor(hFFT,DFTI_DOUBLE,DFTI_COMPLEX,1,nFFT)
status = DftiSetValue(hFFT,DFTI_COMPLEX_STORAGE,DFTI_REAL_REAL)
status = DftiCommitDescriptor(hFFT)

do j = 1,nz
    do i = 1,ny
        status = DftiComputeForward(hFFT,datarel(:,i,j),dataimg(:,i,j))
    end do
end do

status = DftiFreeDescriptor(hFFT)

Thanks!

 

Ying_H_Intel
Employee

Hi 

The nested loop looks fine to me. As you can see from https://software.intel.com/en-us/node/433474#FFT, a single 1024-point 1D complex FFT is not multithreaded inside MKL. So if you are working on a multi-core machine, you can try multithreading the batch of 1D 1024-point FFTs yourself, by whatever method suits you. For example, see the MKL User's Guide:

Examples of Using Multi-Threading for FFT Computation  => Using Parallel Mode with a Common Descriptor

or 

https://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft

Best Regards,

Ying 
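
For readers who want to see what the "common descriptor" suggestion might look like when applied to the loop from the question, here is a minimal sketch. It assumes an OpenMP-enabled build linked against MKL with the mkl_dfti module available. The array sizes, the fill values, and the names myStatus and nfail are illustrative assumptions and do not come from the original post. One descriptor is created and committed once, then shared by all threads, and each thread transforms its own columns in place.

```fortran
! Sketch only: assumes Intel MKL (module mkl_dfti) and OpenMP are available.
program fft_common_descriptor
    use mkl_dfti
    implicit none

    ! Hypothetical sizes, chosen only for illustration.
    integer, parameter :: nFFT = 1024, ny = 64, nz = 64

    real(8), allocatable :: datarel(:,:,:), dataimg(:,:,:)
    type(dfti_descriptor), pointer :: hFFT
    integer :: status, myStatus, nfail, i, j

    allocate(datarel(nFFT, ny, nz), dataimg(nFFT, ny, nz))
    datarel = 1.0d0   ! placeholder input: real part
    dataimg = 0.0d0   ! placeholder input: imaginary part

    ! Same descriptor setup as in the question: 1D double-precision complex
    ! FFT with real and imaginary parts stored in separate arrays.
    status = DftiCreateDescriptor(hFFT, DFTI_DOUBLE, DFTI_COMPLEX, 1, nFFT)
    if (status /= DFTI_NO_ERROR) stop 'DftiCreateDescriptor failed'
    status = DftiSetValue(hFFT, DFTI_COMPLEX_STORAGE, DFTI_REAL_REAL)
    if (status /= DFTI_NO_ERROR) stop 'DftiSetValue failed'
    status = DftiCommitDescriptor(hFFT)
    if (status /= DFTI_NO_ERROR) stop 'DftiCommitDescriptor failed'

    ! The committed descriptor is shared; each (i,j) column is independent,
    ! so the two loops are collapsed and split across threads.
    nfail = 0
    !$omp parallel do collapse(2) default(shared) private(i, j, myStatus) reduction(+:nfail)
    do j = 1, nz
        do i = 1, ny
            myStatus = DftiComputeForward(hFFT, datarel(:,i,j), dataimg(:,i,j))
            if (myStatus /= DFTI_NO_ERROR) nfail = nfail + 1
        end do
    end do
    !$omp end parallel do

    status = DftiFreeDescriptor(hFFT)

    print *, 'failed transforms:', nfail
    deallocate(datarel, dataimg)
end program fft_common_descriptor
```

Whether this actually helps depends on the machine and on ny*nz being large enough to amortize the threading overhead, so it is worth timing against the serial loop. Older MKL releases may additionally expect DFTI_NUMBER_OF_USER_THREADS to be set on a shared descriptor before committing it; check the User's Guide section named above for the version you use.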
