Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.
7234 Discussions

Relative performance of real and complex FFTs in MKL

Van_Veen__Lennaert
542 Views

I am writing a code that uses many FFTs (billions)  of fairly long sequences (2^p for p in the range 17-20, i.e. 100,000 - 1,00,000). It is crucial to find the fastest way to execute those. I read a technical report:

https://epubs.stfc.ac.uk/manifestation/45434584/RAL-TR-2020-003.pdf

that claims that the execution time for complex-to-complex FFTs with MKL is about twice as LOW as that of real-to-real FFTs of the same length (in particular, for a length of around 1,000,000).

This seems counter-intuitive, as the complex-to-complex transform should require about twice the number of operations. In fact, one can process two real-to-real transforms in one complex-to-complex call. I can imagine the execution time being roughly equal if complex arithmetic is performed in parallel in the micro-architecture, but it would puzzle me if it is lower.
My first question is simply if this is true.
My second question is what settings this result depends on (e.g. mkl_num_threads, compiler optimization, storage scheme, flags to encourage SIMD processing, et cetera).

If this is true for -Ofast compiling and when allowing MKL to grab any number of threads (up to the number of physical cores) then I will commit to using complex-to-complex transforms.

Postscript: I found the following remark:

community.intel.com/t5/Software-Archive/How-to-get-peak-performnace-in-FFT/m-p/975037#M24838

from an Intel programmer, stating that more energy has been invested in optimizing the complex-to-complex FFTs, which might be a clue. However, it does not explain how or state that complex-to-complex is faster.

Labels (1)
0 Kudos
0 Replies
Reply