DFTI_REAL_REAL Speed?

Jason_R — Tue, 16 May 2017 14:36:23 GMT

I have some code that makes heavy use of 1-D DFTs using MKL (real-to-complex and complex-to-complex). I just realized that the library supports DFTI_REAL_REAL storage, where the real and imaginary parts of complex numbers are stored in separate arrays. I know that a split layout can make some algorithms more efficient because it reduces the need for SIMD shuffles. Before rearchitecting my application to use a split-complex layout, I thought I would ask here: could I expect any speedup in the DFT implementation from split-complex storage compared with my current interleaved layout? I currently run this software on AVX, AVX2, and AVX-512 platforms.
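
For readers unfamiliar with the two layouts, here is a minimal sketch in plain C of what "interleaved" and "split" mean, and of the conversion an application would need if only part of it moved to split storage. The helper names `deinterleave` and `interleave` are hypothetical and are not part of MKL. In MKL itself, split storage is requested by setting `DFTI_COMPLEX_STORAGE` to `DFTI_REAL_REAL` on a complex-domain descriptor, after which the compute functions take separate real and imaginary pointers. The sketch says nothing about whether the transform gets faster.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout: re0, im0, re1, im1, ... (2*n doubles in one array).
 * Split layout:       re[0..n-1] and im[0..n-1] in two separate arrays.
 * Both helpers below are illustrative only; they are not MKL functions. */

/* Copy n complex values from interleaved storage into split arrays. */
void deinterleave(const double *z, double *re, double *im, size_t n)
{
    assert(z != NULL && re != NULL && im != NULL);
    for (size_t k = 0; k < n; ++k) {
        re[k] = z[2 * k];       /* real part of element k */
        im[k] = z[2 * k + 1];   /* imaginary part of element k */
    }
}

/* Copy n complex values from split arrays back into interleaved storage. */
void interleave(const double *re, const double *im, double *z, size_t n)
{
    assert(z != NULL && re != NULL && im != NULL);
    for (size_t k = 0; k < n; ++k) {
        z[2 * k]     = re[k];
        z[2 * k + 1] = im[k];
    }
}
```

If the rest of the application keeps interleaved data, each transform would pay for one such pass on the way in and one on the way out, so any gain inside the DFT has to outweigh those extra copies.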
