Is DFTI_NUMBER_OF_TRANSFORMS data-parallel?

M_A_1 · 08-28-2017

If I set DFTI_NUMBER_OF_TRANSFORMS to 4 on an AVX computer, or to 8 on an AVX-512 KNL, will MKL's DftiComputeForward/DftiComputeBackward compute the FFTs of similar but independent, non-overlapping arrays simultaneously in SIMD, or sequentially one after the other?

Thanks

Zhen_Z_Intel · 08-29-2017

Hi,

There is no direct relationship between the value of DFTI_NUMBER_OF_TRANSFORMS and SIMD. DFTI_NUMBER_OF_TRANSFORMS simply lets you perform a number of FFTs with a single call. It is similar to writing a for loop that performs the forward/backward FFT N times.
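As a rough illustration of the for-loop analogy, here is a minimal plain-C sketch. It does not use MKL and says nothing about how MKL is implemented internally. The names `cplx`, `dft4`, and `dft4_batch` are hypothetical, and the hard-coded 4-point DFT only stands in for a real FFT kernel. The `distance` argument plays the role that DFTI_INPUT_DISTANCE plays in a batched descriptor: the offset, in elements, between the starts of consecutive arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-precision complex type. */
typedef struct { float re, im; } cplx;

/* Hard-coded 4-point forward DFT, in place (twiddles are 1, -i, -1, i). */
static void dft4(cplx *x)
{
    cplx a = x[0], b = x[1], c = x[2], d = x[3];

    /* X0 = a + b + c + d */
    x[0].re = a.re + b.re + c.re + d.re;
    x[0].im = a.im + b.im + c.im + d.im;
    /* X1 = a - i*b - c + i*d */
    x[1].re = a.re + b.im - c.re - d.im;
    x[1].im = a.im - b.re - c.im + d.re;
    /* X2 = a - b + c - d */
    x[2].re = a.re - b.re + c.re - d.re;
    x[2].im = a.im - b.im + c.im - d.im;
    /* X3 = a + i*b - c - i*d */
    x[3].re = a.re - b.im - c.re + d.im;
    x[3].im = a.im + b.re - c.im - d.re;
}

/* Conceptual meaning of a batched call: transform `howmany` independent
   arrays, the k-th one starting at data + k * distance. */
void dft4_batch(cplx *data, size_t howmany, size_t distance)
{
    assert(distance >= 4);  /* arrays must not overlap */
    for (size_t k = 0; k < howmany; ++k)
        dft4(data + k * distance);
}
```

For example, calling `dft4_batch(buf, 8, 4)` on 32 contiguous elements transforms eight packed length-4 arrays in place, one after the other.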

MKL FFT provides configuration parameters that control parallel processing. When MKL runs in parallel (threaded) mode, you can use DFTI_THREAD_LIMIT to control whether each transform of a batched call (DFTI_NUMBER_OF_TRANSFORMS > 1) is computed in parallel or sequentially.

By default, FFT processing is parallel for large transforms but sequential for small ones. If you compute a batch of small transforms, each transform runs sequentially. If you compute a batch of large transforms and DFTI_THREAD_LIMIT != 1, each transform runs in parallel.

Best regards,
Fiona

M_A_1 · 08-30-2017

Thanks for the explanation! I'm wondering whether, for a number of small single-precision Fourier transforms (e.g. 24x24), it would be more or less efficient to run them in parallel in SIMD (within each thread), in particular with AVX-512. Has this been investigated by Intel?

Thanks

Zhen_Z_Intel · 09-04-2017

Hi,

If you are computing a batch of small transforms, where function call overhead makes up a noticeable part of the transform time, running the whole batch in a single call via DFTI_NUMBER_OF_TRANSFORMS would probably be more efficient. Thanks.

Best regards,
Fiona