Is DFTI_NUMBER_OF_TRANSFORMS data-parallel? (Intel® oneAPI Math Kernel Library forum)

M_A_1 — Mon, 28 Aug 2017 18:44:09 GMT

If I set DFTI_NUMBER_OF_TRANSFORMS to 4 on an AVX computer, or to 8 on an AVX-512 KNL, will MKL's DftiComputeForward/DftiComputeBackward compute the FFTs of similar but independent, non-overlapping arrays simultaneously using SIMD, or sequentially one after the other?

Thanks

Zhen_Z_Intel — Wed, 30 Aug 2017 06:33:31 GMT

Hi,

There's no direct relationship between the value of DFTI_NUMBER_OF_TRANSFORMS and SIMD. DFTI_NUMBER_OF_TRANSFORMS is for performing a number of FFTs with a single call; it is similar to writing a for loop that performs the forward/backward FFT N times.

MKL FFT provides configuration settings that control parallel processing. When you use the threaded (parallel) version of MKL, you can set DFTI_THREAD_LIMIT to make each transform of a single batched call (DFTI_NUMBER_OF_TRANSFORMS > 1) run either in parallel or sequentially.

By default, FFT processing is parallel for large transform sizes but sequential for small ones. If you are computing a batch of small transforms, each transform will run sequentially. If you are computing a batch of large transforms and DFTI_THREAD_LIMIT != 1, each transform will run in parallel.
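The rule described above can be written as a small decision function. This is a toy model only: `threads_per_transform` and `HYPOTHETICAL_SMALL_FFT_SIZE` are invented for illustration, the cutoff value is a placeholder, and MKL's real size heuristic is not spelled out in this thread.

```c
#include <stddef.h>

/* Placeholder cutoff, invented for this sketch; MKL's real cutoff is not given here. */
#define HYPOTHETICAL_SMALL_FFT_SIZE 4096u

/* Toy model: number of threads used by ONE transform of length n in a batched
   call, given the maximum number of threads it may use (like DFTI_THREAD_LIMIT). */
static unsigned threads_per_transform(size_t n, unsigned thread_limit)
{
    if (thread_limit <= 1 || n < HYPOTHETICAL_SMALL_FFT_SIZE)
        return 1;          /* small transform, or a limit of 1: run sequentially */
    return thread_limit;   /* large transform: run in parallel up to the limit */
}
```

Under this model a 24x24 transform (576 points) always runs on one thread, while a large transform uses threads only when the limit allows more than one.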

Best regards,
Fiona

M_A_1 — Wed, 30 Aug 2017 16:15:00 GMT

Thanks for the explanation! I'm wondering whether, for a number of small single-precision FTs (e.g. 24x24), it would be more or less efficient to run them in parallel using SIMD (within each thread), in particular with AVX-512. Has Intel investigated this?

Thanks

Zhen_Z_Intel — Tue, 05 Sep 2017 02:15:41 GMT

Hi,

If you are computing a batch of small transforms, where function-call overhead makes up a noticeable part of the transform time, performing the whole batch in a single call via DFTI_NUMBER_OF_TRANSFORMS would probably be more efficient. Thanks.
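A simplified cost model makes this reasoning concrete. Assume, for illustration only, that every compute call carries a fixed overhead c, that one small transform takes t_1, and that N transforms are needed; these symbols are hypothetical, not measured MKL figures.

```latex
T_{\text{loop}} \approx N\,(c + t_1), \qquad
T_{\text{batch}} \approx c + N\,t_1, \qquad
\frac{T_{\text{loop}}}{T_{\text{batch}}} = \frac{N\,(c + t_1)}{c + N\,t_1}
\;\xrightarrow{\;N \to \infty\;}\; 1 + \frac{c}{t_1}.
```

When c is comparable to t_1, as for tiny transforms, batching can approach a (1 + c/t_1)-fold gain for large N; when t_1 dominates, the gain from batching alone is negligible.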

Best regards,
Fiona