If I set DFTI_NUMBER_OF_TRANSFORMS to 4 on an AVX computer, or 8 on an AVX-512 KNL, will MKL's DftiComputeForward/Backward compute the FFTs of similar but independent, non-overlapping arrays simultaneously in SIMD, or sequentially one after the other?

Thanks

Hi,

There's no direct relationship between the value of DFTI_NUMBER_OF_TRANSFORMS and SIMD. DFTI_NUMBER_OF_TRANSFORMS simply lets you perform a number of FFT transforms with a single call. It is similar to writing a for loop that performs the forward/backward FFT N times.

MKL FFT provides configuration settings to control parallel processing. When MKL is in parallel mode, you can use DFTI_THREAD_LIMIT to control whether each transform in a single-call batch (DFTI_NUMBER_OF_TRANSFORMS > 1) runs in parallel or sequentially.

By default, FFT processing is parallel for large transforms but sequential for small ones. If you are computing a batch of small transforms, each FFT transform is sequential. But if you are computing a batch of large transforms and DFTI_THREAD_LIMIT != 1, each transform is parallel.

Best regards,

Fiona

Thanks for the explanation! I'm wondering whether, for a number of small single-precision FTs, e.g. 24x24, it would be more or less efficient to run them in parallel in SIMD (within each thread), in particular with AVX-512. Has this been investigated by Intel?

Thanks

Hi,

If you are computing a batch of small transforms, where function call overhead makes up a noticeable part of the transform time, doing the whole batch in a single call via DFTI_NUMBER_OF_TRANSFORMS would probably be more efficient. Thanks.

Best regards,

Fiona
