FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?

klillevold · 11-02-2023

I have been using the FFTW3 wrapper code to implement DCT and DFT transforms in my code and it works great. Until recently I linked with the sequential library. mkl_get_max_threads() naturally returns 1.

Now I have tried to link with the threaded library (TBB), and mkl_get_max_threads() returns the correct number of cores on my test systems. I have tried an AMD Ryzen 5 3600 (6 cores), an AWS instance (16 cores), and an M2 MacBook Pro (8 cores).

However, there is no improvement in speed, and looking at the system load, it appears my program is utilizing only one thread.

So I surmise the FFTW3 MKL wrapper is not able to take advantage of multi-threading?

If I convert my code to use native Intel MKL DCT and DFT functions instead of the FFTW3 wrappers, will there be any advantage to be gained from multi-threaded linking?

klillevold · 11-04-2023

Further information: I am using 1-D transforms of size up to 3840, specifically fftwf_plan_r2r_1d() and fftwf_plan_dft_r2c_1d(). Test systems now also include an Intel processor.

Since the transforms are 1-dimensional and relatively small, I understand it might not be possible to run those transforms multi-threaded. I will have to implement threading in my own program and call the transforms in parallel. I will read up on thread safety for MKL and see if this is possible. Since these transforms are independent, this approach seems doable.
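For readers following along, the application-level approach described above might be sketched as below. This is only an illustration under stated assumptions: `transform_rows`, `row_transform_fn`, and `placeholder_transform` are hypothetical names, and the placeholder is a simple 2-point butterfly standing in for the real per-row FFTW3/MKL execute call. Whether one plan may be shared between threads, or each thread needs its own, should be checked against the MKL thread-safety documentation.

```c
#include <stddef.h>

/* Hypothetical per-row callback: transforms one row of n floats in place.
   In real code this would wrap the FFTW3/MKL execute call for one row. */
typedef void (*row_transform_fn)(float *row, size_t n);

/* Placeholder so the sketch is self-contained: a 2-point butterfly
   (sum, difference) on adjacent pairs. It is NOT a DCT or DFT. */
void placeholder_transform(float *row, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        float a = row[i];
        float b = row[i + 1];
        row[i]     = a + b;
        row[i + 1] = a - b;
    }
}

/* Apply fn to each of `rows` independent rows of length n stored
   contiguously in data. When compiled with OpenMP enabled the rows are
   divided among threads; without it the pragma is ignored and the loop
   runs serially with the same result. */
void transform_rows(float *data, size_t rows, size_t n, row_transform_fn fn)
{
    #pragma omp parallel for schedule(static)
    for (long r = 0; r < (long)rows; ++r) {
        fn(data + (size_t)r * n, n);
    }
}
```

Each row is touched by exactly one thread, so no locking is needed around the data. With this layout it probably makes sense to keep MKL itself linked sequentially so the two levels of threading do not compete.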

JilaniS_Intel · 11-06-2023

Hi,

Thanks for posting in Intel Communities.

We're glad to hear that the issue was resolved. If you have any further queries or concerns in the future, please raise a new thread. We will be happy to help you. Thank you.

Regards,

Jilani

klillevold · 11-07-2023

[deleted]

klillevold · 11-07-2023

I apologize for deleting and then re-entering. I wanted to add more details. The issue has not been resolved.

I switched to using native MKL calls and created a Dfti descriptor handle to compute, for example, 100 transforms of length 1440 each.

I called DftiCreateDescriptor with float type, complex domain, one dimension. I set the parameters appropriately, including DFTI_NUMBER_OF_TRANSFORMS to 100. I now get the exact same numeric output from calling the forward transform once instead of 100 times sequentially.

Those transforms are independent and could potentially be run in parallel, yet I see that the process does not use any more threads than when linked with the sequential library, and the execution speed on a multi-core system is exactly the same.

klillevold · 11-08-2023

I figured out the problem after I finally found the right documentation.

https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2023-1/openmp-threaded-functions-and-problems.html#FFT

Multi-threading for FFT is only available under very limited conditions.

For example, the transform length has to be 2^N with N > 9, and one has to use double instead of single precision.
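As a quick check of the length condition as quoted here (and only that condition; precision, placement, and stride requirements are not covered), a small helper could look like this. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when n == 2^k with k > 9, i.e. n is a power of two and n >= 1024.
   Encodes only the length condition quoted in the post above. */
bool length_meets_quoted_condition(size_t n)
{
    bool is_pow2 = (n != 0) && ((n & (n - 1)) == 0);
    return is_pow2 && n > 512;
}
```

By this test neither 1440 nor 3840 qualifies, since neither is a power of two, while 2048 = 2^11 does.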

I created a test video with a resolution of 2048x2048, linked with OpenMP instead of TBB, and switched from float to double. This means that I run 512 complex-to-complex transforms of length 2048 per image of the video.

I can now see that threads are created, and on my 6 and 8-core test systems, I can see that all cores are fully utilized when I run my program.

However, it runs slightly slower than when using a single thread only. It is, therefore, more effective to let it run single-threaded, and leave the under-utilized cores available for other tasks. It will also use less memory, and I don't have to worry about extending the transform lengths from normal video sizes.

JilaniS_Intel · 11-13-2023

Hi,

Thank you for your response.

Based on your prior response, we understand that your issue has been resolved. Could you please confirm this? Thank you.

Regards,

Jilani

JilaniS_Intel · 11-20-2023

Hi,

A gentle reminder:

We haven't received any updates from you. Based on your previous response, it appears that your issue has been resolved. Could you kindly confirm this for us?

Regards,

Jilani

klillevold · 11-20-2023

Thanks - consider it resolved.

JilaniS_Intel · 11-21-2023

Hi,

Thanks for the confirmation.

It’s great to know that the issue has been resolved. In case you run into any other issues, please feel free to create a new thread.

Regards,

Jilani