<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541971#M35419</link>
    <description>&lt;P&gt;I figured out the problem after I finally found the right documentation.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2023-1/openmp-threaded-functions-and-problems.html#FFT" target="_blank"&gt;https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2023-1/openmp-threaded-functions-and-problems.html#FFT&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Multi-threading for FFT is only available under very limited conditions.&lt;/P&gt;&lt;P&gt;For example, the transform length has to be 2^N with N &amp;gt; 9, and one has to use double instead of single precision.&lt;/P&gt;&lt;P&gt;I created a test video with a resolution of 2048x2048, linked with OpenMP instead of TBB, and switched from float to double. This means that I run 512 complex-to-complex transforms of length 2048 per image of the video.&lt;/P&gt;&lt;P&gt;I can now see that threads are created, and on my 6-core and 8-core test systems, all cores are fully utilized when I run my program.&lt;/P&gt;&lt;P&gt;However, it runs slightly slower than when using a single thread only. It is therefore more effective to let it run single-threaded and leave the under-utilized cores available for other tasks. It will also use less memory, and I don't have to worry about extending the transform lengths from normal video sizes.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Nov 2023 18:16:10 GMT</pubDate>
    <dc:creator>klillevold</dc:creator>
    <dc:date>2023-11-08T18:16:10Z</dc:date>
    <item>
      <title>FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1539768#M35387</link>
      <description>&lt;P&gt;I have been using the FFTW3 wrapper code to implement DCT and DFT transforms in my code, and it works great. Until recently I linked with the sequential library, so mkl_get_max_threads() naturally returns 1.&lt;/P&gt;&lt;P&gt;Now I have tried to link with the threaded library (TBB), and mkl_get_max_threads() returns the correct number of cores on my test systems. I have tried an AMD Ryzen 5 3600 (6 cores), an AWS instance (16 cores), and an M2 MacBook Pro (8 cores).&lt;/P&gt;&lt;P&gt;However, there is no improvement in speed, and judging by the system load, my program appears to use only one thread.&lt;/P&gt;&lt;P&gt;So I surmise that the FFTW3 MKL wrapper is not able to take advantage of multi-threading?&lt;/P&gt;&lt;P&gt;If I convert my code to use native Intel MKL DCT and DFT functions instead of the FFTW3 wrappers, will there be any advantage to be gained from multi-threaded linking?&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2023 09:24:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1539768#M35387</guid>
      <dc:creator>klillevold</dc:creator>
      <dc:date>2023-11-02T09:24:40Z</dc:date>
    </item>
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1540570#M35398</link>
      <description>&lt;P&gt;Further information: I am using 1-D transforms of size up to 3840, specifically fftwf_plan_r2r_1d() and fftwf_plan_dft_r2c_1d(). The test systems now also include an Intel processor.&lt;/P&gt;&lt;P&gt;Since the transforms are 1-dimensional and relatively small, I understand it might not be possible to run them multi-threaded. Instead, I will have to implement threading in my own program and call the transforms in parallel. I will read up on thread safety in MKL to see whether this is possible. Since these transforms are independent, this approach seems doable.&lt;/P&gt;</description>
      <pubDate>Sat, 04 Nov 2023 09:00:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1540570#M35398</guid>
      <dc:creator>klillevold</dc:creator>
      <dc:date>2023-11-04T09:00:44Z</dc:date>
    </item>
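    <!--
The approach proposed in the post above (running independent 1-D transforms from the application's own threads) mainly needs a fair way to split the batch between workers. A minimal sketch in C, assuming a hypothetical helper batch_range (not an MKL or FFTW function); each worker would then run its own sub-range with a committed DFTI descriptor or an FFTW plan. No MKL calls are shown, so it compiles without MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the half-open range [*first, *last) of
 * transforms that worker `tid` (0-based) should process when `count`
 * independent transforms are split across `nthreads` workers.
 * The first (count % nthreads) workers receive one extra transform,
 * so the load differs by at most one transform between workers. */
static void batch_range(size_t count, size_t nthreads, size_t tid,
                        size_t *first, size_t *last)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base  = count / nthreads;  /* transforms every worker gets */
    size_t extra = count % nthreads;  /* leftovers go to the first workers */
    *first = tid * base + (tid < extra ? tid : extra);
    *last  = *first + base + (tid < extra ? 1 : 0);
}
```

For 100 transforms on 6 workers, workers 0 to 3 get 17 transforms each and workers 4 and 5 get 16. In FFTW3 only the execute functions are documented as thread-safe, while plan creation is not, so plans would be created up front; whether one committed DFTI descriptor may be shared between threads should be checked in the oneMKL documentation, and one descriptor per thread is the conservative choice.
    -->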
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541270#M35408</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thanks for posting in Intel Communities.&lt;/P&gt;&lt;P&gt;We're glad to hear that the issue was resolved. If you have any further queries or concerns in the future, please raise a new thread. We will be happy to help you. Thank you.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jilani&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2023 07:25:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541270#M35408</guid>
      <dc:creator>JilaniS_Intel</dc:creator>
      <dc:date>2023-11-07T07:25:39Z</dc:date>
    </item>
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541339#M35410</link>
      <description>&lt;P&gt;[deleted]&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2023 15:30:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541339#M35410</guid>
      <dc:creator>klillevold</dc:creator>
      <dc:date>2023-11-07T15:30:45Z</dc:date>
    </item>
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541351#M35411</link>
      <description>&lt;P&gt;I apologize for deleting and then re-entering my post; I wanted to add more details. The issue has not been resolved.&lt;/P&gt;&lt;P&gt;I switched to native MKL calls and created a DFTI descriptor handle to compute, for example, 100 transforms of size 1440 each.&lt;/P&gt;&lt;P&gt;I called DftiCreateDescriptor with single precision (float), complex domain, and one dimension. I set the parameters appropriately, including DFTI_NUMBER_OF_TRANSFORMS set to 100. I now get exactly the same numeric output from calling the forward transform once instead of 100 times sequentially.&lt;/P&gt;&lt;P&gt;Those transforms are independent and could potentially run in parallel, yet the process does not use any more threads than when linked with the sequential library, and the execution speed on a multi-core system is exactly the same.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2023 19:04:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541351#M35411</guid>
      <dc:creator>klillevold</dc:creator>
      <dc:date>2023-11-07T19:04:03Z</dc:date>
    </item>
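    <!--
For a batched descriptor like the one in the post above (100 transforms of length 1440 with DFTI_NUMBER_OF_TRANSFORMS), the descriptor also needs to know where each transform starts, which for contiguous data is normally set through DFTI_INPUT_DISTANCE (and DFTI_OUTPUT_DISTANCE for out-of-place transforms). A minimal sketch of that index arithmetic in C, assuming zero offset and the hypothetical helpers batch_index and batch_min_elems (not MKL functions), useful for sanity-checking buffer sizes before committing the descriptor:

```c
#include <assert.h>
#include <stddef.h>

/* Index of element j of transform k in a batched buffer, mirroring a layout
 * described by a per-transform distance and an element stride (zero offset). */
static size_t batch_index(size_t k, size_t j, size_t distance, size_t stride)
{
    return k * distance + j * stride;
}

/* Minimum number of complex elements the buffer must hold so that
 * `count` transforms of length `n` fit with the given distance and stride. */
static size_t batch_min_elems(size_t count, size_t n,
                              size_t distance, size_t stride)
{
    assert(count > 0 && n > 0);
    return batch_index(count - 1, n - 1, distance, stride) + 1;
}
```

With contiguous rows (distance 1440, stride 1), 100 transforms need 144000 complex elements, and element 5 of transform 2 sits at index 2885.
    -->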
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541971#M35419</link>
      <description>&lt;P&gt;I figured out the problem after I finally found the right documentation.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2023-1/openmp-threaded-functions-and-problems.html#FFT" target="_blank"&gt;https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2023-1/openmp-threaded-functions-and-problems.html#FFT&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Multi-threading for FFT is only available under very limited conditions.&lt;/P&gt;&lt;P&gt;For example, the transform length has to be 2^N with N &amp;gt; 9, and one has to use double instead of single precision.&lt;/P&gt;&lt;P&gt;I created a test video with a resolution of 2048x2048, linked with OpenMP instead of TBB, and switched from float to double. This means that I run 512 complex-to-complex transforms of length 2048 per image of the video.&lt;/P&gt;&lt;P&gt;I can now see that threads are created, and on my 6-core and 8-core test systems, all cores are fully utilized when I run my program.&lt;/P&gt;&lt;P&gt;However, it runs slightly slower than when using a single thread only. It is therefore more effective to let it run single-threaded and leave the under-utilized cores available for other tasks. It will also use less memory, and I don't have to worry about extending the transform lengths from normal video sizes.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2023 18:16:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1541971#M35419</guid>
      <dc:creator>klillevold</dc:creator>
      <dc:date>2023-11-08T18:16:10Z</dc:date>
    </item>
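    <!--
The post above paraphrases the FFT threading conditions as a power-of-two length 2^N with N > 9 in double precision. Assuming that paraphrase (the linked oneMKL page is the authority, and its conditions may differ by version and transform type), a small C sketch shows how far typical video dimensions would need to be extended; next_pow2 and meets_stated_fft_threading_condition are hypothetical helpers, not MKL's own check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Smallest power of two >= n (n >= 1, small enough not to overflow). */
static size_t next_pow2(size_t n)
{
    assert(n >= 1);
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Checks the condition as paraphrased in the post: the length is 2^N with
 * N > 9 and the transform is double precision. This is an assumption,
 * not the library's internal decision logic. */
static bool meets_stated_fft_threading_condition(size_t n, bool is_double)
{
    bool pow2 = n != 0 && (n & (n - 1)) == 0;
    return is_double && pow2 && n > 512; /* 2^9 = 512, so N > 9 means n >= 1024 */
}
```

Under this reading, lengths of 1080, 1440 and 1920 would all have to be padded to 2048, and 3840 to 4096, which is the extra memory and padding work the poster chose to avoid by staying single-threaded.
    -->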
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1543318#M35435</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;Based on your previous response, we understand that your issue has been resolved. Could you please confirm this? Thank you.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jilani&lt;/P&gt;</description>
      <pubDate>Mon, 13 Nov 2023 16:33:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1543318#M35435</guid>
      <dc:creator>JilaniS_Intel</dc:creator>
      <dc:date>2023-11-13T16:33:13Z</dc:date>
    </item>
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1545534#M35459</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;EM&gt;A gentle reminder:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;We haven't received any updates from you. Based on your previous response, it appears that your issue has been resolved. Could you kindly confirm this for us?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jilani&lt;/P&gt;</description>
      <pubDate>Mon, 20 Nov 2023 11:11:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1545534#M35459</guid>
      <dc:creator>JilaniS_Intel</dc:creator>
      <dc:date>2023-11-20T11:11:20Z</dc:date>
    </item>
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1545635#M35465</link>
      <description>&lt;P&gt;Thanks; consider it resolved.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Nov 2023 16:09:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1545635#M35465</guid>
      <dc:creator>klillevold</dc:creator>
      <dc:date>2023-11-20T16:09:39Z</dc:date>
    </item>
    <item>
      <title>Re: FFTW3 wrapper gains no speedup from multi-threaded linking, convert to native MKL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1546354#M35473</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thanks for the confirmation.&lt;/P&gt;&lt;P&gt;It’s great to know that the issue has been resolved. If you run into any other issues, please feel free to create a new thread.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Jilani&lt;/P&gt;</description>
      <pubDate>Wed, 22 Nov 2023 05:13:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFTW3-wrapper-gains-no-speedup-from-multi-threaded-linking/m-p/1546354#M35473</guid>
      <dc:creator>JilaniS_Intel</dc:creator>
      <dc:date>2023-11-22T05:13:04Z</dc:date>
    </item>
  </channel>
</rss>

