<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic FFT parallelization comparison in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878811#M9300</link>
    <description>&lt;P&gt;I am using MKL 10.2.3; my specific use case is a one-dimensional complex-to-complex transform.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;At the beginning of Chapter 6, the user guide says "FFT" is threaded. It does not mention any of the restrictions listed in your reply.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;In previous versions, the memory array passed to the FFT needed to be aligned to a multiple of 128 for best performance. Is this still the case?&lt;/P&gt;
&lt;P&gt;Does running the transform using out-of-place versus in-place memory make a difference?&lt;/P&gt;</description>
    <pubDate>Wed, 03 Mar 2010 17:48:23 GMT</pubDate>
    <dc:creator>Marshall__Michael_B</dc:creator>
    <dc:date>2010-03-03T17:48:23Z</dc:date>
    <item>
      <title>FFT parallelization comparison</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878809#M9298</link>
      <description>&lt;P&gt;Has anyone done a comparison of the different parallelization techniques for 1D FFTs?&lt;/P&gt;
&lt;P&gt;I'd like to know which method is generally faster: internal threading or user threading.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Currently my application is single-threaded, but a faster FFT would justify the added complexity of managing our own threads in this app.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Mar 2010 14:35:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878809#M9298</guid>
      <dc:creator>Marshall__Michael_B</dc:creator>
      <dc:date>2010-03-03T14:35:15Z</dc:date>
    </item>
    <item>
      <title>FFT parallelization comparison</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878810#M9299</link>
      <description>&lt;P&gt;First of all, please look at the list of threaded functions in the version of MKL that you are using. You can find this list in the MKL User's Guide (see Chapter 6, "Using Intel MKL Parallelism").&lt;/P&gt;
&lt;P&gt;There are some restrictions on this functionality, e.g., for the latest MKL 10.2 Update 4:&lt;/P&gt;
&lt;P&gt;
&lt;/P&gt;&lt;DIV id="_mcePaste"&gt;1D real-to-complex and complex-to-real transforms are not threaded.&lt;/DIV&gt;
&lt;DIV id="_mcePaste"&gt;1D complex-to-complex transforms using split-complex layout are not threaded.&lt;/DIV&gt;
&lt;DIV id="_mcePaste"&gt;Prime-size complex-to-complex 1D transforms are not threaded.&lt;/DIV&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;and so on.&lt;/P&gt;
&lt;P&gt;--Gennady&lt;/P&gt;</description>
      <pubDate>Wed, 03 Mar 2010 15:53:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878810#M9299</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-03-03T15:53:17Z</dc:date>
    </item>
    <item>
      <title>FFT parallelization comparison</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878811#M9300</link>
      <description>&lt;P&gt;I am using MKL 10.2.3; my specific use case is a one-dimensional complex-to-complex transform.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;At the beginning of Chapter 6, the user guide says "FFT" is threaded. It does not mention any of the restrictions listed in your reply.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;In previous versions, the memory array passed to the FFT needed to be aligned to a multiple of 128 for best performance. Is this still the case?&lt;/P&gt;
&lt;P&gt;Does running the transform using out-of-place versus in-place memory make a difference?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Mar 2010 17:48:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878811#M9300</guid>
      <dc:creator>Marshall__Michael_B</dc:creator>
      <dc:date>2010-03-03T17:48:23Z</dc:date>
    </item>
    <item>
      <title>FFT parallelization comparison</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878812#M9301</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Note that Gennady quoted that list of limitations from the MKL 10.2.4 User's Guide.&lt;/P&gt;
&lt;P&gt;As for memory alignment: 16, 128, or even page alignment should give better performance, because the DFT kernels use vectorized code and because pages are used more compactly (I mean fewer DTLB misses).&lt;/P&gt;
&lt;P&gt;As for out-of-place versus in-place 1D transforms:&lt;/P&gt;
&lt;P&gt;for small sizes the difference is insignificant (see the GFlops figures below for 1 thread):&lt;/P&gt;
&lt;DIV&gt;
&lt;PRE&gt;Forward_DFT_C,     x210,    8.432,1th,1D,in-place
Forward_DFT_C,     x210,    8.502,1th,1D,out-of-place&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;DIV&gt;
&lt;PRE&gt;Forward_DFT_C,     x504,    9.896,1th,1D,in-place
Forward_DFT_C,     x504,    9.885,1th,1D,out-of-place&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;P&gt;but for large sizes (larger than the cache size) the difference will be significant:&lt;/P&gt;
&lt;DIV&gt;
&lt;PRE&gt;Forward_DFT_C,     x3211264,    4.196,1th,1D,in-place
Forward_DFT_C,     x3211264,    4.312,1th,1D,out-of-place&lt;/PRE&gt;
&lt;PRE&gt;
Forward_DFT_C,     x6250000,    3.763,1th,1D,in-place
Forward_DFT_C,     x6250000,    3.837,1th,1D,out-of-place
&lt;/PRE&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 11 Mar 2010 10:02:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/FFT-parallelization-comparison/m-p/878812#M9301</guid>
      <dc:creator>barragan_villanueva_</dc:creator>
      <dc:date>2010-03-11T10:02:42Z</dc:date>
    </item>
  </channel>
</rss>

