<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic optimize 1D FFT performance in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/optimize-1D-FFT-performance/m-p/1004100#M18782</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am trying to apply a 1D FFT to a 3D array along a single dimension. Below is the code I am currently using; it has a nested loop over the other two dimensions. It works, but I am wondering whether there is any way to speed it up. The FFT size is typically under 1024 points.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;status = DftiCreateDescriptor(hFFT,DFTI_DOUBLE,DFTI_COMPLEX,1,nFFT)
status = DftiSetValue(hFFT,DFTI_COMPLEX_STORAGE,DFTI_REAL_REAL)
status = DftiCommitDescriptor(hFFT)

do j = 1,nz
    do i = 1,ny
        status = DftiComputeForward(hFFT,datarel(:,i,j),dataimg(:,i,j))
    end do
end do

status = DftiFreeDescriptor(hFFT)
&lt;/PRE&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

</description>
    <pubDate>Fri, 18 Jul 2014 14:45:27 GMT</pubDate>
    <dc:creator>Bo_Q_</dc:creator>
    <dc:date>2014-07-18T14:45:27Z</dc:date>
    <item>
      <title>optimize 1D FFT performance</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/optimize-1D-FFT-performance/m-p/1004100#M18782</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am trying to apply a 1D FFT to a 3D array along a single dimension. Below is the code I am currently using; it has a nested loop over the other two dimensions. It works, but I am wondering whether there is any way to speed it up. The FFT size is typically under 1024 points.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;status = DftiCreateDescriptor(hFFT,DFTI_DOUBLE,DFTI_COMPLEX,1,nFFT)
status = DftiSetValue(hFFT,DFTI_COMPLEX_STORAGE,DFTI_REAL_REAL)
status = DftiCommitDescriptor(hFFT)

do j = 1,nz
    do i = 1,ny
        status = DftiComputeForward(hFFT,datarel(:,i,j),dataimg(:,i,j))
    end do
end do

status = DftiFreeDescriptor(hFFT)
&lt;/PRE&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

</description>
      <pubDate>Fri, 18 Jul 2014 14:45:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/optimize-1D-FFT-performance/m-p/1004100#M18782</guid>
      <dc:creator>Bo_Q_</dc:creator>
      <dc:date>2014-07-18T14:45:27Z</dc:date>
    </item>
    <item>
      <title>Hi </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/optimize-1D-FFT-performance/m-p/1004101#M18783</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;/P&gt;

&lt;P&gt;The nested loop looks ok for me. &amp;nbsp;and &amp;nbsp;as you see from &amp;nbsp;https://software.intel.com/en-us/node/433474#FFT&lt;/P&gt;

&lt;UL id="GUID-E3F8A448-C29A-4370-AEA0-9031E5FE0889"&gt;
	&lt;LI&gt;
		&lt;P id="GUID-27982994-92E2-4392-8058-2FC8371F2575" style="margin-bottom: 0.5em;"&gt;FFT.&lt;/P&gt;

		&lt;P id="GUID-631EDD61-AE53-4DA5-834F-45784D3E5BDB" style="margin-bottom: 0.5em;"&gt;For the list of FFT transforms that can be threaded, see&amp;nbsp;&lt;A href="https://software.intel.com/node/e210aa26-b0cb-4c84-a490-94ec68a06645#FFT"&gt;Threaded FFT Problems&lt;/A&gt;.&lt;/P&gt;
	&lt;/LI&gt;
&lt;/UL&gt;

&lt;P style="margin-bottom: 0.5em;"&gt;A single 1D complex FFT of 1024 points is not multithreaded. So if you are working on a multi-core machine,&amp;nbsp;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;you may try to multithread the batch of 1D 1024-point FFTs yourself, by whatever method you prefer, for example as described in the MKL User's Guide:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Examples of Using Multi-Threading for FFT Computation &amp;nbsp;=&amp;gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Using Parallel Mode with a Common Descriptor&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;or&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft" target="_blank"&gt;https://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft&lt;/A&gt;&lt;/P&gt;
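
&lt;P&gt;As a rough illustration of the common-descriptor approach, here is a minimal sketch that wraps the loop from the question in an OpenMP parallel loop. It is not taken from the User's Guide: the program name, the array sizes, and the nFailed counter are made up for the example, and it assumes MKL is linked, OpenMP is enabled, and the mkl_dfti module is available to the compiler. Some MKL releases also document a DFTI_NUMBER_OF_USER_THREADS setting for this mode, so please check the documentation for your version.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;! Sketch only: OpenMP loop over one shared, committed DFTI descriptor.
! nFFT, ny, nz are illustrative sizes; array names follow the question.
program fft_batch_omp
    use mkl_dfti
    implicit none
    integer, parameter :: nFFT = 512, ny = 64, nz = 64
    double precision, allocatable :: datarel(:,:,:), dataimg(:,:,:)
    type(dfti_descriptor), pointer :: hFFT
    integer :: status, myStatus, i, j, nFailed

    allocate(datarel(nFFT,ny,nz), dataimg(nFFT,ny,nz))
    datarel = 1.0d0
    dataimg = 0.0d0

    ! Create and commit the descriptor once, outside the parallel region.
    status = DftiCreateDescriptor(hFFT, DFTI_DOUBLE, DFTI_COMPLEX, 1, nFFT)
    status = DftiSetValue(hFFT, DFTI_COMPLEX_STORAGE, DFTI_REAL_REAL)
    status = DftiCommitDescriptor(hFFT)

    ! Each iteration transforms a different column in place, so the
    ! threads never write to the same memory.
    nFailed = 0
    !$omp parallel do collapse(2) private(myStatus) reduction(+:nFailed)
    do j = 1, nz
        do i = 1, ny
            myStatus = DftiComputeForward(hFFT, datarel(:,i,j), dataimg(:,i,j))
            if (myStatus /= DFTI_NO_ERROR) nFailed = nFailed + 1
        end do
    end do
    !$omp end parallel do

    status = DftiFreeDescriptor(hFFT)
    print *, 'failed transforms:', nFailed
end program fft_batch_omp
&lt;/PRE&gt;

&lt;P&gt;Whether this helps depends on ny*nz and the number of cores, so it is worth timing it against the serial loop on your own data.&lt;/P&gt;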

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jul 2014 07:31:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/optimize-1D-FFT-performance/m-p/1004101#M18783</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2014-07-21T07:31:12Z</dc:date>
    </item>
  </channel>
</rss>

