<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic OPENMP and MKL FFT: strange behavior in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823032#M4931</link>
    <description>Thank you very much!!!&lt;BR /&gt;Private variables are not initialized!! I always forget about it!&lt;BR /&gt;&lt;BR /&gt;Clodxp</description>
    <pubDate>Tue, 11 May 2010 09:50:46 GMT</pubDate>
    <dc:creator>clodxp</dc:creator>
    <dc:date>2010-05-11T09:50:46Z</dc:date>
    <item>
      <title>OPENMP and MKL FFT: strange behavior</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823030#M4929</link>
      <description>Hi all!&lt;BR /&gt;I've put together an example in which VERY STRANGE BEHAVIOR occurs: an FFT called inside an OpenMP loop, with OMP_NUM_THREADS=1, seems to run &lt;SPAN style="text-decoration: underline;"&gt;about 10 times faster&lt;/SPAN&gt; than the serial version!!&lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;The code is essentially a loop in which an FFT is performed. I would like each thread to compute a share of the M FFTs. &lt;/B&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;CYCLE SERIAL VERSION -------------------------&lt;BR /&gt; &lt;BR /&gt; do xx=1,M&lt;BR /&gt; &lt;BR /&gt; ! initialize data to be transformed&lt;BR /&gt; data_fft=xx+imag*xx*2.&lt;BR /&gt; &lt;BR /&gt; ! perform FFT&lt;BR /&gt; Status=DftiComputeForward(Desc_Handle,data_fft)&lt;BR /&gt; &lt;BR /&gt; ! perform sum of the elements&lt;BR /&gt; sum_vect(xx)=sum(data_fft)&lt;BR /&gt; &lt;BR /&gt; end do&lt;BR /&gt;!----------------------------------------------------------&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I've parallelized this loop in two versions: a correct one and an incorrect one. &lt;BR /&gt;&lt;BR /&gt;In the correct version (see the attached FFT_2D_openmp.f90), the loop is parallelized as follows:&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;CYCLE PARALLEL VERSION -----------------------------&lt;BR /&gt; !$omp parallel&lt;BR /&gt; !$omp do private(data_fft) schedule(static,num_it_schedule)&lt;BR /&gt; do xx=1,M&lt;BR /&gt; &lt;BR /&gt; ! initialize data to be transformed&lt;BR /&gt; data_fft=xx+imag*xx*2.&lt;BR /&gt; &lt;BR /&gt; ! perform FFT&lt;BR /&gt; Status=DftiComputeForward(Desc_Handle,data_fft)&lt;BR /&gt; &lt;BR /&gt; ! perform sum of the elements&lt;BR /&gt; sum_vect(xx)=sum(data_fft)&lt;BR /&gt; &lt;BR /&gt; end do&lt;BR /&gt; !$omp end do&lt;BR /&gt; !$omp end parallel&lt;BR /&gt;--------------------------------------------------------------&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;For this to work correctly, DFTI_NUMBER_OF_USER_THREADS must be set to the number of running threads.&lt;BR /&gt;&lt;BR /&gt;OK! I arrived at the correct version only after the wrong one (see FFT_2D_openmp_wrong.f90), which is the one showing the strange behavior.&lt;BR /&gt;In the wrong version I made a mistake: I also declared the FFT status and descriptor handle as private in the loop: &lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;CYCLE PARALLEL VERSION (WRONG!!) -----------------------------------------------------&lt;BR /&gt; !$omp parallel&lt;BR /&gt; !$omp do private(data_fft,Status,Desc_Handle) schedule(static,num_it_schedule)&lt;BR /&gt; do xx=1,M&lt;BR /&gt; &lt;BR /&gt; ! initialize data to be transformed&lt;BR /&gt; data_fft=xx+imag*xx*2.&lt;BR /&gt; &lt;BR /&gt; ! perform FFT&lt;BR /&gt; Status=DftiComputeForward(Desc_Handle,data_fft)&lt;BR /&gt; &lt;BR /&gt; ! perform sum of the elements&lt;BR /&gt; sum_vect(xx)=sum(data_fft)&lt;BR /&gt; &lt;BR /&gt; end do&lt;BR /&gt; !$omp end do&lt;BR /&gt; !$omp end parallel&lt;BR /&gt;---------------------------------------------------------------------------------------------------&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;Strangely, the result (the sum of the elements of sum_vect) matches that of the serial version, and the execution time is about 10 times lower!!!&lt;BR /&gt;&lt;BR /&gt;This is the output of FFT_2D_openmp_wrong.f90 on my machine (Mac Pro 8-core).&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;---------- S t a r t&lt;BR /&gt;+ Matrix size Nx,Ny = 300.0000 300.0000&lt;BR /&gt;+ Cycle over M = 100.0000&lt;BR /&gt;--&amp;gt; SERIAL&lt;BR /&gt;+ Serial execution time = 0.120011000006343&lt;BR /&gt;+ Serial result (sum) = (4.5450000E+08,9.0900000E+08)&lt;BR /&gt;--&amp;gt; PARALLEL&lt;BR /&gt;+ Number of threads = 1&lt;BR /&gt;+ Parallel execution time = 9.684999997261912E-003&lt;BR /&gt;+ Parallel result (sum) = (4.5450000E+08,9.0900000E+08)&lt;BR /&gt;--&amp;gt; SPEEDUP (ideal = nthread) = 12.3914300506218&lt;BR /&gt;--&amp;gt; EFFICIENCY (ideal =1) = 12.3914300506218&lt;BR /&gt;---------- S t o p&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;The parallel execution time is &lt;SPAN style="text-decoration: underline;"&gt;about 12 times lower&lt;/SPAN&gt; than the serial one, whereas FFT_2D_openmp.f90 behaves as expected.&lt;BR /&gt;&lt;BR /&gt;Can someone explain this??!&lt;BR /&gt;And can someone please confirm that this is the correct use of the MKL FFT for my needs??&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;Clodxp</description>
      <pubDate>Mon, 10 May 2010 16:25:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823030#M4929</guid>
      <dc:creator>clodxp</dc:creator>
      <dc:date>2010-05-10T16:25:05Z</dc:date>
    </item>
    <item>
      <title>OPENMP and MKL FFT: strange behavior</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823031#M4930</link>
      <description>&lt;P&gt;Hi Clodxp,&lt;BR /&gt;&lt;BR /&gt;The initial value of a private variable in an OpenMP construct is undefined (Fortran OpenMP 2.0 &lt;A href="http://www.openmp.org/mp-documents/fspec20.pdf"&gt;http://www.openmp.org/mp-documents/fspec20.pdf&lt;/A&gt;, p. 35; OpenMP 3.0 &lt;A href="http://www.openmp.org/mp-documents/spec30.pdf"&gt;http://www.openmp.org/mp-documents/spec30.pdf&lt;/A&gt;, p. 90).&lt;BR /&gt;Hence each call to DftiComputeForward in the parallel part of FFT_2D_openmp_wrong.f90 receives an uninitialized Desc_Handle, returns DFTI_BAD_DESCRIPTOR, and doesn't change the input data. Since no transform is actually computed, the loop also finishes much faster.&lt;BR /&gt;&lt;BR /&gt;Ironically, we can't catch this error by checking, as in FFT_2D_openmp_wrong.f90, sums for a &lt;EM&gt;constant &lt;/EM&gt;signal v = (v[1], v[2], ..., v[N]) where v[i] = v[j] = c for all i and j, because FFT(v) = (c*N, 0, 0, ..., 0) in this case.&lt;BR /&gt;If DftiComputeForward succeeds, then v is replaced with FFT(v) and we get sum(FFT(v)) = c*N.&lt;BR /&gt;If DftiComputeForward fails, then v isn't changed and we get sum(v) = c*N.&lt;BR /&gt;Hence you see the same sum in the sequential and "parallel" case...&lt;BR /&gt;&lt;BR /&gt;You may find the following Knowledge Base article about parallelization of (2D) FFTs useful: &lt;A target="_blank" href="http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/"&gt;http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/&lt;/A&gt;&lt;BR /&gt;Given only FFT_2D_openmp_wrong.f90 and FFT_2D_openmp.f90, it's hard to tell what your needs are and what the correct use of MKL FFT would be for you.&lt;/P&gt;</description>
      <pubDate>Tue, 11 May 2010 09:02:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823031#M4930</guid>
      <dc:creator>Evgueni_P_Intel</dc:creator>
      <dc:date>2010-05-11T09:02:52Z</dc:date>
    </item>
    <item>
      <title>OPENMP and MKL FFT: strange behavior</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823032#M4931</link>
      <description>Thank you very much!!!&lt;BR /&gt;Private variables are not initialized!! I always forget about it!&lt;BR /&gt;&lt;BR /&gt;Clodxp</description>
      <pubDate>Tue, 11 May 2010 09:50:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OPENMP-and-MKL-FFT-strange-behavior/m-p/823032#M4931</guid>
      <dc:creator>clodxp</dc:creator>
      <dc:date>2010-05-11T09:50:46Z</dc:date>
    </item>
  </channel>
</rss>

