<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MKL DFTI problems inside a parallel region - MKL 10.3 in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787053#M1902</link>
    <description>I recently upgraded from MKL 10.2 to 10.3 and noticed that the behavior of setting up a DFTI descriptor inside an OpenMP parallel region has changed compared to MKL 10.2. Previously, with code like&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;!$omp parallel do ....&lt;/DIV&gt;&lt;DIV&gt;do ....&lt;/DIV&gt;&lt;DIV&gt; call to Dfti descriptor setup&lt;/DIV&gt;&lt;DIV&gt; call to Dfti compute&lt;/DIV&gt;&lt;DIV&gt;enddo&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;performance was acceptable. With MKL 10.3, there seems to be some sort of synchronization inside the descriptor setup phase, and performance drops dramatically, to the point that it behaves as if there were no parallelism at all, especially when the sizes are small (I am doing complex 3D FFTs). If I move the descriptor setup phase outside the OpenMP region, performance returns to what it was with MKL 10.2, maybe a little better.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Has anybody else noticed this behavior?&lt;/DIV&gt;</description>
    <pubDate>Mon, 18 Jul 2011 20:49:11 GMT</pubDate>
    <dc:creator>butette</dc:creator>
    <dc:date>2011-07-18T20:49:11Z</dc:date>
    <item>
      <title>MKL DFTI problems inside a parallel region - MKL 10.3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787053#M1902</link>
      <description>I recently upgraded from MKL 10.2 to 10.3 and noticed that the behavior of setting up a DFTI descriptor inside an OpenMP parallel region has changed compared to MKL 10.2. Previously, with code like&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;!$omp parallel do ....&lt;/DIV&gt;&lt;DIV&gt;do ....&lt;/DIV&gt;&lt;DIV&gt; call to Dfti descriptor setup&lt;/DIV&gt;&lt;DIV&gt; call to Dfti compute&lt;/DIV&gt;&lt;DIV&gt;enddo&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;performance was acceptable. With MKL 10.3, there seems to be some sort of synchronization inside the descriptor setup phase, and performance drops dramatically, to the point that it behaves as if there were no parallelism at all, especially when the sizes are small (I am doing complex 3D FFTs). If I move the descriptor setup phase outside the OpenMP region, performance returns to what it was with MKL 10.2, maybe a little better.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Has anybody else noticed this behavior?&lt;/DIV&gt;</description>
      <pubDate>Mon, 18 Jul 2011 20:49:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787053#M1902</guid>
      <dc:creator>butette</dc:creator>
      <dc:date>2011-07-18T20:49:11Z</dc:date>
    </item>
    <item>
      <title>MKL DFTI problems inside a parallel region - MKL 10.3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787054#M1903</link>
      <description>There have been a number of optimizations in different updates to MKL 10.3, so please let us know which update you're using.&lt;BR /&gt;You may find the following article on calling MKL FFTs from OpenMP-parallelized code useful: &lt;A target="_blank" href="http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/"&gt;http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/&lt;/A&gt;.</description>
      <pubDate>Tue, 19 Jul 2011 06:02:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787054#M1903</guid>
      <dc:creator>IDZ_A_Intel</dc:creator>
      <dc:date>2011-07-19T06:02:13Z</dc:date>
    </item>
    <item>
      <title>MKL DFTI problems inside a parallel region - MKL 10.3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787055#M1904</link>
      <description>Hi Butette, &lt;BR /&gt;&lt;BR /&gt;I recall similar reports before. The problem is mainly the descriptor DFTI_HANDLE, which is used in all threads. So either move the descriptor setup outside the OpenMP region, as you did,&lt;BR /&gt;or&lt;BR /&gt;try adding the following before DFTICOMMITDESCRIPTOR:&lt;BR /&gt; STATUS = DFTISETVALUE(DFTI_HANDLE, DFTI_NUMBER_OF_USERS_THREADS, 4) ! if 4 threads are used; the right value depends on your number of CPUs, whether HT is on or off, etc. &lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Ying H.&lt;BR /&gt;</description>
      <pubDate>Wed, 20 Jul 2011 02:37:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787055#M1904</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2011-07-20T02:37:58Z</dc:date>
    </item>
    <item>
      <title>MKL DFTI problems inside a parallel region - MKL 10.3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787056#M1905</link>
      <description>Hi Ying,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks for the feedback... I don't believe that is the problem, though. Something has clearly changed in this behavior between 10.2 and 10.3 (any update, even the latest one, 10.3 Update 4). As it stands, Example 2 in&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;A href="http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/"&gt;http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/&lt;/A&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;will have horrible performance with 10.3 (any update), because that is exactly what I was doing before. The same code linked with 10.2 is probably 20-30 times faster than with 10.3 when the transform sizes are small. I have tried it with DFTI_NUMBER_OF_USERS_THREADS set both to 1 and to the number of threads I am using... it doesn't matter.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Clearly there is a bug in the descriptor setup when it is inside a parallel region.&lt;/DIV&gt;</description>
      <pubDate>Thu, 21 Jul 2011 01:15:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787056#M1905</guid>
      <dc:creator>butette</dc:creator>
      <dc:date>2011-07-21T01:15:46Z</dc:date>
    </item>
    <item>
      <title>MKL DFTI problems inside a parallel region - MKL 10.3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787057#M1906</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;In general, DFTI descriptor setup should be done outside the parallel region. This lets your application benefit from reusing the same descriptor across threads via DFTI_NUMBER_OF_USERS_THREADS.&lt;BR /&gt;That said, could you please share a small reproducer so that we can analyze the problem on our side?&lt;BR /&gt;</description>
      <pubDate>Thu, 21 Jul 2011 04:17:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFTI-problems-inside-a-parallel-region-MKL-10-3/m-p/787057#M1906</guid>
      <dc:creator>barragan_villanueva_</dc:creator>
      <dc:date>2011-07-21T04:17:55Z</dc:date>
    </item>
  </channel>
</rss>

