<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic This a 1D computation. And in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159636#M27868</link>
    <description>&lt;P&gt;This is a 1D computation. After changing the code to serial, DftiCommitDescriptor was still the bottleneck. Clearly, moving DftiCommitDescriptor outside of the loop would help; it is just surprising that DftiCommitDescriptor is so expensive.&lt;/P&gt;</description>
    <pubDate>Mon, 26 Mar 2018 14:00:30 GMT</pubDate>
    <dc:creator>AndrewC</dc:creator>
    <dc:date>2018-03-26T14:00:30Z</dc:date>
    <item>
      <title>MKL FFT inside OpenMP loop (MKL 2018)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159634#M27866</link>
      <description>&lt;P&gt;I have an OpenMP loop:&lt;/P&gt;

&lt;P&gt;#pragma omp parallel for&lt;/P&gt;

&lt;P&gt;for (int i = 0; i &amp;lt; n; i++) {&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// routine that calls MKL FFT&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;

&lt;P&gt;The threaded performance is pretty abysmal: on an 8-core machine, just over 1 core is being used.&lt;/P&gt;

&lt;P&gt;What is surprising is that Intel Amplifier shows that the time is spent in DftiCommitDescriptor, not in the actual computation.&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Function / Call Stack&lt;/TH&gt;&lt;TH&gt;CPU Time&lt;/TH&gt;&lt;TH&gt;Module&lt;/TH&gt;&lt;TH&gt;Function (Full)&lt;/TH&gt;&lt;TH&gt;Source File&lt;/TH&gt;&lt;TH&gt;Start Address&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;DftiCommitDescriptor&lt;/TD&gt;&lt;TD&gt;83.7%&lt;/TD&gt;&lt;TD&gt;mkl_rt.dll&lt;/TD&gt;&lt;TD&gt;DftiCommitDescriptor&lt;/TD&gt;&lt;TD&gt;[Unknown]&lt;/TD&gt;&lt;TD&gt;0x180a45b68&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD colspan="6"&gt;.....&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;DftiComputeForward&lt;/TD&gt;&lt;TD&gt;0.5%&lt;/TD&gt;&lt;TD&gt;mkl_rt.dll&lt;/TD&gt;&lt;TD&gt;DftiComputeForward&lt;/TD&gt;&lt;TD&gt;[Unknown]&lt;/TD&gt;&lt;TD&gt;0x180a45f10&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;

&lt;P&gt;Are there any suggested best practices here? Typically the FFT function will be called with the same data length, say 10K-20K.&lt;/P&gt;</description>
      <pubDate>Sun, 25 Mar 2018 20:27:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159634#M27866</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-03-25T20:27:04Z</dc:date>
    </item>
    <item>
      <title>Hi vasci_ </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159635#M27867</link>
      <description>&lt;P&gt;Hi vasci_&amp;nbsp;&lt;/P&gt;

&lt;P&gt;How do you link MKL, and is the FFT 1D or 2D? If you are using the Intel compiler and OpenMP, the MKL code called inside the parallel loop is supposed to run serially.&lt;/P&gt;

&lt;P&gt;Since "typically the FFT function will be called with the same data length", you may try moving DftiCommitDescriptor out of the OpenMP for loop and see if there is any improvement.&lt;BR /&gt;
	Or, if needed, please submit a reproducer to the Online Service Center: https://supporttickets.intel.com/?lang=en-US&lt;/P&gt;

&lt;P&gt;Moreover, the MKL documentation has several code samples that use FFT in OpenMP parallel regions, for your reference:&lt;BR /&gt;
	&lt;A href="https://software.intel.com/en-us/mkl-developer-reference-c-examples-of-using-openmp-threading-for-fft-computation" target="_blank"&gt;https://software.intel.com/en-us/mkl-developer-reference-c-examples-of-using-openmp-threading-for-fft-computation&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 06:31:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159635#M27867</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-03-26T06:31:03Z</dc:date>
    </item>
    <item>
      <title>This a 1D computation. And</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159636#M27868</link>
      <description>&lt;P&gt;This is a 1D computation. After changing the code to serial, DftiCommitDescriptor was still the bottleneck. Clearly, moving DftiCommitDescriptor outside of the loop would help; it is just surprising that DftiCommitDescriptor is so expensive.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 14:00:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159636#M27868</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-03-26T14:00:30Z</dc:date>
    </item>
    <item>
      <title>Related to this I have found</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159637#M27869</link>
      <description>&lt;P&gt;Related to this, I have found that after updating to MKL 2018 Update 2, calling a 1-D FFT inside an OpenMP parallel for loop causes a memory access exception.&lt;/P&gt;

&lt;P&gt;The crash is deep inside mkl_avx.dll.&lt;/P&gt;

&lt;P&gt;Removing the OpenMP directives avoids the issue.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Apr 2018 20:13:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159637#M27869</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-04-13T20:13:00Z</dc:date>
    </item>
    <item>
      <title>Following on my previous post</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159638#M27870</link>
      <description>&lt;P&gt;Following up on my previous post: this is a typical crash that occurs in Update 2 but not in Update 1.&lt;/P&gt;

&lt;P&gt;Basically I have to remove all FFT calls within OpenMP parallel regions to avoid these crashes.&lt;/P&gt;

&lt;P&gt;CS = 0033 &amp;nbsp; FS = 0053 &amp;nbsp; GS = 002b&lt;/P&gt;

&lt;P&gt;Stack Trace (from fault):&lt;BR /&gt;
	[ &amp;nbsp;0] 0x000007fed1e21b2a &amp;nbsp; mkl_avx.dll+09181994 mkl_dft_avx_dft_zdscal+00000842&lt;BR /&gt;
	[ &amp;nbsp;1] 0x000007fed1fbcd9f &amp;nbsp; mkl_avx.dll+10866079 mkl_sparse_d_csr_ctd_sv_ker_i8_avx+00578415&lt;BR /&gt;
	[ &amp;nbsp;2] 0x000007fed1e234c8 &amp;nbsp; mkl_avx.dll+09188552 mkl_dft_avx_dfti_create_node+00000488&lt;BR /&gt;
	[ &amp;nbsp;3] 0x000007fed1e23af9 &amp;nbsp; mkl_avx.dll+09190137 mkl_dft_avx_dfti_create_sr1d+00000073&lt;BR /&gt;
	[ &amp;nbsp;4] 0x000007fee03d75d2 &amp;nbsp; &amp;nbsp;mkl_rt.dll+10909138 fftwf_sprint_plan+00001134&lt;BR /&gt;
	[ &amp;nbsp;5] 0x000007fee03bfe9a &amp;nbsp; &amp;nbsp;mkl_rt.dll+10813082 DftiCreateDescriptor_s_1d+00000366&lt;BR /&gt;
	....&lt;BR /&gt;
	[ &amp;nbsp;8] 0x000007fee5330ecc libiomp5md.dll+00593612 _kmp_invoke_microtask+00000140&lt;BR /&gt;
	[ &amp;nbsp;9] 0x000007fee52fc37d libiomp5md.dll+00377725 _kmp_acquire_nested_drdpa_lock+00037421&lt;BR /&gt;
	[ 10] 0x000007fee52fb494 libiomp5md.dll+00373908 _kmp_acquire_nested_drdpa_lock+00033604&lt;BR /&gt;
	[ 11] 0x000007fee5332e87 libiomp5md.dll+00601735 _kmp_launch_worker+00000407&lt;BR /&gt;
	[ 12] 0x00000000773859cd &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; C:\Windows\system32\kernel32.dll+00088525 BaseThreadInitThunk+00000013&lt;BR /&gt;
	[ 13] 0x00000000775ba561 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;C:\Windows\SYSTEM32\ntdll.dll+00173409 RtlUserThreadStart+00000033&lt;/P&gt;</description>
      <pubDate>Mon, 23 Apr 2018 02:38:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-inside-OpenMP-loop-MKL-2018/m-p/1159638#M27870</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-04-23T02:38:59Z</dc:date>
    </item>
  </channel>
</rss>

