<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I just read the article in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131981#M25803</link>
    <description>&lt;P&gt;I just read the article and I see how it is done; I will test it.&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Adam Simek&lt;/P&gt;</description>
    <pubDate>Sat, 29 Feb 2020 17:10:38 GMT</pubDate>
    <dc:creator>simek__adam</dc:creator>
    <dc:date>2020-02-29T17:10:38Z</dc:date>
    <item>
      <title>FFT Open MP</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131974#M25796</link>
      <description>Greetings,

I am having trouble running the IPP 2019 FFT with internal OpenMP parallelisation. I have already tried the approach described at &lt;A href="https://software.intel.com/en-us/ipp-dev-guide-using-intel-integrated-performance-primitives-threading-layer-tl-functions" target="_blank"&gt;https://software.intel.com/en-us/ipp-dev-guide-using-intel-integrated-performance-primitives-threading-layer-tl-functions&lt;/A&gt;, but I can't find the combination of libraries to link to make it work.

I am using ippsFFTFwd_CToC_64fc and trying to set the thread count to 2 with ippSetNumThreads. (I noticed on the forums that internal parallelisation cannot use more than 2 threads; is this true?)

For compiling I'm using gcc or icc, with the include files that have the _tl suffix and the libraries with the _tl suffix from &amp;lt;IPP directory&amp;gt;/lib/intel64/tl/openmp/.
I noticed that you still have to mix in some non-_tl files, but I cannot make it work. Could you please provide the list of files from include and lib needed to get the FFT working with OpenMP?

My environment:
Ubuntu 16.04 LTS
GCC 6+
ICC from Parallel Studio XE 2020

I am also attaching the small sample code (C++11) I use; I check parallelism with vtune-gui.

Thank you for your response,

Adam Simek</description>
      <pubDate>Wed, 26 Feb 2020 00:09:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131974#M25796</guid>
      <dc:creator>simek__adam</dc:creator>
      <dc:date>2020-02-26T00:09:13Z</dc:date>
    </item>
    <item>
      <title>that's true: this function is</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131975#M25797</link>
      <description>&lt;P&gt;That's true: this function is not threaded internally, and it is not part of the threading layer (TL) yet.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Feb 2020 05:29:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131975#M25797</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-02-29T05:29:19Z</dc:date>
    </item>
    <item>
      <title>Quote:Gennady F. (Blackbelt)</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131976#M25798</link>
      <description>&lt;BLOCKQUOTE&gt;Gennady F. (Blackbelt) wrote:&lt;BR /&gt;&lt;P&gt;That's true: this function is not threaded internally, and it is not part of the threading layer (TL) yet.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Thank you for the reply. So for computing multiple larger FFTs (2^20 - 2^24), where external parallelisation would most likely be slowed down by cache memory limits, is it better to use Intel MKL or FFTW3? (I assume MKL uses FFTW, or am I wrong?)&lt;/P&gt;</description>
      <pubDate>Sat, 29 Feb 2020 14:14:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131976#M25798</guid>
      <dc:creator>simek__adam</dc:creator>
      <dc:date>2020-02-29T14:14:21Z</dc:date>
    </item>
    <item>
      <title>It is absurd that IPP doesn't</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131977#M25799</link>
      <description>&lt;P&gt;It is absurd that IPP doesn't have an internally threaded FFT. Here is how to build one (see the source code of &lt;A href="https://github.com/nickoneill/MatrixFFT"&gt;https://github.com/nickoneill/MatrixFFT&lt;/A&gt;):&lt;/P&gt;&lt;P&gt;1. Run all the 1D row FFTs &lt;STRONG&gt;threaded&lt;/STRONG&gt;. For optimal speed, use a vectorized 1D FFT, such as the one in vDSP (&lt;A href="https://developer.apple.com/documentation/accelerate/vdsp?language=objc"&gt;https://developer.apple.com/documentation/accelerate/vdsp?language=objc&lt;/A&gt;) or MKL (&lt;A href="https://software.intel.com/en-us/mkl"&gt;https://software.intel.com/en-us/mkl&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;2. Call IPP to transpose the entire image.&lt;/P&gt;&lt;P&gt;3. Run all the 1D column (now row) FFTs &lt;STRONG&gt;threaded&lt;/STRONG&gt; again.&lt;/P&gt;&lt;P&gt;4. Call IPP to transpose the entire image again (or do further work on the result image as if it were transposed).&lt;/P&gt;&lt;P&gt;This way, the 2D FFT is threaded and not memory-bound.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
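      <content:encoded><![CDATA[
The four steps in this post can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not the MatrixFFT or IPP implementation: dft_row is a naive O(n^2) DFT standing in for a vectorized 1D FFT from vDSP or MKL, transpose is a plain loop standing in for the IPP transpose call, and the names dft_row, transform_rows, transpose, and fft2d are hypothetical. Threading uses std::thread with one contiguous chunk of rows per thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

using cplx = std::complex<double>;

// Stand-in for a vectorized 1D FFT (an assumption of this sketch):
// a naive O(n^2) forward DFT applied in place to one row of length n.
void dft_row(cplx* row, std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k != n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j != n; ++j) {
            const double angle = -2.0 * pi * double((j * k) % n) / double(n);
            sum += row[j] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    std::copy(out.begin(), out.end(), row);
}

// Steps 1 and 3: transform every row of a row-major rows x cols matrix.
// Each thread works on one contiguous chunk of rows.
void transform_rows(std::vector<cplx>& data, std::size_t rows,
                    std::size_t cols, unsigned threads) {
    assert(threads > 0 && data.size() == rows * cols);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = rows * t / threads;
        const std::size_t end = rows * (t + 1) / threads;
        pool.emplace_back([&data, cols, begin, end] {
            for (std::size_t r = begin; r < end; ++r) {
                dft_row(data.data() + r * cols, cols);
            }
        });
    }
    for (auto& th : pool) th.join();
}

// Steps 2 and 4: out-of-place transpose (stand-in for the IPP call).
std::vector<cplx> transpose(const std::vector<cplx>& in,
                            std::size_t rows, std::size_t cols) {
    std::vector<cplx> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}

// Row FFTs, transpose, column (now row) FFTs, transpose back.
void fft2d(std::vector<cplx>& data, std::size_t rows,
           std::size_t cols, unsigned threads) {
    transform_rows(data, rows, cols, threads);
    data = transpose(data, rows, cols);   // now cols x rows
    transform_rows(data, cols, rows, threads);
    data = transpose(data, cols, rows);   // back to rows x cols
}
```

For example, fft2d(image, 4, 8, 2) transforms a 4x8 row-major image in place using two threads. On Linux with gcc this typically needs to be built with -pthread.
]]></content:encoded>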
      <pubDate>Sat, 29 Feb 2020 15:40:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131977#M25799</guid>
      <dc:creator>Adriaan_van_Os</dc:creator>
      <dc:date>2020-02-29T15:40:13Z</dc:date>
    </item>
    <item>
      <title>Quote:Adriaan van Os wrote:</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131978#M25800</link>
      <description>&lt;BLOCKQUOTE&gt;Adriaan van Os wrote:&lt;BR /&gt;&lt;P&gt;It is absurd that IPP doesn't have an internally threaded FFT. Here is how to build one (see the source code of &lt;A href="https://github.com/nickoneill/MatrixFFT" rel="nofollow"&gt;https://github.com/nickoneill/MatrixFFT&lt;/A&gt;):&lt;/P&gt;&lt;P&gt;1. Run all the 1D row FFTs &lt;STRONG&gt;threaded&lt;/STRONG&gt;. For optimal speed, use a vectorized 1D FFT, such as the one in vDSP (&lt;A href="https://developer.apple.com/documentation/accelerate/vdsp?language=objc" rel="nofollow"&gt;https://developer.apple.com/documentation/accelerate/vdsp?language=objc&lt;/A&gt;) or MKL (&lt;A href="https://software.intel.com/en-us/mkl"&gt;https://software.intel.com/en-us/mkl&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;2. Call IPP to transpose the entire image.&lt;/P&gt;&lt;P&gt;3. Run all the 1D column (now row) FFTs &lt;STRONG&gt;threaded&lt;/STRONG&gt; again.&lt;/P&gt;&lt;P&gt;4. Call IPP to transpose the entire image again (or do further work on the result image as if it were transposed).&lt;/P&gt;&lt;P&gt;This way, the 2D FFT is threaded and not memory-bound.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Thank you for the answer. I am actually writing a paper on internally threaded FFTs, so I am looking for comparison material on threaded 1D FFTs. Which one do you think is faster, MKL or vDSP?&lt;/P&gt;</description>
      <pubDate>Sat, 29 Feb 2020 16:27:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131978#M25800</guid>
      <dc:creator>simek__adam</dc:creator>
      <dc:date>2020-02-29T16:27:14Z</dc:date>
    </item>
    <item>
      <title>I haven't tried the MKL 1D</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131979#M25801</link>
      <description>&lt;P&gt;I haven't tried the MKL 1D FFT so far. The vDSP 1D FFT is not internally threaded, and that is what we need here, because the most efficient threading is per row.&lt;/P&gt;&lt;P&gt;The following paper is quite interesting: &lt;A href="https://github.com/nickoneill/MatrixFFT/raw/master/FFTapps.pdf"&gt;https://github.com/nickoneill/MatrixFFT/raw/master/FFTapps.pdf&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
      <pubDate>Sat, 29 Feb 2020 16:36:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131979#M25801</guid>
      <dc:creator>Adriaan_van_Os</dc:creator>
      <dc:date>2020-02-29T16:36:18Z</dc:date>
    </item>
    <item>
      <title>because the most efficient</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131980#M25802</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;P&gt;because the most efficient threading is per row.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Clarification: I mean subdividing the rows into one contiguous chunk of rows for each thread to chew on. In general, that is faster than interleaving rows across threads.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
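      <content:encoded><![CDATA[
To make the distinction in this post concrete, here is a small C++ sketch of the two ways of assigning rows to threads. The function names chunked_rows and interleaved_rows are hypothetical; they only compute which row indices a given thread would process and involve no FFT or threading library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Chunked: thread t gets the contiguous range
// [rows * t / threads, rows * (t + 1) / threads).
std::vector<std::size_t> chunked_rows(std::size_t rows, std::size_t threads,
                                      std::size_t t) {
    assert(threads > 0 && t < threads);
    std::vector<std::size_t> mine;
    const std::size_t begin = rows * t / threads;
    const std::size_t end = rows * (t + 1) / threads;
    for (std::size_t r = begin; r < end; ++r) mine.push_back(r);
    return mine;
}

// Interleaved: thread t gets rows t, t + threads, t + 2 * threads, ...
std::vector<std::size_t> interleaved_rows(std::size_t rows, std::size_t threads,
                                          std::size_t t) {
    assert(threads > 0 && t < threads);
    std::vector<std::size_t> mine;
    for (std::size_t r = t; r < rows; r += threads) mine.push_back(r);
    return mine;
}
```

With 10 rows and 3 threads, chunked_rows gives thread 0 the rows 0, 1, 2, while interleaved_rows gives it the rows 0, 3, 6, 9.
]]></content:encoded>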
      <pubDate>Sat, 29 Feb 2020 16:57:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131980#M25802</guid>
      <dc:creator>Adriaan_van_Os</dc:creator>
      <dc:date>2020-02-29T16:57:06Z</dc:date>
    </item>
    <item>
      <title>I just read the article</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131981#M25803</link>
      <description>&lt;P&gt;I just read the article and I see how it is done; I will test it.&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Adam Simek&lt;/P&gt;</description>
      <pubDate>Sat, 29 Feb 2020 17:10:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/FFT-Open-MP/m-p/1131981#M25803</guid>
      <dc:creator>simek__adam</dc:creator>
      <dc:date>2020-02-29T17:10:38Z</dc:date>
    </item>
  </channel>
</rss>

