<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic 3D FFT in MKL with data larger than cache in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031098#M20149</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am working on a 3D numerical integrator for a non-linear PDE using the parallel FFT library included in MKL.&lt;/P&gt;

&lt;P&gt;My arrays consist of 2^30 data points, which is far larger than the cache. As a result, ~50% of cache references are misses, so a large share of the execution time is spent purely on memory access.&lt;/P&gt;

&lt;P&gt;Is there a clever way I can deal with this? Is it expected to have 50% cache misses using an array this large?&lt;/P&gt;

&lt;P&gt;Any help would be much appreciated.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Dylan&lt;/P&gt;</description>
    <pubDate>Mon, 08 Jun 2015 21:35:45 GMT</pubDate>
    <dc:creator>Dylan_B_</dc:creator>
    <dc:date>2015-06-08T21:35:45Z</dc:date>
    <item>
      <title>3D FFT in MKL with data larger than cache</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031098#M20149</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am working on a 3D numerical integrator for a non-linear PDE using the parallel FFT library included in MKL.&lt;/P&gt;

&lt;P&gt;My arrays consist of 2^30 data points, which is far larger than the cache. As a result, ~50% of cache references are misses, so a large share of the execution time is spent purely on memory access.&lt;/P&gt;
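
&lt;P&gt;To put "far larger than the cache" in numbers, here is a rough sketch of the memory footprint of one such array. The element type is not stated in this post, so both element sizes below are assumptions, and the helper name footprint_gib is hypothetical.&lt;/P&gt;

```python
def footprint_gib(points, bytes_per_point):
    """Size of one array in GiB for a given element count and element size."""
    return points * bytes_per_point / 2**30


POINTS = 2**30  # array size quoted in the post

# Assumed element types: single- and double-precision complex values.
for name, size in (("complex64", 8), ("complex128", 16)):
    print(f"{name}: {footprint_gib(POINTS, size):.0f} GiB per array")
```

&lt;P&gt;Under either assumption a single array occupies several GiB, whereas last-level caches are typically measured in tens of MiB, so every pass over the data has to stream from main memory.&lt;/P&gt;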

&lt;P&gt;Is there a clever way I can deal with this? Is it expected to have 50% cache misses using an array this large?&lt;/P&gt;

&lt;P&gt;Any help would be much appreciated.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Dylan&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jun 2015 21:35:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031098#M20149</guid>
      <dc:creator>Dylan_B_</dc:creator>
      <dc:date>2015-06-08T21:35:45Z</dc:date>
    </item>
    <item>
      <title>Hi Dylan B.,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031099#M20150</link>
      <description>&lt;P&gt;Hi Dylan B.,&lt;/P&gt;

&lt;P&gt;A cache-miss rate of 50% is normal for large out-of-place FFTs. Have you tried in-place 3D transforms?&lt;/P&gt;

&lt;P&gt;For most data points of large 3D transforms, the miss-hit pattern is MHHHMH for in-place transforms and MMHHMH for out-of-place transforms, that is, cache-miss rates of 33% and 50%, respectively. Although real figures may be higher, switching to in-place transforms may improve performance.&lt;/P&gt;
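
&lt;P&gt;The two percentages follow directly from counting the misses in each six-access pattern. A minimal sketch of that arithmetic (plain Python, not MKL code; the function name miss_rate is hypothetical):&lt;/P&gt;

```python
def miss_rate(pattern):
    """Return the fraction of 'M' (miss) entries in a miss/hit pattern string."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    return pattern.count("M") / len(pattern)


in_place = miss_rate("MHHHMH")      # 2 misses out of 6 accesses
out_of_place = miss_rate("MMHHMH")  # 3 misses out of 6 accesses
print(f"in-place: {in_place:.0%}, out-of-place: {out_of_place:.0%}")
# prints "in-place: 33%, out-of-place: 50%"
```

&lt;P&gt;These are idealized per-point figures; measured rates from a profiler also include accesses outside the transform itself.&lt;/P&gt;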

&lt;P&gt;Evgueni.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jun 2015 04:28:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031099#M20150</guid>
      <dc:creator>Evgueni_P_Intel</dc:creator>
      <dc:date>2015-06-09T04:28:06Z</dc:date>
    </item>
    <item>
      <title>Hi Evgueni,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031100#M20151</link>
      <description>&lt;P&gt;Hi Evgueni,&lt;/P&gt;

&lt;P&gt;Thanks for the prompt reply.&lt;/P&gt;

&lt;P&gt;I tried in-place transforms, and they improved the cache-miss rate by approximately 5% compared to out-of-place transforms. I still find the performance underwhelming compared to a solver I wrote in the past using FFTW3, and I am completely stumped as to how, or whether, I can increase performance further.&lt;/P&gt;

&lt;P&gt;I have also noticed that certain runs can have a cache-miss rate as high as 65% without any change to the parameters in my source code.&lt;/P&gt;

&lt;P&gt;Thanks for your help,&lt;/P&gt;

&lt;P&gt;Dylan&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jun 2015 05:02:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031100#M20151</guid>
      <dc:creator>Dylan_B_</dc:creator>
      <dc:date>2015-06-09T05:02:03Z</dc:date>
    </item>
    <item>
      <title>FFT performance may depend on</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031101#M20152</link>
      <description>&lt;P&gt;FFT performance may depend on the layout of the dataset in memory, the threading runtime settings, and other factors.&lt;/P&gt;

&lt;P&gt;To speed up the investigation, please post a reproducer here or send it privately.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jun 2015 05:12:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/3D-FFT-in-MKL-with-data-larger-than-cache/m-p/1031101#M20152</guid>
      <dc:creator>Evgueni_P_Intel</dc:creator>
      <dc:date>2015-06-09T05:12:49Z</dc:date>
    </item>
  </channel>
</rss>

