<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Cache blocking techniques for element-wise math in large arrays - Fortran in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Cache-blocking-techniques-for-element-wise-math-in-large-arrays/m-p/919304#M12843</link>
    <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;I have a CFD code that does a lot of element-wise math (A(i,j)*B(i,j)) on large arrays, roughly 500x500 REAL*8 each, with most sections of the code using half a dozen of these (1.5MB) arrays at a time.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;To clean up the code, and in the naive hope that the Intel Fortran Compiler (IFC) would figure out the best way to manage the work, we vectorized most of it.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;VTune still shows a lot of time wasted on various stores, even at higher optimization levels.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Is there a simple technique or library that can block these operations so they run efficiently on a Xeon, hopefully without rewriting all the code?&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Thread moved to MKL from Fortran/Windows.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Thanks,&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Art&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Thu, 13 Jan 2005 23:00:53 GMT</pubDate>
    <dc:creator>art-croucher</dc:creator>
    <dc:date>2005-01-13T23:00:53Z</dc:date>
    <item>
      <title>Cache blocking techniques for element-wise math in large arrays - Fortran</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Cache-blocking-techniques-for-element-wise-math-in-large-arrays/m-p/919304#M12843</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;I have a CFD code that does a lot of element-wise math (A(i,j)*B(i,j)) on large arrays, roughly 500x500 REAL*8 each, with most sections of the code using half a dozen of these (1.5MB) arrays at a time.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;To clean up the code, and in the naive hope that the Intel Fortran Compiler (IFC) would figure out the best way to manage the work, we vectorized most of it.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;VTune still shows a lot of time wasted on various stores, even at higher optimization levels.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Is there a simple technique or library that can block these operations so they run efficiently on a Xeon, hopefully without rewriting all the code?&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Thread moved to MKL from Fortran/Windows.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Thanks,&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Art&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 13 Jan 2005 23:00:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Cache-blocking-techniques-for-element-wise-math-in-large-arrays/m-p/919304#M12843</guid>
      <dc:creator>art-croucher</dc:creator>
      <dc:date>2005-01-13T23:00:53Z</dc:date>
    </item>
  </channel>
</rss>

