<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I am 100% certain this had in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148053#M26883</link>
    <description>&lt;P&gt;I am 100% certain this had nothing to do with other processes (there were none). Very reproducibly, "sync ; echo 3 &amp;gt; /proc/sys/vm/drop_caches" improved the speed by about a factor of 1.5.&lt;/P&gt;&lt;P&gt;N.B., the code already has a number of timers in it.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Nov 2018 16:14:39 GMT</pubDate>
    <dc:creator>L__D__Marks</dc:creator>
    <dc:date>2018-11-15T16:14:39Z</dc:date>
    <item>
      <title>Is MKL speed dependent upon how contiguous memory is?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148049#M26879</link>
      <description>Does the speed of the MKL BLAS/LAPACK library routines change significantly when memory is contiguous versus fragmented?

I have a strange problem that looks like a "memory cache leak" (not a memory leak) and leads to a slowdown of a program.

Let me set the stage first. Reproducibly (using ganglia to monitor), I have noticed on a cluster that the cached memory increases relatively slowly. When it becomes large, something like 2/3 of the total memory (Intel Gold with 32 cores &amp;amp; 192 GB), the program runs slower by a factor of ~1.5. If I sync the disk and clear the cache (I have not tested which one matters) with "sync ; echo 3 &amp;gt; /proc/sys/vm/drop_caches", the speed of the program recovers (~1.5 times faster).
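
To check whether the slowdown tracks the page cache, the cached fraction could be logged next to the program's timers. A minimal Python sketch, assuming a Linux /proc/meminfo; the names parse_meminfo and cached_fraction are hypothetical helpers, not part of MKL or ganglia:

```python
import os


def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cached_fraction(fields):
    """Fraction of total RAM currently reported as page cache."""
    return fields["Cached"] / fields["MemTotal"]


# Assumption: only runs the live reading where /proc/meminfo exists (Linux).
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print("cached fraction: %.3f" % cached_fraction(info))
```

Recording this value at each timing checkpoint would show whether the factor of ~1.5 appears as the fraction approaches 2/3.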

The issue seems to be associated with I/O: the relevant code uses MPI, and only the core that does the I/O shows the cache leak. The program does a fair amount of I/O, but not massive amounts (10-40 MB). I compile using ifort with -assume buffered_io. My suspicion is that this may leave some cached files at the end, effectively a "cache leak".

The program makes a large number of BLAS/LAPACK calls. It seems plausible that memory is less contiguous (fragmented RAM) when the cached memory is large. Can this change the speed of the BLAS/LAPACK routines?</description>
      <pubDate>Wed, 24 Oct 2018 18:44:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148049#M26879</guid>
      <dc:creator>L__D__Marks</dc:creator>
      <dc:date>2018-10-24T18:44:36Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148050#M26880</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Thanks for your question. I will investigate it and get back to you soon.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Alice&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 04:17:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148050#M26880</guid>
      <dc:creator>Alice_H_Intel</dc:creator>
      <dc:date>2018-10-25T04:17:43Z</dc:date>
    </item>
    <item>
      <title>Did you find out anything?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148051#M26881</link>
      <description>&lt;P&gt;Did you find out anything?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 15:45:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148051#M26881</guid>
      <dc:creator>L__D__Marks</dc:creator>
      <dc:date>2018-11-15T15:45:45Z</dc:date>
    </item>
    <item>
      <title>exporting MKL_VERBOSE=1 will</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148052#M26882</link>
      <description>&lt;P&gt;If you export MKL_VERBOSE=1, do you see the LAPACK/BLAS execution times change for the same routines and the same input problem sizes? Are you sure that no third-party process is running at the same time?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 16:08:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148052#M26882</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-11-15T16:08:14Z</dc:date>
    </item>
    <item>
      <title>I am 100% certain this had</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148053#M26883</link>
      <description>&lt;P&gt;I am 100% certain this had nothing to do with other processes (there were none). Very reproducibly, "sync ; echo 3 &amp;gt; /proc/sys/vm/drop_caches" improved the speed by about a factor of 1.5.&lt;/P&gt;&lt;P&gt;N.B., the code already has a number of timers in it.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Nov 2018 16:14:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Is-mkl-speed-dependend-upon-how-contiguous-memory-is/m-p/1148053#M26883</guid>
      <dc:creator>L__D__Marks</dc:creator>
      <dc:date>2018-11-15T16:14:39Z</dc:date>
    </item>
  </channel>
</rss>

