<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Performance of MKL BLAS routines vs self compiled BLAS in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-MKL-BLAS-routines-vs-self-compiled-BLAS/m-p/1097246#M23638</link>
    <description>&lt;P&gt;Hi&lt;/P&gt;

&lt;P&gt;I am using BLAS with my software, especially various GEMM &amp;amp; GEMV routines.&lt;/P&gt;

&lt;P&gt;I profiled my software with Intel VTune and found that my own BLAS library (compiled with the Intel Fortran Compiler) gives 5-10% better run-time performance than Intel MKL.&lt;/P&gt;

&lt;P&gt;Does this make sense? Is it possible that taking the BLAS sources from www.netlib.org/blas/ and compiling them myself results in a better-optimized library than Intel MKL?&lt;/P&gt;

&lt;P&gt;Regards,&lt;/P&gt;

&lt;P&gt;Morag Agmon (Intel)&lt;/P&gt;</description>
    <pubDate>Thu, 21 Jan 2016 08:49:39 GMT</pubDate>
    <dc:creator>Morag_A_Intel</dc:creator>
    <dc:date>2016-01-21T08:49:39Z</dc:date>
    <item>
      <title>Performance of MKL BLAS routines vs self compiled BLAS</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-MKL-BLAS-routines-vs-self-compiled-BLAS/m-p/1097246#M23638</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;

&lt;P&gt;I am using BLAS with my software, especially various GEMM &amp;amp; GEMV routines.&lt;/P&gt;

&lt;P&gt;I profiled my software with Intel VTune and found that my own BLAS library (compiled with the Intel Fortran Compiler) gives 5-10% better run-time performance than Intel MKL.&lt;/P&gt;

&lt;P&gt;Does this make sense? Is it possible that taking the BLAS sources from www.netlib.org/blas/ and compiling them myself results in a better-optimized library than Intel MKL?&lt;/P&gt;

&lt;P&gt;Regards,&lt;/P&gt;

&lt;P&gt;Morag Agmon (Intel)&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2016 08:49:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-MKL-BLAS-routines-vs-self-compiled-BLAS/m-p/1097246#M23638</guid>
      <dc:creator>Morag_A_Intel</dc:creator>
      <dc:date>2016-01-21T08:49:39Z</dc:date>
    </item>
    <item>
      <title>Morag,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-MKL-BLAS-routines-vs-self-compiled-BLAS/m-p/1097247#M23639</link>
      <description>&lt;P&gt;Morag,&lt;/P&gt;

&lt;P&gt;That is not expected on our side. Where do you see the 5-10% performance gap relative to MKL? Is it in a ?gemm routine? What is the problem size?&lt;/P&gt;

&lt;P&gt;Why did you use VTune (did you use hotspot analysis?) instead of directly measuring the execution time of these routines? What type of CPU are you running on?&lt;/P&gt;

</description>
      <pubDate>Thu, 21 Jan 2016 14:54:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-MKL-BLAS-routines-vs-self-compiled-BLAS/m-p/1097247#M23639</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2016-01-21T14:54:14Z</dc:date>
    </item>
  </channel>
</rss>

