<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic what are the problem sizes in in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989984#M17965</link>
    <description>What are the problem sizes in that case?
It might happen for small inputs.</description>
    <pubDate>Thu, 06 Sep 2012 12:30:16 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2012-09-06T12:30:16Z</dc:date>
    <item>
      <title>performance numbers MKL 11.0 vs Eigen?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989983#M17964</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;I found the results here a bit surprising, especially the MVM one (matrix-vector multiplication with and without transposition) ... how come MKL, which even has AVX and is heavily optimized, gets lower performance than Eigen, which only implements SSE2? &lt;A href="http://eigen.tuxfamily.org/index.php?title=Benchmark" target="_blank"&gt;http://eigen.tuxfamily.org/index.php?title=Benchmark&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;They also show that the benchmarks correspond to the latest MKL 11.0&lt;/P&gt;
&lt;P&gt;I understand they outperform MKL for "complex expressions" using expression templates; that is clear. But how come they still appear to outperform MKL in MVM primitives?&lt;/P&gt;
&lt;P&gt;Thanks in advance,&lt;/P&gt;
&lt;P&gt;Best regards,&lt;/P&gt;
&lt;P&gt;Giovanni&lt;/P&gt;</description>
      <pubDate>Wed, 05 Sep 2012 22:33:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989983#M17964</guid>
      <dc:creator>Azua_Garcia__Giovann</dc:creator>
      <dc:date>2012-09-05T22:33:24Z</dc:date>
    </item>
    <item>
      <title>what are the problem sizes in</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989984#M17965</link>
      <description>What are the problem sizes in that case?
It might happen for small inputs.</description>
      <pubDate>Thu, 06 Sep 2012 12:30:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989984#M17965</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-09-06T12:30:16Z</dc:date>
    </item>
    <item>
      <title>Indeed, the sizes at the MV</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989985#M17966</link>
      <description>Indeed, the sizes in the MV chart are 100-1000, which is very small and quite unusual for HPC. As you can see, there's a significant drop near 1000, which means the task no longer fits into the last-level cache. Frankly speaking, it makes sense to assess the memory-limited MV operation starting from about this point (but not to stop measuring there). Another unclear aspect of all those charts is the use of only 1 thread on a machine with 4 cores. I can only guess that the reason is that the majority of Eigen operations are not threaded.
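
(A rough sketch of the cache argument, not a measurement. It assumes 4-byte single-precision elements and about 3 MiB of last-level cache available to one thread; neither figure is stated in this thread, and the helper names are hypothetical.)

```python
def matvec_working_set_bytes(n, elem_bytes=4):
    """Bytes touched by y = A*x for a dense n-by-n matrix A.

    Counts the matrix plus the input and output vectors.
    elem_bytes=4 is an assumption (single precision).
    """
    return elem_bytes * (n * n + 2 * n)


def largest_n_fitting(cache_bytes, elem_bytes=4):
    """Largest n whose matrix-vector working set still fits in cache_bytes."""
    n = 0
    while cache_bytes >= matvec_working_set_bytes(n + 1, elem_bytes):
        n += 1
    return n


if __name__ == "__main__":
    assumed_cache = 3 * 1024 * 1024  # assumed 3 MiB, not taken from the thread
    print("largest n that fits:", largest_n_fitting(assumed_cache))
    print("bytes needed at n=1000:", matvec_working_set_bytes(1000))
```

With these assumed figures the limit comes out somewhat below n = 1000, and with 8-byte doubles it would come earlier still. Past that point every matrix element has to be streamed from main memory on each call.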

Considering only 1-thread MV performance on such small sizes: yes, it might be that Eigen is faster than all other libraries for this particular case. But this is because all those libraries have additional overhead associated with the call stack and, probably, because this case has the lowest priority for real tasks.

BTW, Eigen provides an easy way to use Intel(R) MKL as a backend:
&lt;A href="http://eigen.tuxfamily.org/dox-devel/TopicUsingIntelMKL.html" target="_blank"&gt;http://eigen.tuxfamily.org/dox-devel/TopicUsingIntelMKL.html&lt;/A&gt;</description>
      <pubDate>Fri, 07 Sep 2012 05:38:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989985#M17966</guid>
      <dc:creator>Konstantin_A_Intel</dc:creator>
      <dc:date>2012-09-07T05:38:14Z</dc:date>
    </item>
    <item>
      <title>With respect to AVX - please</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989986#M17967</link>
      <description>With respect to AVX - please note that the Intel(R) Core(TM)2 Quad CPU Q9400 used in the measurements doesn't support AVX.</description>
      <pubDate>Fri, 07 Sep 2012 05:45:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989986#M17967</guid>
      <dc:creator>Konstantin_A_Intel</dc:creator>
      <dc:date>2012-09-07T05:45:46Z</dc:date>
    </item>
    <item>
      <title>Indeed, this benchmark is</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989987#M17968</link>
      <description>Indeed, this benchmark is quite old and was performed on a CPU with no AVX support. Activating multi-threading for a matrix-vector operation makes little sense, since most of the time the application is parallelized at a higher level (e.g., matrix factorization). The benchmark goes up to matrix sizes of 3000 (not 1000). For larger matrices, all libraries perform poorly since caching strategies cannot be used for level 2 operations. The good performance of Eigen here is mainly due to a clever trick to completely avoid unaligned memory access in all situations: we form one unaligned packet from two aligned loads. More details in the code!</description>
      <pubDate>Fri, 07 Sep 2012 15:47:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/performance-numbers-MKL-11-0-vs-Eigen/m-p/989987#M17968</guid>
      <dc:creator>Gael_G_</dc:creator>
      <dc:date>2012-09-07T15:47:34Z</dc:date>
    </item>
  </channel>
</rss>

