<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic C++ MKL BLAS wrappers vs expression templates in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031855#M20169</link>
    <description>&lt;P&gt;This is a conceptual question:&lt;/P&gt;

&lt;P&gt;Expression templates are a popular C++ technique for implementing matrix and array operations that avoids unnecessary temporaries and enables loop unrolling. In other words, with expression templates an expression such as D = A+B+C, where D, A, B and C are matrices, does not incur the temporaries that a naive C++ implementation usually creates. How does this compare, in performance terms, with using C++ wrappers around the MKL BLAS routines? In other words, will a naive implementation of a Matrix/Array class wrapping the optimized BLAS routines perform at least as well as an implementation using expression templates?&lt;/P&gt;

&lt;P&gt;I realise this question is quite general, but I would be grateful if someone could give me some hints on this.&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Tue, 23 Dec 2014 15:41:13 GMT</pubDate>
    <dc:creator>Anwar_Ludin</dc:creator>
    <dc:date>2014-12-23T15:41:13Z</dc:date>
    <item>
      <title>C++ MKL BLAS wrappers vs expression templates</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031855#M20169</link>
      <description>&lt;P&gt;This is a conceptual question:&lt;/P&gt;

&lt;P&gt;Expression templates are a popular C++ technique for implementing matrix and array operations that avoids unnecessary temporaries and enables loop unrolling. In other words, with expression templates an expression such as D = A+B+C, where D, A, B and C are matrices, does not incur the temporaries that a naive C++ implementation usually creates. How does this compare, in performance terms, with using C++ wrappers around the MKL BLAS routines? In other words, will a naive implementation of a Matrix/Array class wrapping the optimized BLAS routines perform at least as well as an implementation using expression templates?&lt;/P&gt;
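
&lt;P&gt;To make the D = A+B+C case concrete, here is a minimal sketch of the idea, assuming plain element-wise addition on a toy vector type. The names Vec, Sum and g_vec_constructions are hypothetical and are not taken from Eigen, Blaze or any other library; real implementations are considerably more involved.&lt;/P&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below (Sum, Vec, g_vec_constructions) are hypothetical,
// chosen for illustration only.

// Counts how many Vec buffers are constructed, to make temporaries visible.
int g_vec_constructions = 0;

// Expression node: holds references to its operands and computes
// element i on demand instead of storing a result.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;

    Vec(std::size_t n, double value) : data(n, value) { ++g_vec_constructions; }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression evaluates the whole tree in one loop,
    // writing straight into this vector's existing storage.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& expr) {
        assert(expr.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) {
            data[i] = expr[i];
        }
        return *this;
    }
};

// A + B builds a lightweight node; no element is added yet.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    return {a, b};
}

// (A + B) + C nests the first node inside a second one.
template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    assert(a.size() == b.size());
    return {a, b};
}
```
]]>

&lt;P&gt;With this sketch, D = A + B + C runs a single loop over the elements and constructs no intermediate Vec. The nodes only hold references, so the expression has to be assigned within the same statement that builds it.&lt;/P&gt;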

&lt;P&gt;I realise this question is quite general, but I would be grateful if someone could give me some hints on this.&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 23 Dec 2014 15:41:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031855#M20169</guid>
      <dc:creator>Anwar_Ludin</dc:creator>
      <dc:date>2014-12-23T15:41:13Z</dc:date>
    </item>
    <item>
      <title>Have you looked at Eigen?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031856#M20170</link>
      <description>&lt;P&gt;Have you looked at Eigen? http://eigen.tuxfamily.org/index.php?title=Main_Page&lt;/P&gt;</description>
      <pubDate>Wed, 24 Dec 2014 23:06:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031856#M20170</guid>
      <dc:creator>Zhang_Z_Intel</dc:creator>
      <dc:date>2014-12-24T23:06:55Z</dc:date>
    </item>
    <item>
      <title>Yes I am aware of Armadillo,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031857#M20171</link>
      <description>&lt;P&gt;Yes, I am aware of Armadillo, Blitz, Eigen and, more recently, Blaze, all of which use expression templates in one form or another to unroll loops and avoid temporaries. Eigen publishes a benchmark that I find questionable: it claims performance similar to Intel MKL using expression templates (http://eigen.tuxfamily.org/index.php?title=Benchmark). After some digging I realized that the benchmark was run on a single thread only.&lt;/P&gt;

&lt;P&gt;My original question is: if I write naive OOP wrappers around the MKL routines, will I get performance on par with libraries that use expression templates but *not* the MKL routines? Most of the aforementioned libraries are quite opaque, with very little documentation about their internals, which makes them difficult to extend.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Dec 2014 00:07:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/C-MKL-BLAS-wrappers-vs-expression-templates/m-p/1031857#M20171</guid>
      <dc:creator>Anwar_Ludin</dc:creator>
      <dc:date>2014-12-25T00:07:41Z</dc:date>
    </item>
  </channel>
</rss>

