<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Benqiang, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987181#M17823</link>
    <description>&lt;P&gt;Hi Benqiang,&lt;/P&gt;
&lt;P&gt;MKL DGEMM is well optimized for large problem sizes. For a matrix size of (100,100), DGEMM is expected to perform better than matmul.&amp;nbsp; This was discussed in an earlier post: &lt;A href="http://software.intel.com/en-us/forums/topic/269726"&gt;http://software.intel.com/en-us/forums/topic/269726&lt;/A&gt;&lt;BR /&gt;matmul may be faster for very small matrices, but for large problem sizes MKL is well optimized and performs well.&lt;/P&gt;
&lt;P&gt;For the VML functions, both MKL and the compiler provide vectorized functions with good performance. MKL also provides precision control (by setting VML_HA/VML_LA/VML_EP), so it offers more options for balancing precision and performance.&lt;/P&gt;
&lt;P&gt;For a function such as dot_product, the code is very simple, so the compiler can optimize it well. Both the compiler and MKL can deliver good performance there.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Chao&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 02 Sep 2013 02:17:53 GMT</pubDate>
    <dc:creator>Chao_Y_Intel</dc:creator>
    <dc:date>2013-09-02T02:17:53Z</dc:date>
    <item>
      <title>Fortran intrinsic functions v.s. mkl functions or subroutines</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987180#M17822</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I am wondering which I should use in my code. For example, for the matrix multiplication A(100,100)*B(100,100), should I use matmul(A,B) or gemm()?&lt;/P&gt;
&lt;P&gt;I have the same question about other functions, e.g. dot_product, and about the VML functions, e.g. exp(A) vs. vsexp().&lt;/P&gt;
&lt;P&gt;Let's ignore parallelization, because I mostly do these operations within each OpenMP thread.&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;Benqiang&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2013 22:48:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987180#M17822</guid>
      <dc:creator>zhubq</dc:creator>
      <dc:date>2013-08-30T22:48:39Z</dc:date>
    </item>
    <item>
      <title>Hi Benqiang,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987181#M17823</link>
      <description>&lt;P&gt;Hi Benqiang,&lt;/P&gt;
&lt;P&gt;MKL DGEMM is well optimized for large problem sizes. For a matrix size of (100,100), DGEMM is expected to perform better than matmul.&amp;nbsp; This was discussed in an earlier post: &lt;A href="http://software.intel.com/en-us/forums/topic/269726"&gt;http://software.intel.com/en-us/forums/topic/269726&lt;/A&gt;&lt;BR /&gt;matmul may be faster for very small matrices, but for large problem sizes MKL is well optimized and performs well.&lt;/P&gt;
&lt;P&gt;For the VML functions, both MKL and the compiler provide vectorized functions with good performance. MKL also provides precision control (by setting VML_HA/VML_LA/VML_EP), so it offers more options for balancing precision and performance.&lt;/P&gt;
&lt;P&gt;For a function such as dot_product, the code is very simple, so the compiler can optimize it well. Both the compiler and MKL can deliver good performance there.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Chao&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 02 Sep 2013 02:17:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987181#M17823</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2013-09-02T02:17:53Z</dc:date>
    </item>
    <item>
      <title>Hi everybody,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987182#M17824</link>
      <description>Hi everybody,

&amp;gt;&amp;gt;...
&amp;gt;&amp;gt;A triple do-loop takes 13.089 seconds.
&amp;gt;&amp;gt;A matmul(a,b) function takes 33.056 seconds.
&amp;gt;&amp;gt;A DGEMM subroutine takes 1.840 seconds
&amp;gt;&amp;gt;...

I'd like to add that the test results at the end of the thread mentioned by Chao are questionable (very outdated!), and a classic triple do-loop &lt;STRONG&gt;cannot&lt;/STRONG&gt; outperform Fortran's MATMUL function.

We recently tested several matrix multiplication functions; please take a look at this thread:

Forum Topic: &lt;STRONG&gt;Haswell GFLOPS&lt;/STRONG&gt;
Web-link: &lt;A href="http://software.intel.com/en-us/forums/topic/394248" target="_blank"&gt;http://software.intel.com/en-us/forums/topic/394248&lt;/A&gt;

Note: Page 2 of the thread has the most interesting information with test results for matrix sizes 4Kx4K, 8Kx8K and 16Kx16K.</description>
      <pubDate>Tue, 03 Sep 2013 01:10:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Fortran-intrinsic-functions-v-s-mkl-functions-or-subroutines/m-p/987182#M17824</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-09-03T01:10:55Z</dc:date>
    </item>
  </channel>
</rss>

