<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The slides above refers to in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039279#M20621</link>
    <description>&lt;P&gt;The slides above refer to the MKL 11.2 beta release; the feature (and its preprocessor macro) has since been renamed MKL_DIRECT_CALL (or MKL_DIRECT_CALL_SEQ). I'm sorry for the confusion.&lt;/P&gt;
&lt;P&gt;The KB article describing the feature is here: &lt;A href="https://software.intel.com/en-us/articles/improve-intel-mkl-performance-for-small-problems-the-use-of-mkl-direct-call"&gt;https://software.intel.com/en-us/articles/improve-intel-mkl-performance-for-small-problems-the-use-of-mkl-direct-call&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The MKL 11.2 User's Guide also has a section on this: &lt;A href="https://software.intel.com/en-us/node/528553"&gt;https://software.intel.com/en-us/node/528553&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;You need MKL 11.2, which is the first MKL release that supports MKL_DIRECT_CALL(_SEQ). This feature skips error checking and some of the intermediate function calls for small matrix operations to improve their performance. In addition to this feature, MKL 11.2 includes some small-matrix improvements that should help for the sizes above.&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
    <pubDate>Wed, 29 Oct 2014 17:26:00 GMT</pubDate>
    <dc:creator>Murat_G_Intel</dc:creator>
    <dc:date>2014-10-29T17:26:00Z</dc:date>
    <item>
      <title>Performance of matmul vs dgemm for small size matrices</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039277#M20619</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;My question is about improving the performance of the following line:&lt;/P&gt;

&lt;P&gt;------------------------&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;MKM = MD*FA1 - MATMUL(MATMUL(MATMUL(ME,MQ),TRANSPOSE(MG)),TRANSPOSE(ME)) + MATMUL(MATMUL(MATMUL(ME,MG),VA),VR)&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;------------------------&lt;/P&gt;

&lt;P&gt;This line is executed for every element in a finite element implementation and, according to the performance wizard, it is the bottleneck.&lt;/P&gt;

&lt;P&gt;All the matrices are at most 12x12. I have tried using DGEMM as follows:&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12.8000001907349px; line-height: 15.609601020813px;"&gt;------------------------&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;CALL DGEMM('N', 'N', 12, 3, &amp;nbsp;12, 1.0D0, ME, &amp;nbsp; &amp;nbsp; &amp;nbsp;12, MQ, 12, 0, MDUMMY3, 12)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;CALL DGEMM('N', 'T', 12, 12, 3, &amp;nbsp;1.0D0, MDUMMY3, 12, MG, 12, 0, MDUMMY4, 12)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;CALL DGEMM('N', 'T', 12, 12, 12, 1.0D0, MDUMMY4, 12, ME, 12, 0, MDUMMY5, 12)&lt;/P&gt;

&lt;P&gt;CALL DGEMM('N', 'N', 12, 3, &amp;nbsp;12, 1.0D0, ME, &amp;nbsp; &amp;nbsp; &amp;nbsp;12, MG, 12, 0, MDUMMY6, 12)&lt;/P&gt;

&lt;P&gt;CALL DGEMM('N', 'N', 12, 1, &amp;nbsp;3, &amp;nbsp;1.0D0, MDUMMY6, 12, VA, 12, 0, MDUMMY7, 12)&lt;/P&gt;

&lt;P&gt;CALL DGEMM('N', 'N', 12, 12, 1, &amp;nbsp;1.0D0, MDUMMY7, 12, VR, 1, &amp;nbsp;0, MDUMMY8, 12)&lt;BR /&gt;
	&lt;BR /&gt;
	MKM = MD*FA1 - MDUMMY5 + MDUMMY8&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12.8000001907349px; line-height: 15.609601020813px;"&gt;------------------------&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;However, it did not provide any improvement (I think it was even slightly slower).&lt;/P&gt;

&lt;P&gt;Is there any MKL function or setting that would help speed up this line?&lt;/P&gt;

&lt;P&gt;Thank you very much in advance,&lt;/P&gt;

&lt;P&gt;Murat&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2014 10:47:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039277#M20619</guid>
      <dc:creator>e112974</dc:creator>
      <dc:date>2014-10-29T10:47:00Z</dc:date>
    </item>
    <item>
      <title>Check the write-up about MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039278#M20620</link>
      <description>&lt;P&gt;Check the write-up about MKL_INLINE_SEQ, e.g.:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/sites/default/files/managed/8c/ef/Intel-MKL-11.2-beta-webinar--Introducing-new-features.pdf" target="_blank"&gt;https://software.intel.com/sites/default/files/managed/8c/ef/Intel-MKL-11.2-beta-webinar--Introducing-new-features.pdf&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;If you're using the opt-matmul option (set either explicitly or by -O3), it may not be surprising that you get similar results. In the past, I got the best matmul results by setting -O3 but turning off opt-matmul when the problem was not large enough to benefit from automatic threading. You might also try setting the number of MKL threads to 1 or linking the MKL sequential library, whether you use MKL explicitly or via opt-matmul, in case MKL uses too many threads when you don't specify a count.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2014 11:43:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039278#M20620</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-10-29T11:43:28Z</dc:date>
    </item>
    <item>
      <title>The slides above refers to</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039279#M20621</link>
      <description>&lt;P&gt;The slides above refer to the MKL 11.2 beta release; the feature (and its preprocessor macro) has since been renamed MKL_DIRECT_CALL (or MKL_DIRECT_CALL_SEQ). I'm sorry for the confusion.&lt;/P&gt;
&lt;P&gt;The KB article describing the feature is here: &lt;A href="https://software.intel.com/en-us/articles/improve-intel-mkl-performance-for-small-problems-the-use-of-mkl-direct-call"&gt;https://software.intel.com/en-us/articles/improve-intel-mkl-performance-for-small-problems-the-use-of-mkl-direct-call&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The MKL 11.2 User's Guide also has a section on this: &lt;A href="https://software.intel.com/en-us/node/528553"&gt;https://software.intel.com/en-us/node/528553&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;You need MKL 11.2, which is the first MKL release that supports MKL_DIRECT_CALL(_SEQ). This feature skips error checking and some of the intermediate function calls for small matrix operations to improve their performance. In addition to this feature, MKL 11.2 includes some small-matrix improvements that should help for the sizes above.&lt;/P&gt;
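&lt;P&gt;To make "skips error checking and some of the intermediate function calls" concrete, here is a minimal C sketch of the general idea. It is not MKL source code: checked_gemm, small_gemm_kernel, GEMM and SKETCH_DIRECT_CALL are hypothetical names invented for this illustration. The regular entry point validates its arguments before dispatching to a kernel, while the "direct" variant maps the call straight onto the kernel at compile time.&lt;/P&gt;
<![CDATA[
```c
#include <stddef.h>

/* Hypothetical names throughout; this only illustrates the idea and is
 * not how MKL is implemented internally. */

/* Innermost kernel: C = A * B for column-major matrices, no argument checks. */
void small_gemm_kernel(int m, int n, int k,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = sum;
        }
    }
}

/* Regular entry point: validates the arguments, then calls the kernel.
 * Returns 0 on success and -1 if any argument is invalid. */
int checked_gemm(int m, int n, int k,
                 const double *a, int lda,
                 const double *b, int ldb,
                 double *c, int ldc)
{
    if (m < 0 || n < 0 || k < 0) return -1;              /* bad dimensions */
    if (a == NULL || b == NULL || c == NULL) return -1;  /* bad pointers */
    if (lda < (m > 1 ? m : 1)) return -1;                /* bad leading dims */
    if (ldb < (k > 1 ? k : 1)) return -1;
    if (ldc < (m > 1 ? m : 1)) return -1;
    small_gemm_kernel(m, n, k, a, lda, b, ldb, c, ldc);
    return 0;
}

/* With the (hypothetical) macro defined, the preprocessor routes every call
 * straight to the kernel, so the checks and one call layer disappear. */
#ifdef SKETCH_DIRECT_CALL
#define GEMM(m, n, k, a, lda, b, ldb, c, ldc) \
    (small_gemm_kernel((m), (n), (k), (a), (lda), (b), (ldb), (c), (ldc)), 0)
#else
#define GEMM(m, n, k, a, lda, b, ldb, c, ldc) \
    checked_gemm((m), (n), (k), (a), (lda), (b), (ldb), (c), (ldc))
#endif
```
]]>
&lt;P&gt;For operands as small as 12x12, the checks and the extra call layer can be a noticeable share of the total work, which is why removing them mostly pays off for small sizes. The trade-off is that invalid arguments are no longer reported.&lt;/P&gt;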
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2014 17:26:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039279#M20621</guid>
      <dc:creator>Murat_G_Intel</dc:creator>
      <dc:date>2014-10-29T17:26:00Z</dc:date>
    </item>
    <item>
      <title>Guys, thank you very much for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039280#M20622</link>
      <description>&lt;P&gt;Guys, thank you very much for your replies.&lt;/P&gt;

&lt;P&gt;I am compiling my code using Visual Studio 2010 + Intel Parallel Studio XE 2011 (which I believe includes MKL 10.3?).&lt;/P&gt;

&lt;P&gt;So I guess I can't make use of MKL_DIRECT_CALL in that case, right? Still, if I got a later version of MKL, would there be a way to set this option from Visual Studio?&lt;/P&gt;

&lt;P&gt;I should also mention that I am compiling a dynamic-link library that I call from MATLAB. I don't know whether this makes things even more complicated.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Murat&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2014 17:41:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039280#M20622</guid>
      <dc:creator>e112974</dc:creator>
      <dc:date>2014-10-29T17:41:12Z</dc:date>
    </item>
    <item>
      <title>OK, I guess the up to date</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039281#M20623</link>
      <description>&lt;P&gt;OK, I guess the up-to-date MKL 11.2 slides were presented this week but can't yet be found by a Google search.&lt;/P&gt;

&lt;P&gt;Once you have the new MKL version, it looks like you would need to add the specified INCLUDE to your source file, add the include path in the compile properties, and make sure the fpp preprocessing option is set.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2014 20:44:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039281#M20623</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-10-29T20:44:00Z</dc:date>
    </item>
    <item>
      <title>Hi e112974, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039282#M20624</link>
      <description>&lt;P&gt;Hi e112974,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Right, you can't make use of MKL_DIRECT_CALL with Visual Studio 2010 + Intel Parallel Studio XE 2011 (MKL 10.3; please check &lt;A href="https://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-intel-mkl-and-intel-tbb-libraries-are-included-in-the-intel-composer-bundles"&gt;here&lt;/A&gt;).&lt;/P&gt;

&lt;P&gt;If you have MKL 11.2, you can set the option &lt;STRONG&gt;/DMKL_DIRECT_CALL&lt;/STRONG&gt; in the MSVC IDE environment. For example, open the project property page =&amp;gt; C/C++ tab =&amp;gt; Command Line =&amp;gt; Additional Options.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="color: rgb(102, 102, 102); font-family: Arial, Tahoma, Helvetica, sans-serif; font-size: 14px; line-height: 16.7999992370605px;"&gt;For a program in the C language on Linux system, simply add -DMKL_DIRECT_CALL or -DMKL_DIRECT_CALL_SEQ. On Windows, the syntax is &lt;STRONG&gt;/DMKL_DIRECT_CALL or /DMKL_DIRECT_CALL_SEQ. Usually, the flag -std=c99&lt;/STRONG&gt; (/Qstd=c99 on Windows) is also needed. This has been tested on mainstream C and C++ compilers such as Intel C++ Compiler, GCC, Microsoft Visual Studio, etc&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="color: rgb(102, 102, 102); font-family: Arial, Tahoma, Helvetica, sans-serif; font-size: 14px; line-height: 16.7999992370605px;"&gt;Regarding mkl in Matlab usage, &amp;nbsp;the dynamic dll with the option, we haven't tried. But i guess it should be work (although not sure the performance gain), with&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-left:21.0pt;"&gt;&amp;gt;"C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\bin\mklvars.bat" intel64&lt;/P&gt;

&lt;P style="margin-left:21.0pt;"&gt;&amp;gt;set BLAS_VERSION=mkl_rt.dll&lt;/P&gt;

&lt;P style="margin-left:21.0pt;"&gt;&amp;gt;set LAPACK_VERSION=mkl_rt.dll&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 06:29:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-matmul-vs-dgemm-for-small-size-matrices/m-p/1039282#M20624</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2014-11-03T06:29:58Z</dc:date>
    </item>
  </channel>
</rss>

