<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Any plans to inline LSAME ? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961691#M15955</link>
    <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;This issue mostly concerns small problems. In the case I encountered, 70% of the time in DTRSM (a 3x3 matrix with 36 right-hand-side vectors to solve) was spent in the input error checking, and the same overhead probably affects most MKL functions on small problems.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;The input error checking code calls LSAME many times with different parameters, so the branch predictor has no chance. Inlining it would prevent that, and it could probably be written with no conditional branches at all, since all the testing for non-standard (non-ASCII) character codings can be eliminated.&lt;/FONT&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
    <pubDate>Sun, 20 Nov 2005 23:38:05 GMT</pubDate>
    <dc:creator>dshor1</dc:creator>
    <dc:date>2005-11-20T23:38:05Z</dc:date>
    <item>
      <title>Any plans to inline LSAME ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961691#M15955</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;This issue mostly concerns small problems. In the case I encountered, 70% of the time in DTRSM (a 3x3 matrix with 36 right-hand-side vectors to solve) was spent in the input error checking, and the same overhead probably affects most MKL functions on small problems.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;The input error checking code calls LSAME many times with different parameters, so the branch predictor has no chance. Inlining it would prevent that, and it could probably be written with no conditional branches at all, since all the testing for non-standard (non-ASCII) character codings can be eliminated.&lt;/FONT&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Sun, 20 Nov 2005 23:38:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961691#M15955</guid>
      <dc:creator>dshor1</dc:creator>
      <dc:date>2005-11-20T23:38:05Z</dc:date>
    </item>
    <item>
      <title>Re: Any plans to inline LSAME ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961692#M15956</link>
      <description>You are welcome to compile with the public source code of those functions you choose, and experiment with in-line and the like.  In-lining itself isn't likely to help branch prediction.  Straightening out your favored paths by PGO or simply using your knowledge of branches taken in your cases may help.  BLAS isn't generally suited to high performance with such small matrices.</description>
      <pubDate>Wed, 23 Nov 2005 01:44:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961692#M15956</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2005-11-23T01:44:33Z</dc:date>
    </item>
    <item>
      <title>Re: Any plans to inline LSAME ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961693#M15957</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;3x3 is not a typical size for BLAS, but that is what PARDISO uses, so this issue has some relevance for MKL users. Since most of these functions are called with the same input flags in many cases, inlining should give close to 100% correct prediction for this code, without much effort:&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;nounit = lsame_(diag, "N");
upper = lsame_(uplo, "U");

info = 0;
if (! lside &amp;amp;&amp;amp; ! lsame_(side, "R")) {
    info = 1;
} else if (! upper &amp;amp;&amp;amp; ! lsame_(uplo, "L")) {
    info = 2;
} else if (! lsame_(transa, "N") &amp;amp;&amp;amp; ! lsame_(transa, "T")
           &amp;amp;&amp;amp; ! lsame_(transa, "C")) {
    info = 3;
} else if (! lsame_(diag, "U") &amp;amp;&amp;amp; ! lsame_(diag, "N")) {
    info = 4;
} else if (*m &amp;lt; 0) {
    info = 5;
} else if (*n &amp;lt; 0) {
    info = 6;
} /* excerpt ends here; the remaining checks are omitted */&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;I think it is worth considering.&lt;/FONT&gt;&lt;/P&gt;
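&lt;P&gt;To illustrate the idea, here is a minimal sketch of an inlinable comparison without an explicit conditional, assuming the flags are always ASCII letters. The name lsame_ascii is hypothetical; it is not an MKL or reference BLAS function, and whether a given compiler really emits no conditional branch for it would have to be checked in the generated code.&lt;/P&gt;

```c
/* Hypothetical inline stand-in for lsame_; not part of MKL or the
 * reference BLAS.
 * Assumption: both arguments are ASCII letters, where upper and lower
 * case differ only in bit 0x20. Non-letter characters are not handled
 * the way the reference LSAME handles them. */
static inline int lsame_ascii(char ca, char cb)
{
    /* Force both characters to lower case, then compare once. */
    return (ca | 0x20) == (cb | 0x20);
}
```

&lt;P&gt;With such a helper the flags in the excerpt above could be computed as, for example, nounit = lsame_ascii(*diag, 'N'), since the f2c-style code passes the flags as char pointers.&lt;/P&gt;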
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Nov 2005 22:11:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961693#M15957</guid>
      <dc:creator>dshor1</dc:creator>
      <dc:date>2005-11-23T22:11:07Z</dc:date>
    </item>
    <item>
      <title>Re: Any plans to inline LSAME ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961694#M15958</link>
      <description>OK, you're looking for dead branch code elimination.  I suspect it won't happen in this case, even with in-lining.  You could easily find out, by comparing with a version which you simplify manually. Then, if you believe it's important for the compiler to improve its optimization, you could file a problem report/feature request. Profile guided optimization ought to accomplish the job, but you and I probably agree it's not the cleanest way in this situation.</description>
      <pubDate>Wed, 23 Nov 2005 22:52:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961694#M15958</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2005-11-23T22:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: Any plans to inline LSAME ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961695#M15959</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Of course, 3x3 matrices were not what the developers of LAPACK had in mind, and because of the generality of the software there needs to be quite a bit of parameter evaluation. It's not surprising that you are spending more time in lsame than in the computations.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;I have to agree with Tim on the limits of what inlining can do.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;If the processing is always the same (i.e., the results of lsame are always the same), you might want to strip out all the unnecessary parts of dtrsm and compile it for your application.&lt;/DIV&gt;
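&lt;DIV&gt;As a rough sketch of what such a stripped-down routine might look like: the thread does not say which options are actually used, so the fixed case below (SIDE='L', UPLO='L', TRANSA='N', DIAG='N', ALPHA=1, 3x3 column-major matrix) is an assumption, and trsm3_lower_left is a hypothetical name, not an MKL or reference BLAS routine.&lt;/DIV&gt;

```c
/* Hypothetical specialised solve; not an MKL or reference BLAS routine.
 * Solves L * X = B in place, where L is a 3x3 lower-triangular,
 * non-unit matrix stored column-major with leading dimension lda, and
 * B holds nrhs right-hand sides as columns with leading dimension ldb.
 * Assumed fixed case: SIDE='L', UPLO='L', TRANSA='N', DIAG='N', ALPHA=1.
 * No argument checking is done, so no lsame_ calls remain. */
static void trsm3_lower_left(const double *a, int lda,
                             double *b, int ldb, int nrhs)
{
    /* Load the six entries of the lower triangle once. */
    const double a00 = a[0], a10 = a[1], a20 = a[2];
    const double a11 = a[lda + 1], a21 = a[lda + 2];
    const double a22 = a[2 * lda + 2];

    while (nrhs-- > 0) {
        /* Forward substitution, fully unrolled for the 3x3 case. */
        const double x0 = b[0] / a00;
        const double x1 = (b[1] - a10 * x0) / a11;
        const double x2 = (b[2] - a20 * x0 - a21 * x1) / a22;
        b[0] = x0;
        b[1] = x1;
        b[2] = x2;
        b += ldb;   /* advance to the next right-hand-side column */
    }
}
```

&lt;DIV&gt;Timing this against the library dtrsm on the real 3x3, 36-column case would show how much of the cost is really parameter checking.&lt;/DIV&gt;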
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Bruce&lt;/DIV&gt;&lt;P&gt;Message Edited by bsgreer on &lt;SPAN class="date_text"&gt;11-30-2005&lt;/SPAN&gt; &lt;SPAN class="time_text"&gt;03:44 PM&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2005 06:28:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Any-plans-to-inline-LSAME/m-p/961695#M15959</guid>
      <dc:creator>Intel_C_Intel</dc:creator>
      <dc:date>2005-12-01T06:28:28Z</dc:date>
    </item>
  </channel>
</rss>

