<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Vectorization - pragma asm interpretation in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850755#M2067</link>
    <description>The two LEA instructions at the end of the function are simply fillers (NOPs) that ensure proper alignment for the next function; they aren't part of the function epilogue.&lt;BR /&gt;&lt;BR /&gt;As for the prologue difference, it is hard to tell without seeing the rest of the surrounding code. Most likely, vectorization enables the compiler to "see" an opportunity for some other optimizations, resulting in slightly shorter code that uses fewer variables.&lt;BR /&gt;</description>
    <pubDate>Sun, 26 Apr 2009 05:19:11 GMT</pubDate>
    <dc:creator>levicki</dc:creator>
    <dc:date>2009-04-26T05:19:11Z</dc:date>
    <item>
      <title>Vectorization - pragma asm interpretation</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850754#M2066</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I am trying to interpret the following:&lt;BR /&gt;&lt;BR /&gt;(a) In a C++ package with multiple files, when I vectorize a section of code in one file (using pragmas), the function's starting and ending asm is:&lt;BR /&gt;{&lt;BR /&gt;44d960: 55 push %rbp&lt;BR /&gt;44d961: 48 83 ec 50 sub $0x50,%rsp&lt;BR /&gt;44d965: 49 89 f0 mov %rsi,%r8&lt;BR /&gt;44d968: 4c 63 c9 movslq %ecx,%r9&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;44dc84: 48 83 c4 50 add $0x50,%rsp&lt;BR /&gt;44dc88: 5d pop %rbp&lt;BR /&gt;44dc89: c3 retq&lt;BR /&gt;44dc8a: 90 nop&lt;BR /&gt;44dc8b: 48 8d 74 26 00 lea 0x0(%rsi),%rsi&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;(b) For the same code without any pragmas, the starting and ending asm is:&lt;BR /&gt;{&lt;BR /&gt;44d960: 48 83 ec 68 sub $0x68,%rsp&lt;BR /&gt;44d964: 49 89 f9 mov %rdi,%r9&lt;BR /&gt;44d967: 49 89 d0 mov %rdx,%r8&lt;BR /&gt;44d96a: 4c 63 d1 movslq %ecx,%r10&lt;BR /&gt;...&lt;BR /&gt;...&lt;BR /&gt;...&lt;BR /&gt;44dc4e: 48 83 c4 68 add $0x68,%rsp&lt;BR /&gt;44dc52: c3 retq&lt;BR /&gt;44dc53: 90 nop&lt;BR /&gt;44dc54: 48 8d 74 26 00 lea 0x0(%rsi),%rsi&lt;BR /&gt;44dc59: 48 8d bf 00 00 00 00 lea 0x0(%rdi),%rdi&lt;BR /&gt;}&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;Questions:&lt;BR /&gt;(1) Why does the version with the vectorization pragmas contain a PUSH/POP pair while the version without them does not?&lt;BR /&gt;&lt;BR /&gt;(2) Without the pragmas, the asm in (b) has "lea" twice at the end, and its prologue is sub, mov, mov and movslq rather than push, sub, mov and movslq as in (a). Why do the pragmas make such a difference?&lt;BR /&gt;&lt;BR /&gt;~BR&lt;BR /&gt;</description>
      <pubDate>Sun, 26 Apr 2009 01:50:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850754#M2066</guid>
      <dc:creator>srimks</dc:creator>
      <dc:date>2009-04-26T01:50:17Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorization - pragma asm interpretation</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850755#M2067</link>
      <description>The two LEA instructions at the end of the function are simply fillers (NOPs) that ensure proper alignment for the next function; they aren't part of the function epilogue.&lt;BR /&gt;&lt;BR /&gt;As for the prologue difference, it is hard to tell without seeing the rest of the surrounding code. Most likely, vectorization enables the compiler to "see" an opportunity for some other optimizations, resulting in slightly shorter code that uses fewer variables.&lt;BR /&gt;</description>
      <pubDate>Sun, 26 Apr 2009 05:19:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850755#M2067</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2009-04-26T05:19:11Z</dc:date>
    </item>
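The filler explanation above can be checked against the addresses quoted in the question. The following is a minimal sketch, assuming the compiler aligns each function to a 16-byte boundary (the thread does not state the alignment; ALIGN is an assumption, and padding_needed is a hypothetical helper name). It compares the bytes needed to reach the next boundary after retq with the total length of the nop and lea fillers in each listing.

```python
# Addresses and instruction lengths are copied from the listings in the thread.
# ALIGN = 16 is an assumption about the compiler's function alignment.
ALIGN = 16


def padding_needed(end_addr, align=ALIGN):
    """Bytes required to advance end_addr to the next multiple of align."""
    return (-end_addr) % align


# name: (address of retq, lengths in bytes of the filler instructions after it)
listings = {
    "with pragma": (0x44DC89, [1, 5]),        # nop, 5-byte lea
    "without pragma": (0x44DC52, [1, 5, 7]),  # nop, 5-byte lea, 7-byte lea
}

for name, (ret_addr, filler_lengths) in listings.items():
    end = ret_addr + 1  # retq is a single byte (c3)
    print(name, "needs", padding_needed(end), "has", sum(filler_lengths))
```

Under that assumption the filler length matches the required padding in both listings, which is consistent with the second lea in (b) being there only because that function happens to end further from the next boundary.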
    <item>
      <title>Re: Vectorization - pragma asm interpretation</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850756#M2068</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/61352"&gt;Igor Levicki&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Two LEA instructions at the function end are simply fillers (NOPs) to ensure proper alignment for the next function -- they aren't part of the function epilogue.&lt;BR /&gt;&lt;BR /&gt;As for the prologue difference it is hard to tell without seeing the rest of the surrounding code. Most likely vectorization enables the compiler to "see" an opportunity for some other optimizations thus resulting in a bit shorter code which uses less variables.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Yes, you are right about the epilogue; code generators normally emit NOP (no-operation) instructions to align the code that follows.&lt;BR /&gt;&lt;BR /&gt;Let's look at the prologue part if possible.&lt;BR /&gt;&lt;BR /&gt;Thanks, Igor.&lt;BR /&gt;&lt;BR /&gt;~BR</description>
      <pubDate>Sun, 26 Apr 2009 15:57:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850756#M2068</guid>
      <dc:creator>srimks</dc:creator>
      <dc:date>2009-04-26T15:57:20Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorization - pragma asm interpretation</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850757#M2069</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/61352"&gt;Igor Levicki&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Two LEA instructions at the function end are simply fillers (NOPs) to ensure proper alignment for the next function -- they aren't part of the function epilogue.&lt;BR /&gt;&lt;BR /&gt;As for the prologue difference it is hard to tell without seeing the rest of the surrounding code. Most likely vectorization enables the compiler to "see" an opportunity for some other optimizations thus resulting in a bit shorter code which uses less variables.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
As quoted: &lt;EM&gt;"As for the prologue difference it is hard to tell without seeing the rest of the surrounding code. Most likely vectorization enables the compiler to "see" an opportunity for some other optimizations thus resulting in a bit shorter code which uses less variables." &lt;/EM&gt;Here are the two prologues again:&lt;BR /&gt;&lt;BR /&gt;(a) Prologue with pragma vectorization:&lt;BR /&gt;{&lt;BR /&gt;44d960: 55 push %rbp&lt;BR /&gt;44d961: 48 83 ec 50 sub $0x50,%rsp&lt;BR /&gt;44d965: 49 89 f0 mov %rsi,%r8&lt;BR /&gt;44d968: 4c 63 c9 movslq %ecx,%r9&lt;BR /&gt;...&lt;BR /&gt;...&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;(b) Prologue of the same code without any pragmas:&lt;BR /&gt;{&lt;BR /&gt;44d960: 48 83 ec 68 sub $0x68,%rsp&lt;BR /&gt;44d964: 49 89 f9 mov %rdi,%r9&lt;BR /&gt;44d967: 49 89 d0 mov %rdx,%r8&lt;BR /&gt;44d96a: 4c 63 d1 movslq %ecx,%r10&lt;BR /&gt;...&lt;BR /&gt;...&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;In (a), i.e. with the pragma, the "PUSH %RBP" instruction is internally split into two micro-operations, which can be represented as "SUB RSP, 8" and "MOV [RSP], RBP". The advantage of this is that the "SUB RSP, 8" micro-operation can be executed even if the value of RBP is not ready yet.&lt;BR /&gt;&lt;BR /&gt;I don't think much can be gained from one prologue over the other; their meaning is essentially the same. The only notable difference is the second "lea" instruction used for alignment in the version without the vectorization pragma.&lt;BR /&gt;&lt;BR /&gt;But the question arises: why have the "sub $0x68,%rsp" and "mov %rdi,%r9" of the version without the pragma been replaced by a single "push %rbp"?&lt;BR /&gt;&lt;BR /&gt;Is it because "push %rbp" has better latency and reciprocal throughput?&lt;BR /&gt;&lt;BR /&gt;~BR&lt;BR /&gt;</description>
      <pubDate>Wed, 29 Apr 2009 05:45:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Vectorization-pragma-asm-interpretation/m-p/850757#M2069</guid>
      <dc:creator>srimks</dc:creator>
      <dc:date>2009-04-29T05:45:38Z</dc:date>
    </item>
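One way to compare the two prologues is to add up how far each moves the stack pointer. The sketch below assumes the System V x86-64 calling convention, under which %rsp sits 8 bytes past a 16-byte boundary on function entry (the return address has just been pushed) and %rbp is a callee-saved register; neither point is confirmed in the thread, and frame_bytes and aligned_after_prologue are hypothetical helper names.

```python
# Stack adjustments are taken from the two prologues quoted in the thread.
# Assumption: System V x86-64 ABI, so on entry %rsp is 8 bytes past a
# 16-byte boundary because call has just pushed the return address.
PUSH_BYTES = 8  # a 64-bit push moves %rsp down by 8


def frame_bytes(pushes, sub_amount):
    """Total bytes by which the prologue lowers %rsp."""
    return pushes * PUSH_BYTES + sub_amount


def aligned_after_prologue(pushes, sub_amount, align=16):
    """True if %rsp is a multiple of align once the prologue has run."""
    entry_offset = 8  # return address pushed by call
    return (entry_offset + frame_bytes(pushes, sub_amount)) % align == 0


with_pragma = frame_bytes(1, 0x50)     # push %rbp; sub $0x50,%rsp
without_pragma = frame_bytes(0, 0x68)  # sub $0x68,%rsp
print(with_pragma, without_pragma)
print(aligned_after_prologue(1, 0x50), aligned_after_prologue(0, 0x68))
```

Under these assumptions the pragma version reserves 88 bytes and the other 104, and both leave %rsp 16-byte aligned. A likely reading, not confirmed by the thread, is that the push is not a replacement for the sub and mov at all: the vectorized body uses %rbp as an extra register, so the compiler must save it on entry and restore it with the matching pop before retq.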
  </channel>
</rss>

