<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Loop Versioning in Intel compiler in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784544#M442</link>
    <description>Hi all,&lt;BR /&gt;I have a question about loop versioning in the Intel compiler.&lt;BR /&gt;I compiled SPEC 2006 with ICC -g3, and for one of the codes (bwaves, to be specific) the compiler applied loop versioning. The generated code was:&lt;BR /&gt;&lt;BR /&gt;=================================================================&lt;BR /&gt; 404802: 48 f7 c2 0f 00 00 00  test $0xf,%rdx&lt;BR /&gt; 404809: 0f 84 71 00 00 00  je 404880 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 40480f: 90  nop&lt;BR /&gt; 404810: f2 42 0f 10 1c c3  movsd (%rbx,%r8,8),%xmm3&lt;BR /&gt; 404816: f2 42 0f 10 64 c3 10  movsd 0x10(%rbx,%r8,8),%xmm4&lt;BR /&gt; 40481d: 66 42 0f 16 5c c3 08  movhpd 0x8(%rbx,%r8,8),%xmm3&lt;BR /&gt; 404824: 66 42 0f 16 64 c3 18  movhpd 0x18(%rbx,%r8,8),%xmm4&lt;BR /&gt; 40482b: 66 43 0f 59 1c c1  mulpd (%r9,%r8,8),%xmm3&lt;BR /&gt; 404831: 66 43 0f 59 64 c1 10  mulpd 0x10(%r9,%r8,8),%xmm4&lt;BR /&gt; 404838: 66 0f 58 d3  addpd %xmm3,%xmm2&lt;BR /&gt; 40483c: 66 0f 58 cc  addpd %xmm4,%xmm1&lt;BR /&gt; 404840: f2 42 0f 10 6c c3 20  movsd 0x20(%rbx,%r8,8),%xmm5&lt;BR /&gt; 404847: f2 42 0f 10 74 c3 30  movsd 0x30(%rbx,%r8,8),%xmm6&lt;BR /&gt; 40484e: 66 42 0f 16 6c c3 28  movhpd 0x28(%rbx,%r8,8),%xmm5&lt;BR /&gt; 404855: 66 42 0f 16 74 c3 38  movhpd 0x38(%rbx,%r8,8),%xmm6&lt;BR /&gt; 40485c: 66 43 0f 59 6c c1 20  mulpd 0x20(%r9,%r8,8),%xmm5&lt;BR /&gt; 404863: 66 43 0f 59 74 c1 30  mulpd 0x30(%r9,%r8,8),%xmm6&lt;BR /&gt; 40486a: 66 0f 58 d5  addpd %xmm5,%xmm2&lt;BR /&gt; 40486e: 66 0f 58 ce  addpd %xmm6,%xmm1&lt;BR /&gt; 404872: 49 83 c0 08  add $0x8,%r8&lt;BR /&gt; 404876: 4c 3b c1  cmp %rcx,%r8&lt;BR /&gt; 404879: 72 95  jb 404810 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 40487b: eb 4e  jmp 4048cb &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 40487d: 48 89 f6  mov %rsi,%rsi&lt;BR /&gt; 404880: 42 0f 28 1c c3  movaps (%rbx,%r8,8),%xmm3&lt;BR /&gt; 404885: 42 0f 28 64 c3 10  movaps 0x10(%rbx,%r8,8),%xmm4&lt;BR /&gt; 40488b: 66 43 0f 59 1c c1  mulpd (%r9,%r8,8),%xmm3&lt;BR /&gt; 404891: 66 43 0f 59 64 c1 10  mulpd 0x10(%r9,%r8,8),%xmm4&lt;BR /&gt; 404898: 66 0f 58 d3  addpd %xmm3,%xmm2&lt;BR /&gt; 40489c: 66 0f 58 cc  addpd %xmm4,%xmm1&lt;BR /&gt; 4048a0: 42 0f 28 6c c3 20  movaps 0x20(%rbx,%r8,8),%xmm5&lt;BR /&gt; 4048a6: 42 0f 28 74 c3 30  movaps 0x30(%rbx,%r8,8),%xmm6&lt;BR /&gt; 4048ac: 66 43 0f 59 6c c1 20  mulpd 0x20(%r9,%r8,8),%xmm5&lt;BR /&gt; 4048b3: 66 43 0f 59 74 c1 30  mulpd 0x30(%r9,%r8,8),%xmm6&lt;BR /&gt; 4048ba: 66 0f 58 d5  addpd %xmm5,%xmm2&lt;BR /&gt; 4048be: 66 0f 58 ce  addpd %xmm6,%xmm1&lt;BR /&gt; 4048c2: 49 83 c0 08  add $0x8,%r8&lt;BR /&gt; 4048c6: 4c 3b c1  cmp %rcx,%r8&lt;BR /&gt; 4048c9: 72 b5  jb 404880 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 4048cb: 49 3b cc  cmp %r12,%rcx&lt;BR /&gt; 4048ce: 0f 83 64 14 00 00  jae 405d38 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 4048d4: f2 0f 10 1c cb  movsd (%rbx,%rcx,8),%xmm3&lt;BR /&gt;=========================================================================&lt;BR /&gt;The first version starts at 40480f.&lt;BR /&gt;The second starts at 404880.&lt;BR /&gt;&lt;BR /&gt;Two lines look strange:&lt;BR /&gt;40480f: a nop&lt;BR /&gt;40487d: a mov %rsi,%rsi&lt;BR /&gt;&lt;BR /&gt;Note that the jump to the second version of the loop targets 404880, not the mov instruction at 40487d.&lt;BR /&gt;I would like to know why these two instructions were generated, and why the second one is a mov and not a nop.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for your answers :)&lt;BR /&gt;</description>
    <pubDate>Fri, 22 Jul 2011 17:11:12 GMT</pubDate>
    <dc:creator>zakaria-bendifallah</dc:creator>
    <dc:date>2011-07-22T17:11:12Z</dc:date>
    <item>
      <title>Loop Versioning in Intel compiler</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784544#M442</link>
      <description>Hi all,&lt;BR /&gt;I have a question about loop versioning in the Intel compiler.&lt;BR /&gt;I compiled SPEC 2006 with ICC -g3, and for one of the codes (bwaves, to be specific) the compiler applied loop versioning. The generated code was:&lt;BR /&gt;&lt;BR /&gt;=================================================================&lt;BR /&gt; 404802: 48 f7 c2 0f 00 00 00  test $0xf,%rdx&lt;BR /&gt; 404809: 0f 84 71 00 00 00  je 404880 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 40480f: 90  nop&lt;BR /&gt; 404810: f2 42 0f 10 1c c3  movsd (%rbx,%r8,8),%xmm3&lt;BR /&gt; 404816: f2 42 0f 10 64 c3 10  movsd 0x10(%rbx,%r8,8),%xmm4&lt;BR /&gt; 40481d: 66 42 0f 16 5c c3 08  movhpd 0x8(%rbx,%r8,8),%xmm3&lt;BR /&gt; 404824: 66 42 0f 16 64 c3 18  movhpd 0x18(%rbx,%r8,8),%xmm4&lt;BR /&gt; 40482b: 66 43 0f 59 1c c1  mulpd (%r9,%r8,8),%xmm3&lt;BR /&gt; 404831: 66 43 0f 59 64 c1 10  mulpd 0x10(%r9,%r8,8),%xmm4&lt;BR /&gt; 404838: 66 0f 58 d3  addpd %xmm3,%xmm2&lt;BR /&gt; 40483c: 66 0f 58 cc  addpd %xmm4,%xmm1&lt;BR /&gt; 404840: f2 42 0f 10 6c c3 20  movsd 0x20(%rbx,%r8,8),%xmm5&lt;BR /&gt; 404847: f2 42 0f 10 74 c3 30  movsd 0x30(%rbx,%r8,8),%xmm6&lt;BR /&gt; 40484e: 66 42 0f 16 6c c3 28  movhpd 0x28(%rbx,%r8,8),%xmm5&lt;BR /&gt; 404855: 66 42 0f 16 74 c3 38  movhpd 0x38(%rbx,%r8,8),%xmm6&lt;BR /&gt; 40485c: 66 43 0f 59 6c c1 20  mulpd 0x20(%r9,%r8,8),%xmm5&lt;BR /&gt; 404863: 66 43 0f 59 74 c1 30  mulpd 0x30(%r9,%r8,8),%xmm6&lt;BR /&gt; 40486a: 66 0f 58 d5  addpd %xmm5,%xmm2&lt;BR /&gt; 40486e: 66 0f 58 ce  addpd %xmm6,%xmm1&lt;BR /&gt; 404872: 49 83 c0 08  add $0x8,%r8&lt;BR /&gt; 404876: 4c 3b c1  cmp %rcx,%r8&lt;BR /&gt; 404879: 72 95  jb 404810 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 40487b: eb 4e  jmp 4048cb &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 40487d: 48 89 f6  mov %rsi,%rsi&lt;BR /&gt; 404880: 42 0f 28 1c c3  movaps (%rbx,%r8,8),%xmm3&lt;BR /&gt; 404885: 42 0f 28 64 c3 10  movaps 0x10(%rbx,%r8,8),%xmm4&lt;BR /&gt; 40488b: 66 43 0f 59 1c c1  mulpd (%r9,%r8,8),%xmm3&lt;BR /&gt; 404891: 66 43 0f 59 64 c1 10  mulpd 0x10(%r9,%r8,8),%xmm4&lt;BR /&gt; 404898: 66 0f 58 d3  addpd %xmm3,%xmm2&lt;BR /&gt; 40489c: 66 0f 58 cc  addpd %xmm4,%xmm1&lt;BR /&gt; 4048a0: 42 0f 28 6c c3 20  movaps 0x20(%rbx,%r8,8),%xmm5&lt;BR /&gt; 4048a6: 42 0f 28 74 c3 30  movaps 0x30(%rbx,%r8,8),%xmm6&lt;BR /&gt; 4048ac: 66 43 0f 59 6c c1 20  mulpd 0x20(%r9,%r8,8),%xmm5&lt;BR /&gt; 4048b3: 66 43 0f 59 74 c1 30  mulpd 0x30(%r9,%r8,8),%xmm6&lt;BR /&gt; 4048ba: 66 0f 58 d5  addpd %xmm5,%xmm2&lt;BR /&gt; 4048be: 66 0f 58 ce  addpd %xmm6,%xmm1&lt;BR /&gt; 4048c2: 49 83 c0 08  add $0x8,%r8&lt;BR /&gt; 4048c6: 4c 3b c1  cmp %rcx,%r8&lt;BR /&gt; 4048c9: 72 b5  jb 404880 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 4048cb: 49 3b cc  cmp %r12,%rcx&lt;BR /&gt; 4048ce: 0f 83 64 14 00 00  jae 405d38 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt; 4048d4: f2 0f 10 1c cb  movsd (%rbx,%rcx,8),%xmm3&lt;BR /&gt;=========================================================================&lt;BR /&gt;The first version starts at 40480f.&lt;BR /&gt;The second starts at 404880.&lt;BR /&gt;&lt;BR /&gt;Two lines look strange:&lt;BR /&gt;40480f: a nop&lt;BR /&gt;40487d: a mov %rsi,%rsi&lt;BR /&gt;&lt;BR /&gt;Note that the jump to the second version of the loop targets 404880, not the mov instruction at 40487d.&lt;BR /&gt;I would like to know why these two instructions were generated, and why the second one is a mov and not a nop.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for your answers :)&lt;BR /&gt;</description>
      <pubDate>Fri, 22 Jul 2011 17:11:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784544#M442</guid>
      <dc:creator>zakaria-bendifallah</dc:creator>
      <dc:date>2011-07-22T17:11:12Z</dc:date>
    </item>
    <item>
      <title>Loop Versioning in Intel compiler</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784545#M443</link>
      <description>Hello Zakaria,&lt;BR /&gt;I see you've already posted this message to the Intel C compiler forum &lt;A href="http://software.intel.com/en-us/forums/intel-c-compiler/" target="_blank"&gt;http://software.intel.com/en-us/forums/intel-c-compiler/&lt;/A&gt;.&lt;BR /&gt;That is the appropriate forum.&lt;BR /&gt;Pat</description>
      <pubDate>Wed, 27 Jul 2011 19:51:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784545#M443</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2011-07-27T19:51:39Z</dc:date>
    </item>
    <item>
      <title>Loop Versioning in Intel compiler</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784546#M444</link>
      <description>I know that my post is too late...&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;404809: 0f 84 71 00 00 00  je 404880 &amp;lt;BI_CGSTAB_BLOCK_&amp;gt;&lt;BR /&gt;40480F: 90   &lt;STRONG&gt;nop&lt;/STRONG&gt;&lt;BR /&gt;404810: f2 42 0f 10 1c c3   movsd (%rbx,%r8,8),%xmm3&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;It seems to me that the &lt;STRONG&gt;nop&lt;/STRONG&gt; instruction simply provides alignment for the &lt;STRONG&gt;movsd&lt;/STRONG&gt; instruction. Try dividing 40480F by 4!&lt;BR /&gt;&lt;BR /&gt;40480F (base 16) = 4212751 (base 10)&lt;BR /&gt;&lt;BR /&gt;4212751 / 4 = 1053187.&lt;SPAN style="text-decoration: underline;"&gt;75&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;The result is not an integer, so 40480F is not an aligned address. The one-byte nop moves the movsd to 404810, which is a multiple of 16.&lt;BR /&gt;</description>
      <pubDate>Tue, 29 Nov 2011 05:07:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Loop-Versioning-in-Intel-compiler/m-p/784546#M444</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2011-11-29T05:07:57Z</dc:date>
    </item>
  </channel>
</rss>

