<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Opcode ordering in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Opcode-ordering/m-p/892277#M3833</link>
    <description>&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;Im writing assembly code. Say I have three parallel, mostly independent, streams of execution, call them P, Q and R. Say that for the most part, opcodes in each stream are serially dependent (each opcode uses results from its predecessor). Im executing the code on a Penryn processor. For the sake of simplicity, assume all the opcodes have latency and throughput of 1 cycle. Is it better to:&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;
&lt;P class="MsoNormal"&gt;&lt;SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL type="A"&gt;
&lt;LI class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Code one opcode from stream P, one from stream Q, one from stream R, then go back to P?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Code 3 or 4 opcodes from P, then 3 or 4 from Q, then 3 or 4 from R, then go back to P?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Something else?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Section 3.5.2.1&lt;FONT color="navy"&gt;&lt;SPAN&gt; &lt;FONT color="#000000"&gt;of the&lt;/FONT&gt; Intel  64 and IA-32 Architectures Optimization Reference Manual&lt;/SPAN&gt;&lt;/FONT&gt;, &lt;FONT color="navy"&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;ROB Read Port Stalls,&lt;FONT color="navy"&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt; mildly suggests that programmers keep short dependency chains together, which makes one think that option B is preferred. On the other hand, if the RS is keeping tabs on 32 opcodes and choice A normally supplies 3 opcodes that are ready to go on every clock cycle, maybe it works just as well.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;I prefer option A simply because it helps me see opportunities to cram more work into fewer cycles.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Thanks,&lt;BR /&gt;Brian&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 06 May 2008 14:54:24 GMT</pubDate>
    <dc:creator>Intel_C_Intel</dc:creator>
    <dc:date>2008-05-06T14:54:24Z</dc:date>
    <item>
      <title>Opcode ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Opcode-ordering/m-p/892277#M3833</link>
      <description>&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;Im writing assembly code. Say I have three parallel, mostly independent, streams of execution, call them P, Q and R. Say that for the most part, opcodes in each stream are serially dependent (each opcode uses results from its predecessor). Im executing the code on a Penryn processor. For the sake of simplicity, assume all the opcodes have latency and throughput of 1 cycle. Is it better to:&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;
&lt;P class="MsoNormal"&gt;&lt;SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL type="A"&gt;
&lt;LI class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Code one opcode from stream P, one from stream Q, one from stream R, then go back to P?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Code 3 or 4 opcodes from P, then 3 or 4 from Q, then 3 or 4 from R, then go back to P?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Something else?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Section 3.5.2.1&lt;FONT color="navy"&gt;&lt;SPAN&gt; &lt;FONT color="#000000"&gt;of the&lt;/FONT&gt; Intel  64 and IA-32 Architectures Optimization Reference Manual&lt;/SPAN&gt;&lt;/FONT&gt;, &lt;FONT color="navy"&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;ROB Read Port Stalls,&lt;FONT color="navy"&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt; mildly suggests that programmers keep short dependency chains together, which makes one think that option B is preferred. On the other hand, if the RS is keeping tabs on 32 opcodes and choice A normally supplies 3 opcodes that are ready to go on every clock cycle, maybe it works just as well.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;I prefer option A simply because it helps me see opportunities to cram more work into fewer cycles.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal"&gt;&lt;FONT face="Arial" size="2"&gt;&lt;SPAN&gt;Thanks,&lt;BR /&gt;Brian&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2008 14:54:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Opcode-ordering/m-p/892277#M3833</guid>
      <dc:creator>Intel_C_Intel</dc:creator>
      <dc:date>2008-05-06T14:54:24Z</dc:date>
    </item>
  </channel>
</rss>

