<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Out of order execution in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910198#M2939</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/265147"&gt;Tal Uliel (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;To use Intel Architecture Code Analyzer you need to compile your source with start and end marks and than run the analyzer tool on it.&lt;BR /&gt;&lt;BR /&gt;for example:&lt;BR /&gt;Source main.c:&lt;BR /&gt;#include "iacaMarks.h"&lt;BR /&gt;&lt;BR /&gt;int main(){&lt;BR /&gt;&lt;BR /&gt;IACA_START&lt;BR /&gt;__asm vandps xmm0, xmm0, xmm1&lt;BR /&gt;IACA_END&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Compile the source using AVX supported Compiler (lets assume you've created a main.exe file).&lt;BR /&gt;&lt;BR /&gt;now run iaca -o main.iaca.txt main.exe and the expected output will be written to the file main.iaca.txt.&lt;BR /&gt;&lt;BR /&gt;I suggest to use -o option instead of redirection (&amp;gt;) of the output as the tool truncate the lines at char 80 when the output is the screen. &lt;BR /&gt;&lt;BR /&gt;For further details please refer to the Intel Architecture Code Analyzer - User Guide Rev 1.1 available on the &lt;A href="http://software.intel.com/en-us/articles/intel-architecture-code-analyzer-download/"&gt;download page&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Tal&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Tal, thanks. I just put the mov ebx,111 ... mov ebx,222 markers around the instructions, and everything worked.&lt;BR /&gt;areid, could you tell me where you got the 17 and 19 from? The analyzer for the Nehalem architecture predicted a total throughput of 15 cycles for method 2 and 12 cycles for method 1.</description>
    <pubDate>Tue, 27 Oct 2009 22:56:43 GMT</pubDate>
    <dc:creator>tthsqe</dc:creator>
    <dc:date>2009-10-27T22:56:43Z</dc:date>
    <item>
      <title>Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910189#M2930</link>
      <description>Is there a simulator and/or a general procedure one can follow to predict which instructions will be executed in what order (assuming all data is in the L1 cache)? I'm having a hard time understanding why a given instruction sequence executes much faster than another. I suspect it's due to out-of-order execution and register renaming, but I've found no tangible reason yet. Any help would be appreciated.</description>
      <pubDate>Fri, 16 Oct 2009 06:53:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910189#M2930</guid>
      <dc:creator>tthsqe</dc:creator>
      <dc:date>2009-10-16T06:53:15Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910190#M2931</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/438679"&gt;tthsqe&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Is there a simulator and/or a general procedure one can follow to predict what instructions will be executed in what order (assuming all data is in the L1 cache)? I'm having a hard time comprehending why a given instruction sequence executes much faster than another. I suspect it's due to the out of order execution and register renaming, but I've found no tangible reason yet. Any help would be appreciated.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I do not know of such a simulator, though I would be interested in one myself.&lt;BR /&gt;&lt;BR /&gt;Here are some more ideas about where your speed difference could come from:
&lt;UL&gt;
&lt;LI&gt;Misaligned branch target: instruction fetch loads one aligned 16-byte block per cycle. If your jump target is misaligned, fewer of the fetched bytes belong to your code, so fewer instructions can be decoded in that cycle.&lt;/LI&gt;
&lt;LI&gt;If your instructions are long, reordering them can change the decoder throughput. I don't know the details, but I've seen a simple switch from xmm0-xmm7 to xmm8-xmm15 degrade performance noticeably, because the latter need a REX prefix and therefore a longer instruction encoding. &lt;/LI&gt;
&lt;/UL&gt;
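To make the fetch-width point concrete, here is a minimal C sketch. The helper name fetch_blocks is hypothetical (it is not from the Optimization Reference Manual); it only assumes, as described above, a front end that fetches one aligned 16-byte block per cycle, and counts how many such blocks a run of code touches. The byte counts in the usage note assume a 4-byte encoding for mulpd xmm, xmm (66 0F 59 /r), which becomes 5 bytes when xmm8-xmm15 require a REX prefix.

```c
/* Hypothetical helper: number of aligned 16-byte fetch blocks touched by a
   run of code that starts at byte address `start` and is `len` bytes long.
   Assumes the front end fetches one aligned 16-byte block per cycle, so this
   is also a lower bound on the fetch cycles needed for that run of code. */
unsigned long fetch_blocks(unsigned long start, unsigned long len)
{
    if (len == 0)
        return 0;                                 /* empty run touches no block */
    unsigned long first = start / 16;             /* block holding the first byte */
    unsigned long last = (start + len - 1) / 16;  /* block holding the last byte */
    return last - first + 1;
}
```

For example, twelve 4-byte instructions (48 bytes) touch 3 fetch blocks when they start on a 16-byte boundary, but 4 when they start 8 bytes past one; with 5-byte REX-prefixed encodings (60 bytes), the counts become 4 and 5. This ignores decoder limits and the loop stream buffer, so treat it only as an illustration of the alignment effect.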
On that topic, I recommend looking at sections 2.1.2.2, 2.2.2, and 3.4 of the Intel Optimization Reference Manual.</description>
      <pubDate>Fri, 16 Oct 2009 09:59:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910190#M2931</guid>
      <dc:creator>Matthias_Kretz</dc:creator>
      <dc:date>2009-10-16T09:59:10Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910191#M2932</link>
      <description>
Perhaps if you could provide the two snippets of assembly, someone could analyse the differences. (Please provide a few leading and trailing instructions too, so we can see the context.)&lt;BR /&gt;&lt;BR /&gt;Without the code, any explanation of why one version runs faster than the other would be pure speculation.&lt;BR /&gt;</description>
      <pubDate>Fri, 16 Oct 2009 11:48:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910191#M2932</guid>
      <dc:creator>craigj0</dc:creator>
      <dc:date>2009-10-16T11:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910192#M2933</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/447122"&gt;craigj0&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Perhaps if you could provide the two snippets of assembly someone could analyse the differences. (Please provide a few leading and trailing instructions too, so we can see context).&lt;BR /&gt;&lt;BR /&gt;Without the code it would be complete speculation on why one bit runs faster than another.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
You will recognize it as the Mandelbrot iteration. The second method trades a multiplication for an addition and uses one fewer instruction per block. However, the first method is 5-10% faster, depending on how much the loop is unrolled. I think it might be that the multiplications can execute in parallel with the additions and moves from the previous block, but I am not sure about this. Any analysis would be appreciated.&lt;BR /&gt;method1:&lt;BR /&gt;
align 16 ; 0 1 2 3 &lt;BR /&gt;
@@: ;--------------------------&lt;BR /&gt;
movaps xmm3, [twos] ; X Y 2&lt;BR /&gt;
mulpd xmm3, xmm1 ; X Y   2Y&lt;BR /&gt;
mulpd xmm1, xmm1 ; X YY   2Y&lt;BR /&gt;
mulpd xmm3, xmm0 ; X YY   2XY&lt;BR /&gt;
mulpd xmm0, xmm0 ; XX YY 2XY&lt;BR /&gt;
movaps xmm2, xmm0 ; XX YY XX 2XY&lt;BR /&gt;
subpd xmm0, xmm1 ; XX-YY YY XX 2XY&lt;BR /&gt;
addpd xmm1, xmm2 ; XX-YY XX+YY XX 2XY&lt;BR /&gt;
addpd xmm0, &amp;lt;X&amp;gt; ; newX XX+YY XX 2XY&lt;BR /&gt;
movaps [test1],xmm1 ; newX XX+YY XX 2XY&lt;BR /&gt;
movaps xmm1, &amp;lt;Y&amp;gt; ; newX Y XX 2XY&lt;BR /&gt;
addpd xmm1, xmm3 ; newX newY&lt;BR /&gt;&lt;BR /&gt;
movaps xmm3, [twos]&lt;BR /&gt;
mulpd xmm3, xmm5&lt;BR /&gt;
mulpd xmm5, xmm5&lt;BR /&gt;
mulpd xmm3, xmm4&lt;BR /&gt;
mulpd xmm4, xmm4&lt;BR /&gt;
movaps xmm2, xmm4&lt;BR /&gt;
subpd xmm4, xmm5&lt;BR /&gt;
addpd xmm5, xmm2&lt;BR /&gt;
addpd xmm4, &amp;lt;X&amp;gt;&lt;BR /&gt;
movaps [test2],xmm5&lt;BR /&gt;
movaps xmm5, &amp;lt;Y&amp;gt;&lt;BR /&gt;
addpd xmm5, xmm3&lt;BR /&gt;&lt;BR /&gt;
movaps xmm3, [twos]&lt;BR /&gt;
mulpd xmm3, xmm7&lt;BR /&gt;
mulpd xmm7, xmm7&lt;BR /&gt;
mulpd xmm3, xmm6&lt;BR /&gt;
mulpd xmm6, xmm6&lt;BR /&gt;
movaps xmm2, xmm6&lt;BR /&gt;
subpd xmm6, xmm7&lt;BR /&gt;
addpd xmm7, xmm2&lt;BR /&gt;
addpd xmm6, &amp;lt;X&amp;gt;&lt;BR /&gt;
movaps [test3],xmm7&lt;BR /&gt;
movaps xmm7, &amp;lt;Y&amp;gt;&lt;BR /&gt;
addpd xmm7, xmm3&lt;BR /&gt;&lt;BR /&gt;
method2:&lt;BR /&gt;
align 16 ; 0 1 2 3 &lt;BR /&gt;
@@: ;-------------------------&lt;BR /&gt;
movaps xmm3,xmm1 ; X Y Y&lt;BR /&gt;
mulpd xmm3,xmm1 ; X Y YY&lt;BR /&gt;
addpd xmm1,xmm1 ; X 2Y YY&lt;BR /&gt;
mulpd xmm1,xmm0 ; X 2XY YY&lt;BR /&gt;
addpd xmm1,&amp;lt;Y&amp;gt; ; X newY YY&lt;BR /&gt;
mulpd xmm0,xmm0 ; XX newY YY&lt;BR /&gt;
movaps xmm2,xmm0 ; XX newY XX YY&lt;BR /&gt;
addpd xmm2,xmm3 ; XX newY XX+YY YY&lt;BR /&gt;
movaps [test1],xmm2 ; XX newY XX+YY YY&lt;BR /&gt;
subpd xmm0,xmm3 ; XX-YY newY XX+YY YY&lt;BR /&gt;
addpd xmm0,&amp;lt;X&amp;gt; ; newX newY XX+YY YY&lt;BR /&gt;&lt;BR /&gt;
movaps xmm3,xmm5&lt;BR /&gt;
mulpd xmm3,xmm5&lt;BR /&gt;
addpd xmm5,xmm5&lt;BR /&gt;
mulpd xmm5,xmm4&lt;BR /&gt;
addpd xmm5,&amp;lt;Y&amp;gt;&lt;BR /&gt;
mulpd xmm4,xmm4&lt;BR /&gt;
movaps xmm2,xmm4&lt;BR /&gt;
subpd xmm4,xmm3&lt;BR /&gt;
addpd xmm2,xmm3&lt;BR /&gt;
movaps [test2],xmm2&lt;BR /&gt;
addpd xmm4,&amp;lt;X&amp;gt;&lt;BR /&gt;&lt;BR /&gt;
movaps xmm3,xmm7&lt;BR /&gt;
mulpd xmm3,xmm7&lt;BR /&gt;
addpd xmm7,xmm7&lt;BR /&gt;
mulpd xmm7,xmm6&lt;BR /&gt;
addpd xmm7,&amp;lt;Y&amp;gt;&lt;BR /&gt;
mulpd xmm6,xmm6&lt;BR /&gt;
movaps xmm2,xmm6&lt;BR /&gt;
subpd xmm6,xmm3&lt;BR /&gt;
addpd xmm2,xmm3&lt;BR /&gt;
movaps [test3],xmm2&lt;BR /&gt;
addpd xmm6,&amp;lt;X&amp;gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 16 Oct 2009 14:20:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910192#M2933</guid>
      <dc:creator>tthsqe</dc:creator>
      <dc:date>2009-10-16T14:20:18Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910193#M2934</link>
      <description>
Yes, on CPU models of the last 3 years, multiply and add instructions can issue in the same cycle, so it may often be important to reduce the number of multiply instructions when they outnumber the add instructions. However, your first method may have more opportunity for parallel expression evaluation within a single iteration, and a shorter overall latency.&lt;BR /&gt;On the current (Core i7) CPUs, it may be worthwhile to keep the unrolling within the limits where the Loop Stream Detector is effective.&lt;BR /&gt;Do you get better performance than standard code compiled by a vectorizing compiler?&lt;BR /&gt;</description>
      <pubDate>Fri, 16 Oct 2009 15:04:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910193#M2934</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-10-16T15:04:04Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910194#M2935</link>
      <description>In terms of a simulator, have you checked out the Intel Architecture Code Analyzer (IACA)? It is intended for comparing AVX and SSE kernels to estimate relative performance, but you can also compare two SSE kernels. The simulator assumes that all data is in the L1 cache and that the out-of-order engine runs under ideal conditions. It is a free tool you can find here:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/" target="_blank"&gt;http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;After looking at your code I was curious, so I pasted it into a project and ran it through IACA. My initial test seemed to show that method 2 should be about 10% faster (a throughput of 17 cycles instead of 19). I only spent about 5 minutes on this, so a few more tests might give more insight.&lt;BR /&gt;&lt;BR /&gt;I've only started playing with IACA recently, so I'm still figuring it out myself. It doesn't simulate a specific processor, but it is intended to predict performance on Sandy Bridge, so it may be less accurate for current-generation processors. Specifically, the simulated hardware has two data load ports, and I think current processors may have only one.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 17 Oct 2009 07:45:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910194#M2935</guid>
      <dc:creator>areid2</dc:creator>
      <dc:date>2009-10-17T07:45:42Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910195#M2936</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/71464"&gt;areid&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;In terms of a simulator, have you checked out the Intel Architecture Code Analyzer? It is intented to compare AVX to SSE kernels to estimate relative performance, but you can also compare 2 SSE kernels. The simulator assumes all data in the L1 cache and ideal out-of-order engine conditions. It is a free tool you can find here:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/" target="_blank"&gt;http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;After looking at your code I was curious, so I pasted it into a project and ran it through IACA. My initial test seemed to show that method 2 should be about 10% faster (17 cycles throughputinstead of 19). I only spent about 5 minutes on this, so perhaps a few more tests would give some insight.&lt;BR /&gt;&lt;BR /&gt;I've only started playing with IACA recently, so I'm still figuring it out myself. It doesn't simulate a specific processor, but it is intended to predict performance with Sandy Bridge, so it may not be as good for current generation processors. Specifically, there are 2 data load ports in the simulated hardware, and I think there may only be one in current processors.&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Just to keep you updated: a new release of Intel Architecture Code Analyzer (1.1) is available with Nehalem and Westmere support (in addition to the Intel AVX support of the previous versions).&lt;BR /&gt;&lt;BR /&gt;Let me know if you need further assistance using Intel Architecture Code Analyzer.&lt;BR /&gt;&lt;BR /&gt;Tal&lt;BR /&gt;</description>
      <pubDate>Thu, 22 Oct 2009 08:36:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910195#M2936</guid>
      <dc:creator>Tal_U_Intel</dc:creator>
      <dc:date>2009-10-22T08:36:01Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910196#M2937</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/265147"&gt;Tal Uliel (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Just to keep you updated: a new release of Intel Architecture Code Analyzer (1.1) is available with Nehalem and Westmere support (in addition to the Intel AVX support of the previous versions).&lt;BR /&gt;&lt;BR /&gt;Let me know if you need further assistance using Intel Architecture Code Analyzer.&lt;BR /&gt;&lt;BR /&gt;Tal&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Could you tell me how to use this code analyzer? I downloaded it, but all I got was a bunch of DLL files, an application that does nothing when run, and a readme that is not much help. What steps do you go through to test a chunk of code?</description>
      <pubDate>Mon, 26 Oct 2009 05:16:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910196#M2937</guid>
      <dc:creator>tthsqe</dc:creator>
      <dc:date>2009-10-26T05:16:42Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910197#M2938</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/438679"&gt;tthsqe&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Could you tell me how to use this code analyzer. I downloaded it, but all gotare a bunch of dll files, an application that does nothing when run, and a readme that is not much help. What steps do you go though to test a chuck of code?&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;To use Intel Architecture Code Analyzer, you need to compile your source with start and end marks and then run the analyzer tool on the resulting binary.&lt;BR /&gt;&lt;BR /&gt;For example:&lt;BR /&gt;Source main.c:&lt;BR /&gt;#include "iacaMarks.h"&lt;BR /&gt;&lt;BR /&gt;int main(){&lt;BR /&gt;&lt;BR /&gt;IACA_START&lt;BR /&gt;__asm vandps xmm0, xmm0, xmm1&lt;BR /&gt;IACA_END&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Compile the source with an AVX-capable compiler (let's assume you've created a main.exe file).&lt;BR /&gt;&lt;BR /&gt;Now run iaca -o main.iaca.txt main.exe, and the output will be written to the file main.iaca.txt.&lt;BR /&gt;&lt;BR /&gt;I suggest using the -o option instead of redirecting the output (&amp;gt;), because the tool truncates lines at 80 characters when it writes to the console.&lt;BR /&gt;&lt;BR /&gt;For further details, please refer to the Intel Architecture Code Analyzer - User Guide Rev 1.1, available on the &lt;A href="http://software.intel.com/en-us/articles/intel-architecture-code-analyzer-download/"&gt;download page&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Tal&lt;BR /&gt;</description>
      <pubDate>Mon, 26 Oct 2009 15:29:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910197#M2938</guid>
      <dc:creator>Tal_U_Intel</dc:creator>
      <dc:date>2009-10-26T15:29:44Z</dc:date>
    </item>
    <item>
      <title>Re: Out of order execution</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910198#M2939</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/265147"&gt;Tal Uliel (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;To use Intel Architecture Code Analyzer you need to compile your source with start and end marks and than run the analyzer tool on it.&lt;BR /&gt;&lt;BR /&gt;for example:&lt;BR /&gt;Source main.c:&lt;BR /&gt;#include "iacaMarks.h"&lt;BR /&gt;&lt;BR /&gt;int main(){&lt;BR /&gt;&lt;BR /&gt;IACA_START&lt;BR /&gt;__asm vandps xmm0, xmm0, xmm1&lt;BR /&gt;IACA_END&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Compile the source using AVX supported Compiler (lets assume you've created a main.exe file).&lt;BR /&gt;&lt;BR /&gt;now run iaca -o main.iaca.txt main.exe and the expected output will be written to the file main.iaca.txt.&lt;BR /&gt;&lt;BR /&gt;I suggest to use -o option instead of redirection (&amp;gt;) of the output as the tool truncate the lines at char 80 when the output is the screen. &lt;BR /&gt;&lt;BR /&gt;For further details please refer to the Intel Architecture Code Analyzer - User Guide Rev 1.1 available on the &lt;A href="http://software.intel.com/en-us/articles/intel-architecture-code-analyzer-download/"&gt;download page&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Tal&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Tal, thanks. I just put the mov ebx,111 ... mov ebx,222 markers around the instructions, and everything worked.&lt;BR /&gt;areid, could you tell me where you got the 17 and 19 from? The analyzer for the Nehalem architecture predicted a total throughput of 15 cycles for method 2 and 12 cycles for method 1.</description>
      <pubDate>Tue, 27 Oct 2009 22:56:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Out-of-order-execution/m-p/910198#M2939</guid>
      <dc:creator>tthsqe</dc:creator>
      <dc:date>2009-10-27T22:56:43Z</dc:date>
    </item>
  </channel>
</rss>

