<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Time to revisit REP;MOVS in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796390#M35472</link>
    <description>&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Seth,&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Thanks for taking the time to reply. I do have a few observations:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;It is often observed (as in this case) that there is a dichotomy between what is envisioned and what is practiced. I will assume your perspective is that of a processor engineer familiar with the internal microcode. You presumably have mastered assembly language and may have a good grasp of C/C++. As such, your expectations (what you envision) of memory move operations are focused on what is performed by way of a subroutine call (memmove, memcpy). What is practiced is quite different. In Intel Visual Fortran the practice is to place the generalized memory move code in-line. This is done in an effort to optimize the shorter run memory moves by eliminating the overhead of a subroutine call. The consequence of this is code bloat and the side effect of adversely affecting the instruction cache.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;While it would be easy for you to prescribe "use subroutines" to the compiler writers, in practice they cannot comply, because doing so might place the generated code at a competitive disadvantage or would run contrary to the user's dictate to "produce the fastest possible code".&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT size="2"&gt;&lt;SPAN style="COLOR: black"&gt;&lt;FONT face="Arial"&gt;RE: &lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: Verdana"&gt;The suggestions made in this post of testing alignment, internal buffering and custom multiple code paths are very painful in micro-code, and the performance you obtain is often worse than what you would end up with if you just wrote that memory copy subroutine.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: Verdana"&gt;&lt;P&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT size="2"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: Verdana"&gt;This is only very painful once. And only painful if your threshold for pain is quite low. In my opinion, the effort to add some of the SSE3 instructions must have been more complex than creating efficient REP MOVSx instructions would be. It should be no worse than a root canal.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;As suggested in my earlier post, create a study to examine some real world applications. My preference is scientific computing using IVF, but realistically you will have to include commercial packages regardless of the language. I will venture to guess that the preponderance of the in-line moves are good candidates for MOVSD or MOVSQ, i.e. the source and target strings for MOVSD are almost always dword aligned, and the source and target strings for MOVSQ are generally qword aligned but are almost always dword aligned. Therefore, the microcode effort should concentrate on optimizing dword aligned REP MOVSD. This will simplify the logic in the microcode. The compiler writers of IVF could easily integrate REP MOVSD as a code generation option. Later implementations of microcode could address other alignment circumstances.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;If source or target is not dword aligned, branch to old MOVSD microcode&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;Initialize cache such that next items read/written are marked as the Least Recently Used&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;i.e. first to be retired. The intention is to not permit large memory moves to flush the data cache.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;Use an internal buffer of memory width (16 bytes now, 32 later, 64 whatever) to align writes to memory width (first write potentially a read/modify/write to accommodate skewed data).&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;The microcode optimally performs prefetch.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;During move, perform interrupt early-exit after write, as presumably fetch of next input line is in progress. As with current REP MOVSD the instruction is interruptible.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
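&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The first step of the outline above can be sketched in Fortran rather than microcode. This is only an illustration under stated assumptions: choose_path and the byte addresses passed to it are hypothetical, and real microcode would presumably test the low two bits of ESI and EDI directly.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```fortran
program rep_movsd_dispatch_sketch
  implicit none
  ! Hypothetical byte addresses, chosen only for illustration.
  print '(a)', trim(choose_path(4096, 8192))   ! both dword aligned
  print '(a)', trim(choose_path(4097, 8192))   ! source skewed by one byte
contains
  ! Pick the new path only when both addresses are multiples of 4 bytes.
  function choose_path(src_addr, dst_addr) result(path)
    integer, intent(in) :: src_addr, dst_addr
    character(len=24) :: path
    if (mod(src_addr, 4) == 0 .and. mod(dst_addr, 4) == 0) then
      path = 'new aligned path'
    else
      path = 'old MOVSD microcode'
    end if
  end function choose_path
end program rep_movsd_dispatch_sketch
```
&lt;/PRE&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The first call reports the new aligned path and the second falls back to the old MOVSD microcode, so the common dword aligned case pays for a single test.&lt;/FONT&gt;&lt;/P&gt;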
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The advantages of performing the very painful task in microcode as opposed to subroutine or in-line are:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The assembler code is not required to determine the underlying cache or memory width. True, a compiler option could specify the number of bytes for memory width, but the width specified for current processors may not hold for next generation processors. Yes, the runtime system code could store the optimal width for use by the generated code, but this then requires an additional memory read and the use of a register. For small transfers the overhead would be burdensome. Accommodating the cache/memory width would be attainable with a subroutine but incorporating this in-line is questionable.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Determining the optimal prefetch distance is problematic in user code but well understood for a given processor design. The runtime system startup code could make this determination. A generalized memmove subroutine could take advantage of this but this adds to the subroutine initialization overhead. In-line code would likely not be able to take advantage of prefetch.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Jim Dempsey&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 15 Aug 2006 00:55:02 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2006-08-15T00:55:02Z</dc:date>
    <item>
      <title>Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796377#M35459</link>
      <description>&lt;P&gt;My current programming language is Fortran (IVF). In looking at the disassembly window I notice an inordinate amount of overhead (code) generated to test for and take advantage of memory alignment, in particular for memory moves. Example:&lt;/P&gt;
&lt;P&gt;real(8), pointer :: pFoo(:)&lt;BR /&gt;...&lt;BR /&gt; pFoo =&amp;gt; pSomewhere.aFoo ! make pointer local&lt;/P&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt;real(8), pointer:: arrayA(:), arrayB(:)&lt;BR /&gt;...&lt;BR /&gt; arrayA = arrayB ! copy contents of array B to A&lt;/P&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt; do I=1,size(arrayB)&lt;BR /&gt; arrayA(I) = arrayB(I)&lt;BR /&gt; end do&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;The above loop is unrolled by the compiler.&lt;BR /&gt;&lt;BR /&gt;It would seem to me that all the hoop jumping for memory alignment, as well as optimal loop unrolling, could be performed by the processor executing a&lt;/P&gt;
&lt;P&gt; REP; MOVS&lt;/P&gt;
&lt;P&gt;The processor could even be coded to handle unaligned moves optimally.&lt;/P&gt;
&lt;P&gt;I realize this may require saving and restoring ESI and EDI as well as potentially the DF (depending on the rules of engagement). But, if you look at the code generated by the compiler to optimize moves you will see it is (code wise) a better deal to use the REP; MOVS.&lt;/P&gt;
&lt;P&gt;I do realize that REP; MOVS is currently much slower than the code to test for and perform the alignment and then run a faster internal move loop, perhaps including SSE3 instructions. But, the "slowness" is only due to the lack of attention, by the processor engineers, to the REP; MOVS technique of moving data. Internally, even for byte moves, the processor could move multiple bytes (16) per iteration as well as perform alignment via temporary internal storage.&lt;/P&gt;
&lt;P&gt;Because Intel produces both processors and compilers, it would seem that incorporating this into both products would give you a temporary edge over AMD (as it will take them time to revamp their processors).&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jul 2006 01:21:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796377#M35459</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2006-07-27T01:21:04Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796378#M35460</link>
      <description>The MOVS instructions aren't quite as super as they may seem. On most of the newer CPUs, a simple loop moves memory faster than MOVS. Plus memory hasn't kept up with CPU speeds, so often the bottleneck is the memory, not the CPU.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 28 Jul 2006 04:53:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796378#M35460</guid>
      <dc:creator>grg99</dc:creator>
      <dc:date>2006-07-28T04:53:19Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796379#M35461</link>
      <description>As grg99 says, the simple loop, especially unrolled, is faster. Believe me, this is the sort of thing we pay VERY close attention to.&lt;BR /&gt;</description>
      <pubDate>Fri, 28 Jul 2006 08:10:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796379#M35461</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2006-07-28T08:10:05Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796380#M35462</link>
      <description>&lt;P&gt;Once upon a time, back when it was Digital Fortran, I dumped the assembler code for a move and noticed that it started with a little adjustment and ended with a little wrapup, both processes set up such that the bulk of the move was done by busloads.&lt;/P&gt;
&lt;P&gt;I threw all of my assembler routines away.&lt;/P&gt;
&lt;P&gt;Bruce&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2006 21:09:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796380#M35462</guid>
      <dc:creator>dbruceg</dc:creator>
      <dc:date>2006-07-28T21:09:49Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796381#M35463</link>
      <description>&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: 'Courier New'; mso-fareast-font-family: 'Times New Roman'; mso-bidi-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA"&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Apparently you guys did not read the part about:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;But, the "slowness" is only due to the lack of attention, by the processor engineers. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Below my response are examples of the hoop jumping to perform the moves (IVF 9.1).&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;The REP;MOVS on the other hand could be optimized internally to the processor such that, even for unaligned source and target addresses, the data is pipelined in and out so that, for large enough moves, the reads and writes iterate at 16 bytes (or whatever the width of the memory is).&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;A lot of good effort was spent by the processor design team to optimize the pipelines (out of order read and write, branch prediction, register remapping, other tricks of the trade), and optimize FPU performance. By comparison it would be a relatively trivial task to optimize REP;MOVS to optimally move the data, even unaligned data, from point A to point B. &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Unless an interrupt or fault occurs the REP;MOVS could temporarily buffer the unaligned overlap such that each read and write occurs at memory bus width (except potentially the first and last). If an interrupt or fault occurs during the MOVS the ESI, EDI, ECX and EIP are set appropriately for later resumption.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
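&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;As a rough software model of that split (not the processor's actual behavior), the Fortran sketch below breaks a move into a short first piece, whole bus-width pieces and a short last piece. The width of 16 bytes, the array sizes and the offsets s0, d0 and n are assumptions chosen for illustration; array indices stand in for byte addresses, and only the writes are aligned here, since the source skew would be absorbed by the internal buffer.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;
```fortran
program buffered_move_sketch
  use iso_fortran_env, only: int8
  implicit none
  integer, parameter :: width = 16   ! assumed memory bus width in bytes
  integer(int8) :: src(0:127), dst(0:127)
  integer :: i, k, s0, d0, n, head, body, tail

  do i = 0, 127
    src(i) = int(i, int8)
  end do
  dst = 0_int8

  ! Hypothetical offsets standing in for unaligned ESI, EDI and a byte count.
  s0 = 3
  d0 = 5
  n = 70

  ! Short first piece: brings the destination up to a width boundary.
  head = min(n, mod(width - mod(d0, width), width))
  ! Whole bus-width pieces.
  body = ((n - head) / width) * width
  ! Short last piece.
  tail = n - head - body

  dst(d0:d0+head-1) = src(s0:s0+head-1)
  do k = head, head + body - width, width
    ! Each of these writes starts on a multiple of width.
    dst(d0+k:d0+k+width-1) = src(s0+k:s0+k+width-1)
  end do
  dst(d0+head+body:d0+n-1) = src(s0+head+body:s0+n-1)

  print '(a,3i4)', 'head, body, tail =', head, body, tail
  print '(a,l2)', 'copy matches:', all(dst(d0:d0+n-1) == src(s0:s0+n-1))
end program buffered_move_sketch
```
&lt;/PRE&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;With these assumed values the 70 byte move splits into 11, 48 and 11 bytes, so only the first and last writes are narrower than the assumed bus width.&lt;/SPAN&gt;&lt;/P&gt;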
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Note, &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;All too often the source and target data is (are) not optimally aligned for memory bus width moves. The current method of having the IA32/IA64 determine alignment and choose appropriate path cannot optimally handle misaligned data. A rewrite of REP;MOVS could.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Additionally, the code bloat of having the IA32/IA64 determine alignment and choose appropriate path severely taxes the instruction cache.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Finally, should the next generation processor increase the memory width, the new old code with REP;MOVS runs faster. The old old code without REP;MOVS would potentially be slower.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;As it stands now&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Hardware design team: Nobody uses the stinkn REP;MOVS.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Software design team: The REP;MOVS is too stinkn slow.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Meanwhile, nobody at Intel has realized the potential performance impact of revisiting REP;MOVS. I implore someone at Intel to do a study of the impact on actual applications and not just on the handful of industry benchmarks.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;Example 1:&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;real(8), pointer:: arrayA(:), arrayB(:)&lt;BR /&gt;...&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt; do I=1,size(arrayB)&lt;BR /&gt; arrayA(I) = arrayB(I)&lt;BR /&gt; end do&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;The code generated is:&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013D8 A1 B8 CF 46 00&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr ds:[0046CFB8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013DD 89 45 F4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-0Ch],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013E0 85 C0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;test&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013E2 0F 8E C7 01 00 00 jle&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;004015AF &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013E8 83 F8 09&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,9 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013EB 0F 82 5A 02 00 00 jb&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;0040164B &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013F1 A1 BC CF 46 00&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr ds:[0046CFBCh] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013F6 89 45 E4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-1Ch],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013F9 83 F8 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,8 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004013FC 0F 85 1F 02 00 00 jne&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;00401621 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401402 A1 E4 CF 46 00&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr ds:[0046CFE4h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401407 89 45 D4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-2Ch],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040140A 83 F8 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,8 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040140D 0F 85 ED 01 00 00 jne&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;00401600 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401413 8B 55 F4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr [ebp-0Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401416 8B 35 C0 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr ds:[46CFC0h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040141C 8B 0D E8 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,dword ptr ds:[46CFE8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401422 89 55 C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-38h],edx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401425 8B 15 A0 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr ds:[46CFA0h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040142B 89 4D D0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-30h],ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040142E 8B FE&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401430 C1 E7 03&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;shl&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,3 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401433 F7 DF&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;neg&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401435 03 FA&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,edx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401437 89 7D C0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-40h],edi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040143A 8D 47 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;lea&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,[edi+8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040143D 8B 3D C8 CF 46 00 mov &lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;edi,dword ptr ds:[46CFC8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401443 83 E0 0F&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;and&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,0Fh &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401446 89 45 CC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-34h],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401449 C1 E1 03&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;shl&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,3 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040144C F7 D9&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;neg&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040144E 03 CF&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,edi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401450 89 4D F8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-8],ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401453 8D 49 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;lea&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,[ecx+8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401456 89 4D C4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-3Ch],ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401459 85 C0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;test &lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;eax,eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040145B 74 2E&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;je&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;0040148B &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040145D A8 07&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;test&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;al,7 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040145F 0F 85 94 01 00 00 jne&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;004015F9 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401465 8B 45 F8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401468 F2 0F 10 40 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm0,mmword ptr [eax+8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040146D 8D 48 10&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;lea&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,[eax+10h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401470 8B 45 C0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-40h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401473 F2 0F 11 40 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mmword ptr [eax+8],xmm0 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401478 89 4D C4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-3Ch],ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040147B 8B 4D F4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,dword ptr [ebp-0Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040147E 8D 49 FF&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;lea&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,[ecx-1] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401481 89 4D C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-38h],ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401484 B9 01 00 00 00&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,1 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401489 EB 02&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;0040148D &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040148B 33 C9&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xor&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040148D 8B 45 C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-38h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401490 83 E0 07&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;and&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,7 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401493 F7 D8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;neg&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401495 03 45 F4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-0Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401498 89 45 C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-38h],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040149B 8B 45 C4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-3Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040149E A8 0F&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;test&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;al,0Fh &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014A0 75 4E&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jne&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;004014F0 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014A2 8B 45 F8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014A5 89 7D DC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-24h],edi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014A8 8B 7D C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,dword ptr [ebp-38h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014AB 89 75 F0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-10h],esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014AE 8B 75 C0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-40h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014B1 66 0F 28 44 C8 08 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;xmm0,xmmword ptr [eax+ecx*8+8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014B7 66 0F 29 44 CE 08 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+8],xmm0 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014BD 66 0F 28 4C C8 18 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm1,xmmword ptr [eax+ecx*8+18h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014C3 66 0F 29 4C CE 18 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+18h],xmm1 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014C9 66 0F 28 54 C8 28 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm2,xmmword ptr [eax+ecx*8+28h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014CF 66 0F 29 54 CE 28 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+28h],xmm2 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014D5 66 0F 28 5C C8 38 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm3,xmmword ptr [eax+ecx*8+38h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014DB 66 0F 29 5C CE 38 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;xmmword ptr [esi+ecx*8+38h],xmm3 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014E1 83 C1 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,8 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;end do&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014E4 3B CF&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,edi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014E6 72 C9&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jb&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;004014B1 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014E8 8B 7D DC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,dword ptr [ebp-24h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014EB 8B 75 F0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-10h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014EE EB 64&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;00401554 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014F0 8B 45 F8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014F3 89 7D DC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-24h],edi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014F6 8B 7D C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,dword ptr [ebp-38h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014F9 89 75 F0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-10h],esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014FC 8B 75 C0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-40h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004014FF F2 0F 10 44 C8 08 movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm0,mmword ptr [eax+ecx*8+8] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401505 66 0F 16 44 C8 10 movhpd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm0,qword ptr [eax+ecx*8+10h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040150B 66 0F 29 44 CE 08 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+8],xmm0 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401511 F2 0F 10 4C C8 18 movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm1,mmword ptr [eax+ecx*8+18h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401517 66 0F 16 4C C8 20 movhpd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm1,qword ptr [eax+ecx*8+20h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040151D 66 0F 29 4C CE 18 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+18h],xmm1 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401523 F2 0F 10 54 C8 28 movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm2,mmword ptr [eax+ecx*8+28h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401529 66 0F 16 54 C8 30 movhpd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm2,qword ptr [eax+ecx*8+30h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040152F 66 0F 29 54 CE 28 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+28h],xmm2 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401535 F2 0F 10 5C C8 38 movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm3,mmword ptr [eax+ecx*8+38h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040153B 66 0F 16 5C C8 40 movhpd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmm3,qword ptr [eax+ecx*8+40h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401541 66 0F 29 5C CE 38 movapd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xmmword ptr [esi+ecx*8+38h],xmm3 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401547 83 C1 08&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,8 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040154A 3B CF&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,edi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040154C 72 B1&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jb&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;004014FF &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040154E 8B 7D DC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,dword ptr [ebp-24h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401551 8B 75 F0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-10h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401554 8B 45 F4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-0Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401557 3B C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401559 73 54&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jae&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;004015AF &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040155B 0F AF 75 E4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;imul&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-1Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040155F 8B 45 D4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-2Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401562 89 4D CC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-34h],ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401565 8B 4D D0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,dword ptr [ebp-30h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401568 0F AF C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;imul&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040156B 2B D6&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;sub&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040156D 2B F9&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;sub&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040156F 8B 75 E4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-1Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401572 8B 4D CC&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,dword ptr [ebp-34h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401575 03 F8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401577 03 D6&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401579 89 55 D8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-28h],edx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040157C 8B D1&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040157E 0F AF D0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;imul&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;edx,eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401581 8B C1&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401583 0F AF C6&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;imul&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401586 89 55 C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-38h],edx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401589 8B 55 D8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr [ebp-28h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040158C 89 55 D8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-28h],edx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040158F 8B 55 C8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr [ebp-38h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401592 8B 75 D8&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-28h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401595 F2 0F 10 04 3A&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;xmm0,mmword ptr [edx+edi] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040159A 03 55 D4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr [ebp-2Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;arrayA(I) = arrayB(I)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040159D F2 0F 11 04 30&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;movsd&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mmword ptr [eax+esi],xmm0 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;do I=1,size(arrayA)&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004015A2 03 45 E4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr [ebp-1Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004015A5 8B 75 F4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr [ebp-0Ch] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004015A8 83 C1 01&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;add&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,1 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004015AB 3B CE&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;cmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,esi &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;004015AD 72 E3&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jb&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;00401592 &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: 'Times New Roman'"&gt; discontiguous code followed by remainder of do loop&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401600 8B 35 C0 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr ds:[46CFC0h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401606 A1 E8 CF 46 00&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr ds:[0046CFE8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040160B 8B 3D C8 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,dword ptr ds:[46CFC8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401611 8B 15 A0 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr ds:[46CFA0h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401617 33 C9&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xor&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;ecx,ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401619 89 45 D0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-30h],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040161C E9 3A FF FF FF&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;jmp&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;0040155B &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401621 8B 35 C0 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;esi,dword ptr ds:[46CFC0h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401627 A1 E8 CF 46 00&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;eax,dword ptr ds:[0046CFE8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040162C 8B 15 E4 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr ds:[46CFE4h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401632 8B 3D C8 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edi,dword ptr ds:[46CFC8h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401638 33 C9&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;xor &lt;SPAN style="mso-spacerun: yes"&gt;&lt;/SPAN&gt;ecx,ecx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040163A 89 45 D0&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-30h],eax &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;0040163D 89 55 D4&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;dword ptr [ebp-2Ch],edx &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: 'Courier New'"&gt;00401640 8B 15 A0 CF 46 00 mov&lt;SPAN style="mso-spacerun: yes"&gt; &lt;/SPAN&gt;edx,dword ptr ds:[46CFA0h] &lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
</description>
      <pubDate>Sat, 29 Jul 2006 00:52:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796381#M35463</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2006-07-29T00:52:25Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796382#M35464</link>
      <description>I suggest filing a request with Intel Premier Support. That way it can get properly directed.&lt;BR /&gt;</description>
      <pubDate>Sat, 29 Jul 2006 02:15:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796382#M35464</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2006-07-29T02:15:27Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796383#M35465</link>
      <description>&lt;P&gt;Sorry, Jim. I missed your point at first. I imagine the problem originated with the first step up from the 8080/Z80 into segmented memory. It seems to me that one &lt;EM&gt;should&lt;/EM&gt; be able to latch the whole bus to an &lt;EM&gt;arbitrary byte address&lt;/EM&gt;, thereby &lt;EM&gt;eliminating&lt;/EM&gt; the alignment problem, but that statement may just underscore my ignorance about hardware issues.&lt;/P&gt;
&lt;P&gt;More to the point, I'm quite probably suffering from the "REP;MOVS" problem in a stream I/O buffering system I just cobbled up. The idea of the buffering system is to be able to handle both very short and very long I/O operations. If I write, say, one I4 variable followed by a big R8 array, then if the length of the second array exceeds the buffer length, I flush the integer in the buffer and then write the big array directly. I probably take a big hit here if IVF moves the data to a C stream buffer, where it will be misaligned, but nobody seems to know much about how that works.&lt;/P&gt;
&lt;P&gt;When I &lt;EM&gt;read&lt;/EM&gt; the stream, I probably take &lt;EM&gt;two&lt;/EM&gt; hits - one moving the misaligned data from the C stream buffer to my own buffer, and another reading from my buffer.&lt;/P&gt;
&lt;P&gt;I could eliminate both of these problems by padding in the write buffer and skipping in the read buffer. Question - might I see a significant speed increase if I did so?&lt;/P&gt;
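&lt;P&gt;A minimal sketch of the padding arithmetic, assuming 8-byte alignment for the R8 data. The helper name pad_to_alignment and the record layout are hypothetical, and nothing here reflects how the IVF run-time actually lays out its stream buffer:&lt;/P&gt;
&lt;PRE&gt;
```python
def pad_to_alignment(offset, alignment=8):
    """Return the number of pad bytes that make offset a multiple of alignment."""
    return (-offset) % alignment


# Hypothetical record: one 4-byte integer (I4) followed by an 8-byte real (R8) array.
offset_after_i4 = 4
pad = pad_to_alignment(offset_after_i4)  # filler bytes to write, and to skip on read
array_start = offset_after_i4 + pad      # buffer offset where the R8 array begins
```
&lt;/PRE&gt;
&lt;P&gt;The writer emits pad filler bytes before the array and the reader skips the same count, so both sides only need to agree on the alignment value.&lt;/P&gt;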
&lt;P&gt;Bruce&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2006 22:22:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796383#M35465</guid>
      <dc:creator>dbruceg</dc:creator>
      <dc:date>2006-08-01T22:22:13Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796384#M35466</link>
      <description>Perhaps you don't understand why speeding up the MOVxx instructions is unnecessary and very difficult.&lt;BR /&gt;&lt;BR /&gt;In the original 8088 and 8086 and 80286 and maybe even the 80386, all instructions were microcoded. That means for each instruction, a little microprogram was run. Microprogramming makes CPU design much easier, as the chip designer doesn't have to design hard-wired circuitry for each and every operation. &lt;BR /&gt;&lt;BR /&gt;But the downside is that every operation, no matter how simple, takes several micro-clock cycles.&lt;BR /&gt;&lt;BR /&gt;About the time of the 486, the designers realized that simple instructions, like move byte or move word, could be implemented in hardware and be many times faster than running the microcode. But there was only a limited amount of chip space for this, so only some of the simpler instructions got the fast treatment-- the rest remained in microcode. That's when MOVxx fell behind-- it got left in microcode.&lt;BR /&gt;&lt;BR /&gt;Then with the Pentium things got worse-- moving memory not only happened in hardware, the Pentium had TWO pipes, both of which could do MOVs. So now you had TWO and FAST memory movers, compared to just ONE microcoded MOVxx. &lt;BR /&gt;And oh, the new floating-point load and store instructions are also faster than MOVxx. And starting up a MOVxx microcoded instruction stops the other pipe, so it's a double loss-- MOVxx runs slower than the simpler instructions, and MOVxx prevents any other instructions from getting into the other pipe. Double-ungood.&lt;BR /&gt;&lt;BR /&gt;By this time most programmers and run-time libraries realized this and changed their block-move routines to use the faster instructions. 
Now there's even less incentive to speed up MOVxx, as very few programs use it, and it has a huge handicap to overcome.&lt;BR /&gt;&lt;BR /&gt;Alongside this the CPU speeds have been climbing faster than memory speeds, so now it usually doesn't matter; in many cases MOVxx is faster than the memory bus! Not to mention the AMD CPUs have THREE integer units, so you can have even more memory moving going on in each clock cycle. &lt;BR /&gt;&lt;BR /&gt;Write yourself a little test program that uses MOVSB, MOVSW, MOVSD, and regular mov instructions (overlapped and unrolled a bit), and also try fp move. I think you'll find a simple mov instruction loop easily saturates the memory bus. &lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 03 Aug 2006 05:09:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796384#M35466</guid>
      <dc:creator>grg99</dc:creator>
      <dc:date>2006-08-03T05:09:41Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796385#M35467</link>
      <description>&lt;P&gt;Thanks. That was very informative, and you're quite right: I was blissfully unaware of all that. But I'm not trying to speed up the MOV instructions. My question is about half a dozen levels above the microcode.&lt;/P&gt;
&lt;P&gt;One presumably pays a penalty for moving misaligned data. How big is the penalty in Intel Fortran optimized for speed? I can obviously cobble up a bunch of tests to attempt to determine that myself, but since I don't fully understand the matter, I could easily miss something important. I thought you guys might have a handle on it.&lt;/P&gt;
&lt;P&gt;Bruce&lt;/P&gt;</description>
      <pubDate>Thu, 03 Aug 2006 20:32:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796385#M35467</guid>
      <dc:creator>dbruceg</dc:creator>
      <dc:date>2006-08-03T20:32:07Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796386#M35468</link>
      <description>Have you tried running your program under VTune? It can read the performance counters and see how many times misaligned data was found. As with any sort of performance question, the issue is what portion of the total application time is spent in this move.&lt;BR /&gt;</description>
      <pubDate>Thu, 03 Aug 2006 20:40:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796386#M35468</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2006-08-03T20:40:06Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796387#M35469</link>
      <description>&lt;P&gt;I haven't yet tried VTune, but my project is not quite ready for fine-tuning. It's an FE analysis package with about 30 "applications" operating on million-equation systems. Clocking it crudely (i.e., with SECNDS), I obtain the expected results: I/O is a big factor, as it always has been in such programs. By knocking down the number of physical operations with buffering, I've picked up a LOT of speed, but I/O is STILL the bottleneck. I KNOW that I'm transferring misaligned data to and from the buffers, and I KNOW that those transfers are clocking a respectable amount of time. What I do NOT know is whether or not data alignment will speed it up enough to justify the recoding.&lt;/P&gt;
&lt;P&gt;Will VTune give me any help there? I'm planning on using it eventually anyway.&lt;/P&gt;
&lt;P&gt;Based on your experience, can one say that data misalignment increases data move time by 10%? - or 20%? - or 5%? If I can get about 15%, it would be worth going after.&lt;/P&gt;
&lt;P&gt;Bruce&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Aug 2006 23:37:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796387#M35469</guid>
      <dc:creator>dbruceg</dc:creator>
      <dc:date>2006-08-03T23:37:51Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796388#M35470</link>
      <description>I doubt it would be more than 10% on IA-32, but this is very application dependent. Yes, VTune will help a lot in telling you where to spend your effort and where not to.&lt;BR /&gt;</description>
      <pubDate>Fri, 04 Aug 2006 05:10:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796388#M35470</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2006-08-04T05:10:45Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796389#M35471</link>
      <description>&lt;FONT size="2"&gt;
&lt;P&gt;The topic of this thread may have diverged, but for what it is worth, here is some information on REP MOVSD. Over the past 4 years, I have been involved with the analysis and design of REP MOVSD/STOSD in Intel processors. Let me try to respond to some of the discussion relevant to REP MOVSD/STOSD instructions.&lt;/P&gt;
&lt;P&gt;As has been pointed out, the very small opcode size (two bytes) has a dramatic impact on code size. Unfortunately, REP MOVSD does not always perform as well as custom code sequences. Many people within Intel have pointed this out, and there have been ongoing efforts to do something about it. If you examine the performance on the very latest CPUs (the new Intel Core 2 Duo processors and Xeon Processor 5100 series) you will see some of this effect-- REP MOVS/STOS instructions are substantially faster than they used to be. It is still possible to write code that is faster still, but the performance gap is not as large, and it is a little harder than it used to be to beat REP MOVSD/STOSD. I will put some detail on this at the end of the post.&lt;/P&gt;
&lt;P&gt;Misaligned code is still a big problem. It turns out that dealing with misaligned source and destination buffers is just as hard in the micro-code within the REP MOVSD instruction as it is to deal with in regular code. The suggestions made in this post of testing alignment, internal buffering and custom multiple code paths are very painful in micro-code, and the performance you obtain is often worse than what you would end up with if you just wrote that memory copy subroutine. Furthermore, the REP MOVSD instruction has other constraints-- it must be able to operate correctly when stopped by an interrupt and update all the register values properly, so that the instruction can be restarted from the point it was interrupted. This significantly complicates align/shift micro-code implementations. When we have looked into dealing with alignment in micro-code, we have found the overhead to be prohibitive. The ideas make sense, it seems like it should be easy, but we have found in practice that it gets ugly pretty quick.&lt;/P&gt;
&lt;P&gt;Putting the alignment issues aside, let me elaborate a bit on how REP MOVSD stacks up against a memory copy function you (or the compiler) might write. You can characterize the performance of a copy operation by an overhead and a throughput. Ideally, you would want a REP MOVSD to have a throughput close to the fundamental peak transfer rate to cache, with an overhead close to zero. When you hit the first level cache, your maximum throughput is one load and one store per clock, so it matters how big your loads and stores are. REP MOVSD achieves one load and one store per clock, and it will actually change the load and store sizes for copies that are "long enough", so that your throughput increases. This is called "fast strings" mode-- you probably have seen information about this in Intel documentation. This fast string mode has its own overhead-- the price you pay for that higher throughput. It is this overhead that determines what is "long enough" to enter the fast strings mode. There are also some other restrictions on entry to fast strings-- some alignments on source and destination, and some restrictions to make sure that fast strings cannot corrupt memory (like if the strings overlap). &lt;/P&gt;
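&lt;P&gt;As a rough illustration of the overhead/throughput characterization above, here is a small cost-model sketch. It assumes copy time is simply a fixed overhead plus length divided by throughput; the function names and every number in it are made-up placeholders, not measurements of any processor or a description of the actual micro-code:&lt;/P&gt;
&lt;PRE&gt;
```python
def copy_clocks(length, overhead, bytes_per_clock):
    """Estimated clocks to copy `length` bytes: fixed overhead plus streaming time."""
    return overhead + length / bytes_per_clock


def break_even_length(slow_overhead, slow_bpc, fast_overhead, fast_bpc):
    """Length at which the higher-overhead, higher-throughput mode starts to win.

    Solves slow_overhead + n / slow_bpc == fast_overhead + n / fast_bpc for n.
    """
    return (fast_overhead - slow_overhead) / (1 / slow_bpc - 1 / fast_bpc)


# Placeholder figures only (hypothetical): a normal mode vs. a "fast strings" style mode.
NORMAL = dict(overhead=20, bytes_per_clock=4)
FAST = dict(overhead=60, bytes_per_clock=16)

threshold = break_even_length(NORMAL["overhead"], NORMAL["bytes_per_clock"],
                              FAST["overhead"], FAST["bytes_per_clock"])
```
&lt;/PRE&gt;
&lt;P&gt;With these placeholder figures the break-even length comes out a little over 200 bytes: shorter copies are cheaper in the low-overhead mode, longer ones in the high-throughput mode. Real thresholds depend on the processor.&lt;/P&gt;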
&lt;P&gt;In normal strings mode, the Intel Core 2 Duo processors have cut the overhead of REP MOVSD by about a factor of two (in terms of processor clocks) vs. the Pentium 4 processors. The throughput is doubled vs. Pentium 4 processor, again in terms of clocks (well, in terms of bytes per clock). The overhead to enter fast strings is about 1/4 of what it was on the Pentium 4 processor, and the throughput is about double (in terms of clocks).&lt;/P&gt;
&lt;P&gt;If you (or the compiler) write custom copy code, you can obtain highest throughput by using the largest possible size move (subject to alignment concerns), and minimizing your loop overhead, both in terms of the code you write and in terms of getting the best possible branch prediction. For very small copies of fixed size, multiple load/stores with no loop at all have the highest performance. When your data is misaligned, the best code I have seen keeps the loads and stores aligned, using shifts and the like to make it all work out right. Like many coding optimizations, there is a tradeoff with code size and performance that the programmer must address. You can still beat REP MOVSD/STOSD with such code, but as previous posters point out, you use a lot of instructions to do it-- at least if you want to cover all possible alignment cases. If you know something about the alignment or length up front, you can simplify your code considerably.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;</description>
      <pubDate>Sat, 05 Aug 2006 05:08:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796389#M35471</guid>
      <dc:creator>Seth_A_Intel</dc:creator>
      <dc:date>2006-08-05T05:08:33Z</dc:date>
    </item>
    <item>
      <title>Re: Time to revisit REP;MOVS</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796390#M35472</link>
      <description>&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Seth,&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Thanks for taking the time to reply. I do have a few observations:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;It is often observed (as in this case) that there is a dichotomy between what is envisioned and what is practiced. I will assume your perspective is that of a processor engineer familiar with the internal microcode. You presumably have mastered assembly language and may have a good grasp of C/C++. As such, your expectations (what you envision) of memory move operations are focused on what is performed by way of a subroutine call (memmove, memcpy). What is practiced is quite different. In Intel Visual Fortran the practice is to place the generalized memory move code in-line. This is done in an effort to optimize the shorter run memory moves by eliminating the overhead of a subroutine call. The consequence of this is code bloat and the side effect of adversely affecting the instruction cache.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;While it would be easy for you to prescribe "use subroutines" to the compiler writers, in practice they cannot, because to do so might place the generated code at a competitive disadvantage or would be contrary to the user dictate to "produce the fastest possible code".&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT size="2"&gt;&lt;SPAN style="COLOR: black"&gt;&lt;FONT face="Arial"&gt;RE: &lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: Verdana"&gt;The suggestions made in this post of testing alignment, internal buffering and custom multiple code paths are very painful in micro-code, and the performance you obtain is often worse that what you would end up with if you just wrote that memory copy subroutine.&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: Verdana"&gt;&lt;P&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT size="2"&gt;&lt;SPAN style="COLOR: black; FONT-FAMILY: Verdana"&gt;This is only very painful once, and only painful if your threshold for pain is quite low. In my opinion, the effort to add some of the SSE3 instructions must have been greater than the effort needed to create efficient REP MOVSx instructions. It should be no worse than a root canal.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="COLOR: black"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;As suggested in my earlier post, create a study to examine some real-world applications. My preference is scientific computing using IVF, but realistically you will have to include commercial packages regardless of the language. I will venture to guess that the preponderance of the in-line moves are good candidates for MOVSD or MOVSQ. That is, the source and target strings for MOVSD are almost always dword aligned, and the source and target strings for MOVSQ are generally qword aligned and almost always at least dword aligned. Therefore, the microcode effort should concentrate on optimizing dword-aligned REP MOVSD. This will simplify the logic in the microcode. The compiler writers of IVF could easily integrate REP MOVSD as a code generation option. Later implementations of the microcode could address other alignment circumstances.&lt;/FONT&gt;&lt;/P&gt;
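&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;To make the proposed study concrete, here is a minimal sketch of how sampled moves might be classified by the widest string-move element they qualify for. The function names and the (source, target, byte count) sample format are hypothetical, and requiring the byte count to be a multiple of the element size is a simplifying assumption. Collecting the samples from a real application is not shown.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
def widest_string_move(src, dst, nbytes):
    """Return the widest REP MOVSx element size (in bytes) that the source
    address, target address, and byte count are all multiples of."""
    for size in (8, 4, 2):  # MOVSQ, MOVSD, MOVSW
        if src % size == 0 and dst % size == 0 and nbytes % size == 0:
            return size
    return 1  # MOVSB: no wider element fits


def tally(moves):
    """Count how many (src, dst, nbytes) samples fall into each size class."""
    counts = {8: 0, 4: 0, 2: 0, 1: 0}
    for src, dst, nbytes in moves:
        counts[widest_string_move(src, dst, nbytes)] += 1
    return counts
```
&lt;/PRE&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;If the guess above holds, a tally over real applications would show most samples in the 4 and 8 classes.&lt;/FONT&gt;&lt;/P&gt;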
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;If the source or target is not dword aligned, branch to the old MOVSD microcode.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;Initialize the cache such that the next items read or written are marked as Least Recently Used,&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;i.e. the first to be evicted. The intention is to keep large memory moves from flushing the data cache.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;Use an internal buffer of memory width (16 bytes now, 32 later, 64 whatever) to align writes to memory width (first write potentially a read/modify/write to accommodate skewed data).&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;The microcode optimally performs prefetch.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;I style="mso-bidi-font-style: normal"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Arial"&gt;During the move, check for interrupts and exit early after a write, since the fetch of the next input line is presumably in progress. As with the current REP MOVSD, the instruction remains interruptible.&lt;P&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/I&gt;&lt;/P&gt;
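&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The alignment test and the memory-width write buffering in the steps above can be modeled roughly as follows. This is only a software sketch under stated assumptions, not actual microcode: memory is a bytearray, the 16-byte width is the figure mentioned above, the names are hypothetical, and the source and target ranges are assumed not to overlap. The cache LRU marking, prefetch, and interrupt steps are not modeled.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
WIDTH = 16  # assumed memory width in bytes (16 now, possibly 32 or 64 later)


def plan_writes(dst, nbytes, width=WIDTH):
    """Split a write of nbytes starting at address dst into (address, length)
    pieces so that every piece after the first starts on a width boundary."""
    pieces = []
    # Head: bytes needed to reach the next width boundary (0 if already aligned).
    head = min((-dst) % width, nbytes)
    if head != 0:
        pieces.append((dst, head))
    addr = dst + head
    remaining = nbytes - head
    # Body and tail: full-width writes, then whatever is left over.
    while remaining != 0:
        step = min(width, remaining)
        pieces.append((addr, step))
        addr += step
        remaining -= step
    return pieces


def modeled_move(mem, src, dst, nbytes, width=WIDTH):
    """Copy nbytes inside the bytearray mem and report the path taken
    together with the pieces written."""
    if src % 4 != 0 or dst % 4 != 0:
        # Not dword aligned: stands in for the branch to the old microcode.
        mem[dst:dst + nbytes] = mem[src:src + nbytes]
        return "legacy", [(dst, nbytes)]
    pieces = plan_writes(dst, nbytes, width)
    for addr, length in pieces:
        offset = addr - dst
        mem[addr:addr + length] = mem[src + offset:src + offset + length]
    return "aligned", pieces
```
&lt;/PRE&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;For a target address that is dword aligned but not 16-byte aligned, the first piece is the short, skewed write (the potential read/modify/write) and the following pieces are full-width aligned writes.&lt;/FONT&gt;&lt;/P&gt;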
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The advantages of performing the "very painful" task in microcode, as opposed to in a subroutine or in-line, are:&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;The assembler code is not required to determine the underlying cache or memory width. True, a compiler option could specify the number of bytes for the memory width, but a width that is correct for current processors may not be correct for the next generation. Yes, the runtime system code could store the optimal width for use by the generated code, but this then requires an additional memory read and the use of a register. For small transfers the overhead would be burdensome. Accommodating the cache/memory width would be attainable with a subroutine, but incorporating this in-line is questionable.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Determining the optimal prefetch distance is problematic in user code but well understood for a given processor design. The runtime system startup code could make this determination. A generalized memmove subroutine could take advantage of this but this adds to the subroutine initialization overhead. In-line code would likely not be able to take advantage of prefetch.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face="Arial" size="2"&gt;Jim Dempsey&lt;/FONT&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2006 00:55:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Time-to-revisit-REP-MOVS/m-p/796390#M35472</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2006-08-15T00:55:02Z</dc:date>
    </item>
  </channel>
</rss>

