<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MOVAPS alignment problem in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820772#M1129</link>
    <description>Hello!&lt;BR /&gt;I am trying to "port" my Java special functions class to pure x86 assembly. In my project I use the SSE and SSE2 instruction sets operating on floating-point REAL4 values. I would like to use the movaps instruction because of timing (lower CPI than movups), but my program crashes with an "access violation" error. While debugging I found that the error is caused by a movaps instruction trying to access stack values local to the procedure (addressed by ebp-n), where n is a multiple of 16. When I use movups the problem is absent. I tried adding an align 16 directive but it does not work, so I am stuck with the less efficient instruction.&lt;BR /&gt;Here is my code snippet, which calculates a few terms of the Taylor expansion of e^x.&lt;BR /&gt;[bash] movaps xmm0,one ;movaps works perfectly while accessing memory
 addps xmm0,argument ;1+x xmm0 accumulator
 mov eax,OFFSET coef1
 movaps xmm1,[eax]
 rcpps xmm2,xmm1 ;1/coef1
 movaps xmm3,argument
 mulps xmm3,xmm3 ;x^2
 
 movups [ebp-16],xmm3 ;store x^2 ;here movaps crashes program
 mulps xmm2,xmm3
 addps xmm0,xmm2 ;1+x+x^2/2! xmm0 accumulator
 mov eax,OFFSET coef2
 movups xmm1,[eax]
 rcpps xmm2,xmm1 ;1/coef2
 movups xmm7,argument
 movups xmm3,[ebp-16]
 mulps xmm3,xmm7 ;x^3
 movups [ebp-32],xmm3 ;store x^3
 mulps xmm2,xmm3
 addps xmm0,xmm2 ;1+x+x^2/2!+x^3/3! xmm0 accumulator&lt;BR /&gt;[/bash]</description>
    <pubDate>Thu, 17 May 2012 15:31:31 GMT</pubDate>
    <dc:creator>Bernard</dc:creator>
    <dc:date>2012-05-17T15:31:31Z</dc:date>
    <item>
      <title>MOVAPS alignment problem</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820772#M1129</link>
      <description>Hello!&lt;BR /&gt;I am trying to "port" my Java special functions class to pure x86 assembly. In my project I use the SSE and SSE2 instruction sets operating on floating-point REAL4 values. I would like to use the movaps instruction because of timing (lower CPI than movups), but my program crashes with an "access violation" error. While debugging I found that the error is caused by a movaps instruction trying to access stack values local to the procedure (addressed by ebp-n), where n is a multiple of 16. When I use movups the problem is absent. I tried adding an align 16 directive but it does not work, so I am stuck with the less efficient instruction.&lt;BR /&gt;Here is my code snippet, which calculates a few terms of the Taylor expansion of e^x.&lt;BR /&gt;[bash] movaps xmm0,one ;movaps works perfectly while accessing memory
 addps xmm0,argument ;1+x xmm0 accumulator
 mov eax,OFFSET coef1
 movaps xmm1,[eax]
 rcpps xmm2,xmm1 ;1/coef1
 movaps xmm3,argument
 mulps xmm3,xmm3 ;x^2
 
 movups [ebp-16],xmm3 ;store x^2 ;here movaps crashes program
 mulps xmm2,xmm3
 addps xmm0,xmm2 ;1+x+x^2/2! xmm0 accumulator
 mov eax,OFFSET coef2
 movups xmm1,[eax]
 rcpps xmm2,xmm1 ;1/coef2
 movups xmm7,argument
 movups xmm3,[ebp-16]
 mulps xmm3,xmm7 ;x^3
 movups [ebp-32],xmm3 ;store x^3
 mulps xmm2,xmm3
 addps xmm0,xmm2 ;1+x+x^2/2!+x^3/3! xmm0 accumulator&lt;BR /&gt;[/bash]</description>
      <pubDate>Thu, 17 May 2012 15:31:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820772#M1129</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-05-17T15:31:31Z</dc:date>
    </item>
    <item>
      <title>MOVAPS alignment problem</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820773#M1130</link>
      <description>&lt;P&gt;Short answer: you don't need to worry about MOVAPS vs. MOVUPS loads/stores.&lt;BR /&gt;&lt;BR /&gt;Long answer: although you could make an effort to align your stack (e.g. by adding AND EBP, 0xfffffff0), MOVUPS has been as fast as MOVAPS for 4 generations of Intel CPUs now. You are only really penalized when a store/load crosses a page boundary (a relatively rare case). Also, stores and subsequent loads from the stack are handled by a shortcut called store-to-load forwarding, without cache interaction. The performance bottlenecks are almost certainly elsewhere in this code.&lt;BR /&gt;&lt;BR /&gt;-Max&lt;/P&gt;</description>
      <pubDate>Thu, 17 May 2012 17:54:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820773#M1130</guid>
      <dc:creator>Max_L</dc:creator>
      <dc:date>2012-05-17T17:54:29Z</dc:date>
    </item>
    <item>
      <title>MOVAPS alignment problem</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820774#M1131</link>
      <description>Thank you very much.</description>
      <pubDate>Thu, 17 May 2012 18:08:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/MOVAPS-alignment-problem/m-p/820774#M1131</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2012-05-17T18:08:41Z</dc:date>
    </item>
  </channel>
</rss>

