<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Difference using SSE on Intel and AMD processors (?) in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910804#M2964</link>
    <description>&lt;P&gt;I think the calling convention for the MMX/XMM registers requires that they be "caller saved", which means that called functions may modify them. You have to save them before calls and restore them afterwards. Check the following links:&lt;/P&gt;
&lt;P&gt;Windows:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/Kernel_d/hh/Kernel_d/64bitamd_6848c803-89d3-4f19-82b2-6fae5e63ec13.xml.asp"&gt;http://msdn.microsoft.com/library/default.asp?url=/library/en-us/Kernel_d/hh/Kernel_d/64bitamd_6848c803-89d3-4f19-82b2-6fae5e63ec13.xml.asp&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Intel compiler changes:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.intel.com/support/performancetools/c/windows/sb/cs-020438.htm"&gt;http://www.intel.com/support/performancetools/c/windows/sb/cs-020438.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Of passing interest:&lt;/P&gt;
&lt;P&gt;SysV AMD64 ABI:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.x86-64.org/documentation/abi-0.96.pdf#search=%22linux%20IA32%20ABI%20SSE%22"&gt;http://www.x86-64.org/documentation/abi-0.96.pdf#search=%22linux%20IA32%20ABI%20SSE%22&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 19 Sep 2006 22:53:48 GMT</pubDate>
    <dc:creator>Intel_C_Intel</dc:creator>
    <dc:date>2006-09-19T22:53:48Z</dc:date>
    <item>
      <title>Difference using SSE on Intel and AMD processors (?)</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910803#M2963</link>
      <description>&lt;FONT face="Verdana" size="2"&gt;Hi, I have the following problem: I use SSE in database system for selection (sql command &lt;I&gt;select&lt;/I&gt;). I coded it on AMD Duron 1800 MHz processor and it works fine. But when I tested it on Intel Pentium 4 and Pentium D, it gives bad results - it doesn't work properly. In C language we could write it in a simplified way (for sql command "&lt;I&gt;select * from TABLE where VALUE &amp;gt; key&lt;/I&gt;"):&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;----------------------------------&lt;/FONT&gt;&lt;BR /&gt;float &lt;/FONT&gt;*input;    &lt;FONT color="#008000"&gt;// address to input data stored in array&lt;BR /&gt;     // (data page with 820 entries of type float)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000ff"&gt;float &lt;/FONT&gt;key;      &lt;FONT color="#008000"&gt;// find key&lt;/FONT&gt;&lt;BR /&gt;...&lt;BR /&gt;&lt;FONT color="#0000ff"&gt;for &lt;/FONT&gt;(i = 0; i &amp;lt; &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;get_values_count_in_column()&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;; i++)&lt;BR /&gt; {&lt;BR /&gt; &lt;FONT color="#0000ff"&gt;if &lt;/FONT&gt;(input&lt;I&gt; &amp;gt; key)&lt;BR /&gt;  add_row_to_output_table(i);&lt;BR /&gt; }&lt;/I&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;----------------------------------&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;BR /&gt;My SSE code is:&lt;BR /&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;----------------------------------&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR 
/&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;FONT color="#0000ff"&gt;float &lt;/FONT&gt;*input;&lt;BR /&gt;&lt;FONT color="#0000ff"&gt;float &lt;/FONT&gt;*key;      &lt;FONT color="#008000"&gt;// addres to find key&lt;/FONT&gt;&lt;BR /&gt;...&lt;BR /&gt;&lt;FONT color="#0000ff"&gt;__asm&lt;/FONT&gt;&lt;BR /&gt; {&lt;BR /&gt; mov esi, input&lt;BR /&gt; mov edi, key&lt;BR /&gt; xor  ecx, ecx            &lt;FONT color="#008000"&gt;// counter&lt;/FONT&gt;&lt;BR /&gt; xor edx, edx&lt;BR /&gt; mov ebx, &lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;values_count_in_column&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;BR /&gt; movss xmm1, [edi]   &lt;FONT color="#008000"&gt;// xmm1 &amp;lt;- key&lt;/FONT&gt;&lt;BR /&gt; shufps xmm1, xmm1, 0      &lt;FONT color="#008000"&gt;// broadcast&lt;/FONT&gt;&lt;BR /&gt;     &lt;BR /&gt; prefetchnta [esi+32]&lt;BR /&gt;&lt;BR /&gt;START_LOOP:&lt;BR /&gt; movaps xmm0, [esi]   &lt;FONT color="#008000"&gt;// xmm0 &amp;lt;- input&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT color="#008000"&gt; // The following line is problematic. On AMD processor it is  // &lt;B&gt;not needed&lt;/B&gt;, while on Intel processor without this line XMM1  // losts its contents after first calling of procedure &amp;amp;nbs
p;   // sse_add_row (see below)&lt;BR /&gt; &lt;FONT color="#ff0000"&gt;movaps xmm1, [edi]  // xmm1 &amp;lt;- key&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt; cmpnleps xmm0, xmm1   &lt;FONT color="#008000"&gt;// compare input &amp;gt; key&lt;/FONT&gt;&lt;BR /&gt; movmskps edx, xmm0   &lt;FONT color="#008000"&gt;// store mask to edx&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt; &lt;FONT color="#008000"&gt;// for testing purposes, we show the xmm1 register (see below)&lt;/FONT&gt;&lt;BR /&gt; push eax&lt;BR /&gt; push ecx&lt;BR /&gt; push edx&lt;BR /&gt; call show_xmm1&lt;BR /&gt; pop edx&lt;BR /&gt; pop ecx&lt;BR /&gt; pop eax&lt;BR /&gt;&lt;BR /&gt; test edx, edx  &lt;FONT color="#008000"&gt;// if nothing found, skip testing bits&lt;/FONT&gt;&lt;BR /&gt; jz NOT_FOUND_3&lt;BR /&gt;&lt;BR /&gt;FOUND:&lt;BR /&gt; test edx, 1   &lt;FONT color="#008000"&gt;// test bit 0&lt;/FONT&gt;&lt;BR /&gt; jz NOT_FOUND_0 &lt;FONT color="#008000"&gt;// if not set, jump to test bit 1&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT color="#008000"&gt; // bit is set, we have to store data into output&lt;BR /&gt; // selection table - it is done by function sse_add_row&lt;BR /&gt;&lt;/FONT&gt; push eax&lt;BR /&gt; push ecx&lt;BR /&gt; push edx&lt;BR /&gt; call sse_add_row &lt;FONT color="#008000"&gt;// sse_add_row stores entry with         // offset in ecx to output table in DBS&lt;/FONT&gt;&lt;BR /&gt; pop edx&lt;BR /&gt; pop ecx&lt;BR /&gt; pop eax&lt;BR /&gt;&lt;BR /&gt;NOT_FOUND_0:&lt;BR /&gt; test edx, 2  &lt;FONT color="#008000"&gt;// test bit 1&lt;/FONT&gt;&lt;BR /&gt; jz NOT_FOUND_1 &lt;FONT color="#008000"&gt;// &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#008000" face="Courier New" size="2"&gt;if not set, jump to test bit 2&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;BR /&gt; push eax&lt;BR /&gt; push ecx&lt;BR /&gt; push edx&lt;BR /&gt; add ecx, 1&lt;BR /&gt; call sse_add_row    &lt;BR /&gt; pop edx&lt;BR /&gt; pop 
ecx&lt;BR /&gt; pop eax&lt;BR /&gt;&lt;BR /&gt;NOT_FOUND_1:&lt;BR /&gt; test edx, 4  &lt;FONT color="#008000"&gt;// test bit 2&lt;/FONT&gt;&lt;BR /&gt; jz NOT_FOUND_2 &lt;FONT color="#008000"&gt;// &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#008000
" face="Courier New" size="2"&gt;if not set, jump to test bit 3&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="2"&gt;&lt;BR /&gt; push eax&lt;BR /&gt; push ecx&lt;BR /&gt; push edx&lt;BR /&gt; add ecx, 2&lt;BR /&gt; call sse_add_row&lt;BR /&gt; pop edx&lt;BR /&gt; pop ecx&lt;BR /&gt; pop eax&lt;BR /&gt;&lt;BR /&gt;NOT_FOUND_2:&lt;BR /&gt; test edx, 8  &lt;FONT color="#008000"&gt;// test bit 3&lt;/FONT&gt;&lt;BR /&gt; jz NOT_FOUND_3 &lt;/FONT&gt;&lt;FONT color="#008000" face="Courier New" size="2"&gt;// &lt;/FONT&gt;&lt;FONT color="#008000" face="Courier New" size="2"&gt;if not set, jump&lt;/FONT&gt;&lt;FONT color="#008000" face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt; to end of bit &lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color="#008000" face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;testing&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;&lt;BR /&gt; push eax&lt;BR /&gt; push ecx&lt;BR /&gt; push edx&lt;BR /&gt; add ecx, 3&lt;BR /&gt; call sse_add_row&lt;BR /&gt; pop edx&lt;BR /&gt; pop ecx&lt;BR /&gt; pop eax&lt;BR /&gt;&lt;BR /&gt;NOT_FOUND_3:  &lt;BR /&gt; add esi, 16&lt;BR /&gt; add ecx, 4&lt;BR /&gt; cmp ecx, ebx&lt;BR /&gt; jne START_LOOP&lt;BR /&gt; }&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;&lt;FONT color="#008000"&gt;// write entry to output table of the DBS&lt;BR /&gt;// sse_ecx is offset of found entry&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff"&gt;void __fastcall&lt;/FONT&gt; sse_add_row(&lt;FONT color="#0000ff"&gt;register &lt;/FONT&gt;sse_ecx)&lt;BR /&gt; {&lt;BR /&gt; Row *row = algebra -&amp;gt; generateRow(table, page, sse_ecx);&lt;BR /&gt; algebra -&amp;gt; syscat -&amp;gt; addRowData (output_table, row);&lt;BR /&gt; &lt;FONT color="#0000ff"&gt;delete &lt;/FONT&gt;row;&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;&lt;FONT 
color="#008000"&gt;// print the contents of XMM1 (for testing purposes only)&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color="#0000ff"&gt;void __fastcall&lt;/FONT&gt; show_xmm1(&lt;FONT color="#0000ff"&gt;register &lt;/FONT&gt;sse_ecx)&lt;BR /&gt; {&lt;BR /&gt; &lt;FONT color="#0000ff"&gt;float &lt;/FONT&gt;*o = (&lt;FONT color="#0000ff"&gt;float &lt;/FONT&gt;*)malloc(4 * &lt;FONT color="#0000ff"&gt;sizeof&lt;/FONT&gt;(&lt;FONT color="#0000ff"&gt;float&lt;/FONT&gt;));&lt;BR /&gt; &lt;FONT color="#0000ff"&gt;__asm&lt;/FONT&gt;&lt;BR /&gt;  {&lt;BR /&gt;  mov edi, o&lt;BR /&gt;  movups [edi], xmm1&lt;BR /&gt;  }&lt;BR /&gt;&lt;BR /&gt; printf("%d: %f %f %f %f
", sse_ecx, o[0], o[1], o[2], o[3]);&lt;BR /&gt; free(o);&lt;BR /&gt; }&lt;BR /&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;----------------------------------&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="Verdana" size="2"&gt;On AMD Duron 
1800Mhz processor the &lt;FONT color="#ff0000"&gt;red line&lt;/FONT&gt; above is not needed because XMM1 is already loaded (movss and broadcast). Its contents is constant. But on Intel, its contents is constat only until procedure &lt;FONT face="Courier New" size="2"&gt;sse_add_row&lt;/FONT&gt; is called. After the first call the contents of XMM1 is changed - it is rewriten to these components: 0.00000 2.90625 0.00000 0.00000 and then stay constant with these values.&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="Verdana" size="2"&gt;I don't understand what part of code is wrong or some strange-side-effect-generating, why it runs fine on AMD and why the new content of XMM1 is &lt;/FONT&gt;&lt;FONT face="Verdana" size="2"&gt;right 0.00000 2.90625 0.00000 0.00000. I studied manuals with instructions and function calling conventions, but I didn't find what could modify the contents of XMM1 and why only on Intel processors.&lt;BR /&gt;&lt;BR /&gt;Now I tested it on AMD Turion and it run in the same way like on Intel. The XMM1 contents is rewriten... So my program run correctly only on AMD Duron 1800 MHz.&lt;BR /&gt;&lt;BR /&gt;Can somebody find the clue? Thanks in advance.&lt;BR /&gt;Jozef&lt;BR /&gt;&lt;/FONT&gt;</description>
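      <!-- The simplified C loop from the question, made runnable so that it can serve as a reference when checking an SSE version against it. This is a sketch under stated assumptions: the column is a plain float array of n entries, and the DBS callback add_row_to_output_table(i) is replaced by a hypothetical rows[] output array that receives the indices of matching entries.

```c
#include <stddef.h>

/* Scalar reference for "select * from TABLE where VALUE > key".
   Writes the index of every matching entry to rows[] and returns how
   many matched. rows[] must have room for n entries. */
size_t select_greater_scalar(const float *input, size_t n, float key,
                             size_t *rows) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (input[i] > key)       /* the input[i] > key test from the post */
            rows[found++] = i;    /* stands in for add_row_to_output_table(i) */
    }
    return found;
}
```

For example, with the values {1, 5, 2, 7, 9, 0, 3, 8, 4} and key 4, it returns 4 and fills rows with 1, 3, 4, 7. -->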
      <pubDate>Sun, 17 Sep 2006 04:11:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910803#M2963</guid>
      <dc:creator>Anonymous69</dc:creator>
      <dc:date>2006-09-17T04:11:15Z</dc:date>
    </item>
    <item>
      <title>Re: Difference using SSE on Intel and AMD processors (?)</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910804#M2964</link>
      <description>&lt;P&gt;I think the calling convention for the MMX/XMM registers requires that they be "caller saved", which means that called functions may modify them. You have to save them before calls and restore them afterwards. Check the following links:&lt;/P&gt;
&lt;P&gt;Windows:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/Kernel_d/hh/Kernel_d/64bitamd_6848c803-89d3-4f19-82b2-6fae5e63ec13.xml.asp"&gt;http://msdn.microsoft.com/library/default.asp?url=/library/en-us/Kernel_d/hh/Kernel_d/64bitamd_6848c803-89d3-4f19-82b2-6fae5e63ec13.xml.asp&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Intel compiler changes:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.intel.com/support/performancetools/c/windows/sb/cs-020438.htm"&gt;http://www.intel.com/support/performancetools/c/windows/sb/cs-020438.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Of passing interest:&lt;/P&gt;
&lt;P&gt;SysV AMD64 ABI:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.x86-64.org/documentation/abi-0.96.pdf#search=%22linux%20IA32%20ABI%20SSE%22"&gt;http://www.x86-64.org/documentation/abi-0.96.pdf#search=%22linux%20IA32%20ABI%20SSE%22&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Sep 2006 22:53:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910804#M2964</guid>
      <dc:creator>Intel_C_Intel</dc:creator>
      <dc:date>2006-09-19T22:53:48Z</dc:date>
    </item>
    <item>
      <title>Re: Difference using SSE on Intel and AMD processors (?)</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910805#M2965</link>
      <description>Can you step into the function calls and see if some instruction is explicitly writing to the XMM1 register?</description>
      <pubDate>Wed, 20 Sep 2006 00:29:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910805#M2965</guid>
      <dc:creator>Michael_S_Intel8</dc:creator>
      <dc:date>2006-09-20T00:29:23Z</dc:date>
    </item>
    <item>
      <title>Re: Difference using SSE on Intel and AMD processors (?)</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910806#M2966</link>
      <description>&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;Another response to the original question, forwarded to us by engineering:&lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"&gt;Which compilerare youusing? XMM registers in 32 bit mode are non-volatile, andthe question appears toassume they are. It is very likely that the compiler is calling an optimized memory routinein the Intel case.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P dir="ltr"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"&gt;==&lt;/SPAN&gt;&lt;/P&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial"&gt;Lexi S.&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: Arial"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial"&gt;IntelSoftware NetworkSupport&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: Arial"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial"&gt;&lt;A href="http://www.intel.com/software"&gt;http://www.intel.com/software&lt;/A&gt; &lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: Arial"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="MsoNormal" style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: black; FONT-FAMILY: Arial"&gt;&lt;A href="http://www.intel.com/cd/ids/developer/asmo-na/eng/58987.htm"&gt;Contact us&lt;/A&gt;&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: Arial"&gt;&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P dir="ltr"&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Sep 2006 08:27:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910806#M2966</guid>
      <dc:creator>Intel_Software_Netw1</dc:creator>
      <dc:date>2006-09-21T08:27:42Z</dc:date>
    </item>
    <item>
      <title>Re: Difference using SSE on Intel and AMD processors (?)</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910807#M2967</link>
      <description>&lt;P&gt;&lt;FONT face="Arial"&gt;Jozef,&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;Some comments on your code:&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;First&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="2"&gt;&lt;FONT face="Courier New"&gt; movss xmm1, [edi]   &lt;FONT color="#008000"&gt;// xmm1 &amp;lt;- key&lt;/FONT&gt;&lt;BR /&gt; shufps xmm1, xmm1, 0   &lt;FONT color="#008000"&gt;// broadcast&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="Arial"&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;is not equivalent to&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Courier New"&gt; movaps xmm1, [edi]  // xmm1 &amp;lt;- key&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="Arial"&gt;Unless edi points to 4 identical single precision FP values. (I assume it is)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;Second, as per Lexi's suggestion step through your code. You will most likely find the code called by your sse_add_row is modifying XMM1 (caller's responsibility to preserve/restore XMM registers). If you find this the case then insert the&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;FONT size="2"&gt;&lt;FONT face="Courier New"&gt; movaps xmm1, [edi]  // xmm1 &amp;lt;- key&lt;/FONT&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Arial"&gt;&lt;BR /&gt;following each call to sse_add_row. In this manner the overhead only occures when needed. (remove what you thought was the unnecessary movaps)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;Third, if you data is such that the majority of compares are "not founds" then rearrange the code to place the NOT_FOUND_3 section following the first test&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;START_LOOP:&lt;BR /&gt; ...&lt;BR /&gt; test edx,edx&lt;BR /&gt; jnz FOUND&lt;BR /&gt;NOT_FOUND_3: &lt;BR /&gt; add esi, 16&lt;BR /&gt; add ecx, 4&lt;BR /&gt; cmp ecx, ebx&lt;BR /&gt; jne START_LOOP&lt;BR /&gt; jmp DONE&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;FOUND:&lt;BR /&gt; ...&lt;BR /&gt;DONE:&lt;BR /&gt;}&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;There are a few more tweeks, but I will let you find them for yourself.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;Jim Dempsey&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial"&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Sep 2006 21:11:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Difference-using-SSE-on-Intel-and-AMD-processors/m-p/910807#M2967</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2006-09-25T21:11:07Z</dc:date>
    </item>
  </channel>
</rss>

