<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic store bandwidth issue in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/store-bandwidth-issue/m-p/819606#M6030</link>
    <description>&lt;BR /&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;I am trying to improve memory-copy performance with SSE on a Xeon 5310 (1.6 GHz) with DDR2-667 RAM. Here is my code for testing the write bandwidth to RAM:&lt;BR /&gt;&lt;BR /&gt;rdtsc # start timestamp in edx:eax&lt;BR /&gt;movl %eax,time1&lt;BR /&gt;movl %edx,time1+4&lt;BR /&gt;&lt;BR /&gt;loop: # ecx = number of 128-byte iterations&lt;BR /&gt;movdqa %xmm0,(%edi) # eight aligned 16-byte stores (edi must be 16-byte aligned)&lt;BR /&gt;movdqa %xmm1,16(%edi)&lt;BR /&gt;movdqa %xmm2,32(%edi)&lt;BR /&gt;movdqa %xmm3,48(%edi)&lt;BR /&gt;movdqa %xmm4,64(%edi)&lt;BR /&gt;movdqa %xmm5,80(%edi)&lt;BR /&gt;movdqa %xmm6,96(%edi)&lt;BR /&gt;movdqa %xmm7,112(%edi)&lt;BR /&gt;addl $128,%edi # advance 128 bytes&lt;BR /&gt;dec %ecx&lt;BR /&gt;jnz loop&lt;BR /&gt;&lt;BR /&gt;rdtsc # end timestamp&lt;BR /&gt;movl %eax,time2&lt;BR /&gt;movl %edx,time2+4&lt;BR /&gt;&lt;BR /&gt;The problem is that when ecx is between 0 and 31 (each iteration stores 128 bytes, so up to 4 KB), the whole loop costs 1xxx clocks, but when ecx is between 32 and 64 (4 KB to 8 KB), the cost rises to 6xxx clocks. Each additional 4 KB block seems to add another jump of about 5xxx clocks.&lt;BR /&gt;I tried to prefetch 4 KB ahead before the loop, for instance:&lt;BR /&gt;&lt;BR /&gt;movl %eax,4096(%edi)&lt;BR /&gt;movl %eax,8192(%edi)&lt;BR /&gt;&lt;BR /&gt;but each of these prefetching stores costs 5xxx clocks by itself, so it doesn't help. I also tried movntdq, but the results got worse.&lt;BR /&gt;According to these results, the write bandwidth can't exceed about 1 GB/s. The RAM I installed is DDR2-667, which I think has a theoretical bandwidth of about 5 GB/s. Is this an OS issue or a CPU cache issue? BTW, the OS is Linux kernel 2.6.9-78.&lt;BR /&gt;&lt;BR /&gt;Any ideas will be appreciated.&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;</description>
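    <!--
A minimal C sketch of the same measurement, assuming GCC or Clang on x86-64 with SSE2; the function name time_aligned_stores is hypothetical. _mm_store_si128 typically compiles to an aligned 16-byte store (movdqa or movaps), and __rdtsc reads the time-stamp counter like the two rdtsc blocks in the post. One detail about the assembly version: with dec/jnz, an initial ecx of 0 wraps around to 0xFFFFFFFF, so the loop would run 2^32 times rather than zero; the count needs to be at least 1.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_store_si128, _mm_set1_epi8 */
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc on GCC/Clang */

/* Hypothetical helper mirroring the posted loop: each iteration performs
   eight aligned 16-byte stores (128 bytes), like movdqa %xmmN,off(%edi),
   and the TSC is read before and after the loop. */
static uint64_t time_aligned_stores(void *dst, size_t iterations, __m128i v)
{
    /* movdqa requires 16-byte alignment; a misaligned address faults. */
    assert(((uintptr_t)dst & 15u) == 0);

    __m128i *p = (__m128i *)dst;
    uint64_t t1 = __rdtsc();
    for (size_t i = 0; i < iterations; i++) {
        _mm_store_si128(p + 0, v);
        _mm_store_si128(p + 1, v);
        _mm_store_si128(p + 2, v);
        _mm_store_si128(p + 3, v);
        _mm_store_si128(p + 4, v);
        _mm_store_si128(p + 5, v);
        _mm_store_si128(p + 6, v);
        _mm_store_si128(p + 7, v);
        p += 8;  /* advance 128 bytes, like addl $128,%edi */
    }
    uint64_t t2 = __rdtsc();
    return t2 - t1;  /* elapsed TSC ticks for iterations * 128 bytes */
}
```

Usage: pass a 16-byte-aligned buffer (for example one from aligned_alloc(16, size) with size a multiple of 16) and iterations = size / 128, then compare the returned tick counts as size grows in 4 KB steps.
    -->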
    <pubDate>Fri, 14 May 2010 03:21:45 GMT</pubDate>
    <dc:creator>incoming4u</dc:creator>
    <dc:date>2010-05-14T03:21:45Z</dc:date>
    <item>
      <title>store bandwidth issue</title>
      <link>https://community.intel.com/t5/Software-Archive/store-bandwidth-issue/m-p/819606#M6030</link>
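    <!--
One explanation that fits these numbers, offered as an assumption rather than something the thread confirms, is demand paging: Linux typically maps a freshly allocated buffer lazily, so the first write to each 4 KB page takes a page fault and the kernel zeroes the page, which can cost thousands of cycles. That would also explain why a single movl into each page ahead of the loop costs about as much as storing the whole page. The sketch below (prefault_pages is a hypothetical name) touches one byte per page before the timer starts, so a later timed loop measures the stores rather than the faults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write one byte into every page of buf so that any
   first-touch page faults happen before timing starts, not inside the
   timed store loop. Returns the number of pages touched. */
static size_t prefault_pages(void *buf, size_t len, size_t page_size)
{
    assert(page_size > 0);
    /* volatile keeps the compiler from dropping these stores as dead,
       since the timed loop later overwrites the same bytes. */
    volatile unsigned char *p = (volatile unsigned char *)buf;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += page_size) {
        p[off] = 0;
        touched++;
    }
    return touched;
}
```

Usage: call it with page_size = (size_t)sysconf(_SC_PAGESIZE) from unistd.h (4096 on typical x86 Linux) right after allocating the buffer and before the first rdtsc. If the 4 KB steps then shrink to roughly the cost of the stores themselves, page faults were the likely cause.
    -->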
      <pubDate>Fri, 14 May 2010 03:21:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/store-bandwidth-issue/m-p/819606#M6030</guid>
      <dc:creator>incoming4u</dc:creator>
      <dc:date>2010-05-14T03:21:45Z</dc:date>
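    <!--
For comparison, a movntdq variant of the same loop might look like the sketch below, assuming GCC or Clang on x86-64; stream_stores is a hypothetical name. Non-temporal stores go through write-combining buffers instead of the cache and are weakly ordered, so an sfence is needed before other code relies on the data. They tend to pay off only for buffers much larger than the cache; for small buffers, and if page faults dominate the timing, slower results like the ones reported are not surprising.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128 (movntdq); also provides _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical non-temporal version of the store loop: eight movntdq
   stores per 128-byte iteration, followed by one sfence. */
static void stream_stores(void *dst, size_t iterations, __m128i v)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* movntdq also needs 16-byte alignment */
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < iterations; i++) {
        for (int k = 0; k < 8; k++) {
            _mm_stream_si128(p + k, v);
        }
        p += 8;  /* advance 128 bytes */
    }
    /* Order the weakly ordered streaming stores before later stores. */
    _mm_sfence();
}
```
    -->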
    </item>
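    <!--
The roughly 1 GB/s figure follows from the per-page cost. Assuming each extra 4 KB adds about 5,000 to 6,000 TSC cycles (the "5xxx clocks" in the question) and that the TSC counts at the 1.6 GHz nominal clock, and taking DDR2-667 as 667 million 8-byte transfers per second on one 64-bit channel:

```latex
B_{\text{write}} = \frac{4096\ \text{B}}{c}\, f_{\text{TSC}}, \qquad f_{\text{TSC}} = 1.6\times10^{9}\ \text{s}^{-1}

c = 6000 \Rightarrow B_{\text{write}} \approx 1.09\ \text{GB/s}, \qquad c = 5000 \Rightarrow B_{\text{write}} \approx 1.31\ \text{GB/s}

B_{\text{peak}} = 667\times10^{6}\ \tfrac{\text{transfers}}{\text{s}} \times 8\ \tfrac{\text{B}}{\text{transfer}} \approx 5.3\ \text{GB/s}
```

So the measured rate is roughly a fifth to a quarter of the single-channel peak, consistent with the poster's 1 GB/s versus 5 GB/s comparison.
    -->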
    <item>
      <title>store bandwidth issue</title>
      <link>https://community.intel.com/t5/Software-Archive/store-bandwidth-issue/m-p/819607#M6031</link>
      <description>I have moved this from the General Contest Questions forum, since it's more of a general programming question.&lt;BR /&gt;&lt;BR /&gt;==&lt;BR /&gt;Aubrey W.&lt;BR /&gt;Intel Software Network Support</description>
      <pubDate>Thu, 10 Jun 2010 15:35:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/store-bandwidth-issue/m-p/819607#M6031</guid>
      <dc:creator>Aubrey_W_</dc:creator>
      <dc:date>2010-06-10T15:35:45Z</dc:date>
    </item>
  </channel>
</rss>

