<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Achieving peak bandwidth on multi-socket systems in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960055#M2168</link>
    <description>&lt;P&gt;Let's say each CPU socket has 43 GB/s of memory bandwidth through its four memory channels, and I have a dual-socket system. A reduction operation should then achieve 86 GB/s, but it doesn't; it still only achieves up to 43 GB/s. Why is that, and is there anything in Intel's OpenCL implementation for CPUs that can fix it?&lt;/P&gt;

&lt;P&gt;How could I fix that outside of OpenCL?&lt;/P&gt;</description>
    <pubDate>Wed, 26 Mar 2014 04:16:09 GMT</pubDate>
    <dc:creator>James_R_</dc:creator>
    <dc:date>2014-03-26T04:16:09Z</dc:date>
    <item>
      <title>Achieving peak bandwidth on multi-socket systems</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960055#M2168</link>
      <description>&lt;P&gt;Let's say each CPU socket has 43 GB/s of memory bandwidth through its four memory channels, and I have a dual-socket system. A reduction operation should then achieve 86 GB/s, but it doesn't; it still only achieves up to 43 GB/s. Why is that, and is there anything in Intel's OpenCL implementation for CPUs that can fix it?&lt;/P&gt;

&lt;P&gt;How could I fix that outside of OpenCL?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Mar 2014 04:16:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960055#M2168</guid>
      <dc:creator>James_R_</dc:creator>
      <dc:date>2014-03-26T04:16:09Z</dc:date>
    </item>
    <item>
      <title>Dear James,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960056#M2169</link>
      <description>&lt;P&gt;Dear James,&lt;/P&gt;

&lt;P&gt;Can you please check if the following helps?&lt;/P&gt;

&lt;P&gt;&lt;A href="http://software.intel.com/forums/topic/497429"&gt;http://software.intel.com/forums/topic/497429&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Arik&lt;/P&gt;</description>
      <pubDate>Wed, 26 Mar 2014 13:31:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960056#M2169</guid>
      <dc:creator>Arik_N_Intel</dc:creator>
      <dc:date>2014-03-26T13:31:05Z</dc:date>
    </item>
    <item>
      <title>Quote:Arik Narkis (Intel)</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960057#M2170</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;P&gt;Arik Narkis (Intel) wrote:&lt;/P&gt;

&lt;P&gt;Dear James,&lt;/P&gt;

&lt;P&gt;Can you please check if the following helps?&lt;/P&gt;

&lt;P&gt;&lt;A href="http://software.intel.com/forums/topic/497429"&gt;http://software.intel.com/forums/topic/497429&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Arik&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Arik,&lt;/P&gt;

&lt;P&gt;This method improves bandwidth substantially (by &amp;gt;1.8x). My code now achieves over 90% of the platform's peak bandwidth, rather than the ~50% I had the other day. I'm a bit surprised it actually worked. I'd like to test it on a four- or eight-socket system now, but I'll have to find one.&lt;/P&gt;

&lt;P&gt;Knowing this now makes life more difficult for OpenCL developers with bandwidth-bound kernels on multi-socket nodes. Thanks a lot, Arik!&lt;/P&gt;

&lt;P&gt;-James&lt;/P&gt;</description>
      <pubDate>Fri, 28 Mar 2014 05:03:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Achieving-peak-bandwidth-on-multi-socket-systems/m-p/960057#M2170</guid>
      <dc:creator>James_R_</dc:creator>
      <dc:date>2014-03-28T05:03:24Z</dc:date>
    </item>
  </channel>
</rss>

