<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The driver can help a lot in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Performance-of-quot-intel-sub-group-block-readN-writeN-quot-vs/m-p/1099493#M5105</link>
    <description>&lt;P&gt;The driver can help a lot with optimizing memory transfers and can often get to similar results. However, if you're already going to the trouble of using the subgroup reads and writes, they can help guarantee that you're getting optimal memory buffer bandwidth, which could give some advantages over vload/vstore, using vector data types for memory I/O, etc.&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Subgroup read/write is closer to what the driver would try to optimize for in any case. If you've already arranged your data I/O to work this way, this should be an optimal data access approach (for linear buffers) that touches a minimal number of cache lines to maximize cache efficiency.&lt;/LI&gt;
	&lt;LI&gt;It also reduces the calculations needed to compute addresses and the number of addresses that need to be passed to the driver. With subgroups, only the address of the first item in the block and a length are sent, vs. an address for every work item in the subgroup.&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Fri, 02 Dec 2016 16:36:37 GMT</pubDate>
    <dc:creator>Jeffrey_M_Intel1</dc:creator>
    <dc:date>2016-12-02T16:36:37Z</dc:date>
    <item>
      <title>Performance of "intel_sub_group_block_readN/writeN" vs "vloadN/vstoreN"</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Performance-of-quot-intel-sub-group-block-readN-writeN-quot-vs/m-p/1099492#M5104</link>
      <description>&lt;P&gt;Does the subgroup extension API "intel_sub_group_block_readN/writeN" have better performance than "vloadN/vstoreN"? I did some testing but don't see much difference between them. Can you elaborate on the read/write performance to expect from each?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Nov 2016 08:21:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Performance-of-quot-intel-sub-group-block-readN-writeN-quot-vs/m-p/1099492#M5104</guid>
      <dc:creator>Shengquan_Y_Intel</dc:creator>
      <dc:date>2016-11-29T08:21:40Z</dc:date>
    </item>
    <item>
      <title>The driver can help a lot</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Performance-of-quot-intel-sub-group-block-readN-writeN-quot-vs/m-p/1099493#M5105</link>
      <description>&lt;P&gt;The driver can help a lot with optimizing memory transfers and can often get to similar results. However, if you're already going to the trouble of using the subgroup reads and writes, they can help guarantee that you're getting optimal memory buffer bandwidth, which could give some advantages over vload/vstore, using vector data types for memory I/O, etc.&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Subgroup read/write is closer to what the driver would try to optimize for in any case. If you've already arranged your data I/O to work this way, this should be an optimal data access approach (for linear buffers) that touches a minimal number of cache lines to maximize cache efficiency.&lt;/LI&gt;
	&lt;LI&gt;It also reduces the calculations needed to compute addresses and the number of addresses that need to be passed to the driver. With subgroups, only the address of the first item in the block and a length are sent, vs. an address for every work item in the subgroup.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 02 Dec 2016 16:36:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Performance-of-quot-intel-sub-group-block-readN-writeN-quot-vs/m-p/1099493#M5105</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2016-12-02T16:36:37Z</dc:date>
    </item>
  </channel>
</rss>

