<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Thanks for taking a look... in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063071#M4250</link>
    <description>&lt;P&gt;Thanks for taking a look...&lt;/P&gt;</description>
    <pubDate>Wed, 11 Jan 2017 17:07:47 GMT</pubDate>
    <dc:creator>allanmac1</dc:creator>
    <dc:date>2017-01-11T17:07:47Z</dc:date>
    <item>
      <title>Is there any GEN-friendly idiom for communicating subgroup uniformity?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063069#M4248</link>
      <description>&lt;P&gt;It seems to me that GEN might benefit more from detecting "subgroup uniform" values than other architectures because of its &lt;A href="https://software.intel.com/en-us/articles/introduction-to-gen-assembly"&gt;unique register file architecture and instruction set&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;Are there any GEN idioms you've discovered that nudge the compiler into recognizing that variables used by a subgroup are actually scalars (subgroup uniform)?&lt;/P&gt;

&lt;P&gt;For example, would an idiom like this:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;kernel void foo(...)
{
  uint const sg_id = get_sub_group_id();

  if (sg_id == get_sub_group_id())
  {
    // rest of kernel
  }
}&lt;/PRE&gt;

&lt;P&gt;or an idiom like this (perhaps better):&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;kernel void foo(...)
{
  if (sub_group_all(true))
  {
    // rest of kernel
  }
}&lt;/PRE&gt;

&lt;P&gt;... help the compiler determine that the subgroups are running "in isolation" and therefore any future function involving get_sub_group_id() (or similar) would be uniform?&lt;/P&gt;

&lt;P&gt;I suspect this hasn't been implemented but it might be a useful idiom for both performance and reducing register pressure.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2017 22:43:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063069#M4248</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2017-01-06T22:43:16Z</dc:date>
    </item>
    <item>
      <title>So far I have not been able</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063070#M4249</link>
      <description>&lt;P&gt;So far I have not been able to find anything that exactly fits. However, we will keep this in mind for future documentation and features.&lt;/P&gt;

&lt;P&gt;For now, would it help at all to set up "subgroup uniform" values using SLM or possibly images to take advantage of hardware shared within subgroups?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2017 06:54:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063070#M4249</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2017-01-11T06:54:00Z</dc:date>
    </item>
    <item>
      <title>Thanks for taking a look...</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063071#M4250</link>
      <description>&lt;P&gt;Thanks for taking a look...&lt;/P&gt;

&lt;P&gt;My workaround is to simply launch subgroup-wide workgroups (in this case, 8-item workgroups).&lt;/P&gt;

&lt;P&gt;That works really well on Skylake, but it might not be a long-term solution. Also, because of the local memory granularity rules, I'm unable to exploit all 64KB of local memory per subslice.&lt;/P&gt;

&lt;P&gt;I would rather launch two workgroups of 28 SIMD8 subgroups each, give each subgroup access to ~1170 bytes of local memory (64KB split across 56 subgroups), and let each subgroup run independently.&lt;/P&gt;
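
&lt;P&gt;As a rough sketch of the arithmetic behind that per-subgroup figure, assuming the full 64KB of local memory is split evenly across every subgroup of both workgroups. The helper names are hypothetical and not part of any OpenCL API; this is plain C for illustration only:&lt;/P&gt;

```c
/* Hypothetical helper (not from the OpenCL API): bytes of local memory
   each subgroup gets when a subslice's SLM is divided evenly. */
unsigned slm_bytes_per_subgroup(unsigned slm_bytes,
                                unsigned workgroups,
                                unsigned subgroups_per_workgroup)
{
    return slm_bytes / (workgroups * subgroups_per_workgroup);
}

/* Assumed figures from this post: 64KB of SLM per subslice,
   2 workgroups, 28 SIMD8 subgroups in each. */
unsigned example_share(void)
{
    return slm_bytes_per_subgroup(64u * 1024u, 2u, 28u); /* 65536 / 56 = 1170 */
}
```

&lt;P&gt;Integer division rounds down, so a few bytes are left over; any real allocation would also have to respect whatever granularity the runtime imposes.&lt;/P&gt;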

&lt;P&gt;Bouncing data through SLM to help indicate uniformity is an option, but I still think the code generation couldn't possibly be as good as actually knowing that a sequence is subgroup-isolated.&lt;/P&gt;

&lt;P&gt;You could always provide us a GEN assembler! :)&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2017 17:07:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Is-there-any-GEN-friendly-idiom-for-communicating-subgroup/m-p/1063071#M4250</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2017-01-11T17:07:47Z</dc:date>
    </item>
  </channel>
</rss>

