<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Potential OpenCL compiler/implementation issue in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Potential-OpenCL-compiler-implementation-issue/m-p/1139980#M5852</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I ran into an issue when using work_group_all/sub_group_all, so I reduced it to the kernel below.&lt;/P&gt;&lt;P&gt;// kernel start&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark; wrap-lines:false;"&gt;kernel void entry()
{
    int id = (int)get_global_id(0);

    bool end = false;
    int cnt = 0;
    bool end2 = false;  // always of the same value for the whole work/sub group
    while (1)
    {
        if (end2) break;

        if (cnt==0)
        {
            //  First loop
            end = id==0;    //  end is only true for first work item
        }
        else
        {
            //  Second loop
            end = true;     //  end is always true now
        }

        //  end2 will be false at first loop, and true at second loop
        //  end2 will be of the same value for whole sub_group/work_group
#if 1
        end2 = sub_group_all(end?1:0)!=0;
#else
        end2 = work_group_all(end?1:0)!=0;
#endif

#if 1
        if ((id&amp;amp;0xff)&amp;lt;=1)
        {
            printf("id = %d, cnt=%d, end = %d, end2 = %d\n", id, cnt, end?1:0, end2?1:0);
        }
#endif

        cnt++;
    }
}
&lt;/PRE&gt;

&lt;P&gt;// kernel end&lt;/P&gt;
&lt;P&gt;The kernel hangs. The output shows that cnt never reaches 2, yet the kernel never finishes, whether I use work_group_all() or sub_group_all().&lt;/P&gt;
&lt;P&gt;id = 0, cnt=0, end = 1, end2 = 0&lt;BR /&gt;id = 1, cnt=0, end = 0, end2 = 0&lt;BR /&gt;id = 512, cnt=0, end = 0, end2 = 0&lt;BR /&gt;id = 513, cnt=0, end = 0, end2 = 0&lt;BR /&gt;id = 512, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 513, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 512, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 513, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 512, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 513, cnt=1, end = 1, end2 = 1&lt;/P&gt;
&lt;P&gt;My work-item count is always a power of 2 and larger than 512.&lt;/P&gt;
&lt;P&gt;On the CPU, the kernel spins in an infinite loop that I can kill through the OS. On the GPU, it hangs the whole OS if I use&amp;nbsp;work_group_all().&lt;/P&gt;
&lt;P&gt;I tried the following two OCL compiler versions, with the same result:&lt;/P&gt;
&lt;P&gt;Intel(R) SDK for OpenCL(TM) - Offline Compiler, version 8.0.0.171&lt;/P&gt;
&lt;P&gt;Intel(R) SDK for OpenCL(TM) - offline compiler command line, version 7.0.0.3993&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Tango&lt;/P&gt;</description>
    <pubDate>Wed, 11 Mar 2020 23:43:36 GMT</pubDate>
    <dc:creator>gu__sheng</dc:creator>
    <dc:date>2020-03-11T23:43:36Z</dc:date>
    <item>
      <title>Potential OpenCL compiler/implementation issue</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Potential-OpenCL-compiler-implementation-issue/m-p/1139980#M5852</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I ran into an issue when using work_group_all/sub_group_all, so I reduced it to the kernel below.&lt;/P&gt;&lt;P&gt;// kernel start&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark; wrap-lines:false;"&gt;kernel void entry()
{
    int id = (int)get_global_id(0);

    bool end = false;
    int cnt = 0;
    bool end2 = false;  // always of the same value for the whole work/sub group
    while (1)
    {
        if (end2) break;

        if (cnt==0)
        {
            //  First loop
            end = id==0;    //  end is only true for first work item
        }
        else
        {
            //  Second loop
            end = true;     //  end is always true now
        }

        //  end2 will be false at first loop, and true at second loop
        //  end2 will be of the same value for whole sub_group/work_group
#if 1
        end2 = sub_group_all(end?1:0)!=0;
#else
        end2 = work_group_all(end?1:0)!=0;
#endif

#if 1
        if ((id&amp;amp;0xff)&amp;lt;=1)
        {
            printf("id = %d, cnt=%d, end = %d, end2 = %d\n", id, cnt, end?1:0, end2?1:0);
        }
#endif

        cnt++;
    }
}
&lt;/PRE&gt;

&lt;P&gt;// kernel end&lt;/P&gt;
&lt;P&gt;The kernel hangs. The output shows that cnt never reaches 2, yet the kernel never finishes, whether I use work_group_all() or sub_group_all().&lt;/P&gt;
&lt;P&gt;id = 0, cnt=0, end = 1, end2 = 0&lt;BR /&gt;id = 1, cnt=0, end = 0, end2 = 0&lt;BR /&gt;id = 512, cnt=0, end = 0, end2 = 0&lt;BR /&gt;id = 513, cnt=0, end = 0, end2 = 0&lt;BR /&gt;id = 512, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 513, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 512, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 513, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 512, cnt=1, end = 1, end2 = 1&lt;BR /&gt;id = 513, cnt=1, end = 1, end2 = 1&lt;/P&gt;
&lt;P&gt;My work-item count is always a power of 2 and larger than 512.&lt;/P&gt;
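&lt;P&gt;As a reference point, the termination I would expect can be modeled on the host. The following is only a minimal sketch in plain C, not OpenCL code: it assumes that sub_group_all(p) is nonzero exactly when p is nonzero for every work-item in the sub-group, and the names all_of, iterations_until_exit and MAX_SUB_GROUP, as well as the sub-group size passed in, are made up for illustration.&lt;/P&gt;

```c
/* Host-side model of the kernel loop; a sketch, not OpenCL code.
   Assumption: sub_group_all(p) is nonzero only when p is nonzero
   for every work-item in the sub-group. */

enum { MAX_SUB_GROUP = 64 };  /* arbitrary cap for this sketch */

/* Hypothetical helper: returns 1 when all n flags are nonzero. */
static int all_of(const int *flags, int n)
{
    int i;
    for (i = 0; i != n; i++) {
        if (flags[i] == 0) return 0;
    }
    return 1;
}

/* Models one sub-group whose first work-item has global id first_id.
   Returns the value of cnt when the sub-group leaves the loop,
   or -1 when size is outside 1..MAX_SUB_GROUP. */
static int iterations_until_exit(int first_id, int size)
{
    int end[MAX_SUB_GROUP];
    int cnt = 0;
    int end2 = 0;
    int lane;

    if (size > MAX_SUB_GROUP) return -1;
    if (1 > size) return -1;

    while (!end2) {
        for (lane = 0; lane != size; lane++) {
            /* First pass: only global id 0 sets end. Later passes: all do. */
            end[lane] = (cnt == 0) ? (first_id + lane == 0) : 1;
        }
        end2 = all_of(end, size);  /* stands in for sub_group_all */
        cnt++;
    }
    return cnt;
}
```

&lt;P&gt;Under that assumption, iterations_until_exit(512, 16) returns 2: the first pass leaves end2 at 0, the second pass sets it to 1, and the loop exits. That is the termination I expect but do not see from the kernel.&lt;/P&gt;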
&lt;P&gt;On the CPU, the kernel spins in an infinite loop that I can kill through the OS. On the GPU, it hangs the whole OS if I use&amp;nbsp;work_group_all().&lt;/P&gt;
&lt;P&gt;I tried the following two OCL compiler versions, with the same result:&lt;/P&gt;
&lt;P&gt;Intel(R) SDK for OpenCL(TM) - Offline Compiler, version 8.0.0.171&lt;/P&gt;
&lt;P&gt;Intel(R) SDK for OpenCL(TM) - offline compiler command line, version 7.0.0.3993&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Tango&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2020 23:43:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Potential-OpenCL-compiler-implementation-issue/m-p/1139980#M5852</guid>
      <dc:creator>gu__sheng</dc:creator>
      <dc:date>2020-03-11T23:43:36Z</dc:date>
    </item>
  </channel>
</rss>

