<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic However doing this fixed the in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081502#M4656</link>
    <description>&lt;P&gt;However doing this fixed the problem:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#ifndef LOCAL_SIZE
#define LOCAL_SIZE 8
#endif // LOCAL_SIZE

kernel void test( global int* in, global int* out )
{   
  int id = get_local_id(0);

  local int indx[LOCAL_SIZE];

  int temp = id;
  for (int i = id; i &amp;lt; 1024; i += LOCAL_SIZE)
  {
    temp = in[i] &amp;lt; in[temp] ? i : temp;
  }
  indx[id] = temp;
  barrier(CLK_LOCAL_MEM_FENCE);

  for(int i = LOCAL_SIZE / 2; i!= 0; i&amp;gt;&amp;gt;=1)
  {
    int tmp = indx[id];
    int val = in[tmp];
    barrier(CLK_LOCAL_MEM_FENCE);
    int tmpi = indx[id + i];
    int vali = in[tmpi];
    if(id &amp;lt; i)
    {
      printf( "%4d: %3d, %4d: %3d\n", tmp, val, tmpi, vali );
      indx[id] = val &amp;lt; vali ? tmp : tmpi;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if(id == 0) printf("\n");
  }
  out[0] = indx[0];
}&lt;/PRE&gt;

&lt;P&gt;I just don't quite understand why the program behaves the way it does.&lt;/P&gt;</description>
    <pubDate>Fri, 03 Jun 2016 10:48:09 GMT</pubDate>
    <dc:creator>Joose_S_</dc:creator>
    <dc:date>2016-06-03T10:48:09Z</dc:date>
    <item>
      <title>Using lookup table on reduction kernel produces incorrect results on GPU</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081499#M4653</link>
      <description>&lt;P&gt;I have a kernel that takes in an array of integers and returns the index of the smallest element.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#ifndef LOCAL_SIZE
#define LOCAL_SIZE 8
#endif // LOCAL_SIZE

kernel void test( global int* in, global int* out )
{   
  int id = get_local_id(0);

  local int indx[LOCAL_SIZE];

  int temp = id;
  for (int i = id; i &amp;lt; 1024; i += LOCAL_SIZE)
  {
    temp = in[i] &amp;lt; in[temp] ? i : temp;
  }
  indx[id] = temp;
  barrier(CLK_LOCAL_MEM_FENCE);

  for(int i = LOCAL_SIZE / 2; i!= 0; i&amp;gt;&amp;gt;=1)
  {
    if(id &amp;lt; i)
    {
      printf( "%4d: %3d, %4d: %3d\n", indx[id], in[indx[id]], indx[id + i], in[indx[id + i]] );
      indx[id] = in[indx[id]] &amp;lt; in[indx[id+ i]] ? indx[id] : indx[id + i];
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if(id == 0) printf("\n");
  }
  out[0] = indx[0];
}&lt;/PRE&gt;

&lt;P&gt;Before the first barrier, each work item finds the index of the smallest value in its strided slice of the input and stores that index in a local buffer. Everything works fine here.&lt;/P&gt;

&lt;P&gt;In the for loop, the per-work-item results are reduced further to find the final result. However, the second-to-last iteration fails on the GPU every time: in[indx[id]] and in[indx[id + i]] both return the same value.&lt;/P&gt;

&lt;P&gt;Operating system: Windows 7 Enterprise&lt;/P&gt;

&lt;P&gt;Device Driver Version: 10.18.14.4280&lt;/P&gt;

&lt;P&gt;Device: Intel HD 4600 &amp;amp; Processor Intel i5-4590&lt;/P&gt;

&lt;P&gt;Works fine on the CPU and on an Nvidia GTX 970.&lt;/P&gt;

&lt;P&gt;I've attached the kernel and host code to reproduce the issue.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Jun 2016 10:26:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081499#M4653</guid>
      <dc:creator>Joose_S_</dc:creator>
      <dc:date>2016-06-01T10:26:43Z</dc:date>
    </item>
    <item>
      <title>I am afraid there might be a</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081500#M4654</link>
      <description>&lt;P&gt;I am afraid there might be a race in your code. What about this:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#ifndef LOCAL_SIZE
#define LOCAL_SIZE 8
#endif // LOCAL_SIZE

kernel void test( global int* in, global int* out )
{
  int id = get_local_id(0);

  local int indx[LOCAL_SIZE];

  int temp = id;
  for (int i = id; i &amp;lt; 1024; i += LOCAL_SIZE)
  {
    temp = in[i] &amp;lt; in[temp] ? i : temp;
  }
  indx[id] = temp;
  barrier(CLK_LOCAL_MEM_FENCE);

  for(int i = LOCAL_SIZE / 2; i!= 0; i&amp;gt;&amp;gt;=1)
  {
    int tmp = indx[id], tmpi = indx[id + i];
    int val = in[tmp], vali = in[tmpi];
    barrier(CLK_LOCAL_MEM_FENCE);
    if(id &amp;lt; i)
    {
      printf( "%4d: %3d, %4d: %3d\n", tmp, val, tmpi, vali );
      indx[id] = val &amp;lt; vali ? tmp : tmpi;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if(id == 0) printf("\n");
  }
  out[0] = indx[0];
}&lt;/PRE&gt;</description>
      <pubDate>Wed, 01 Jun 2016 21:50:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081500#M4654</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2016-06-01T21:50:00Z</dc:date>
    </item>
    <item>
      <title>Quote:Robert I. (Intel) wrote</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081501#M4655</link>
      <description>&lt;BLOCKQUOTE&gt;Robert I. (Intel) wrote:&lt;BR /&gt;

&lt;P&gt;I am afraid there might be a race in your code. What about this:&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Nope, the issue still persists even with this change.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jun 2016 06:03:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081501#M4655</guid>
      <dc:creator>Joose_S_</dc:creator>
      <dc:date>2016-06-02T06:03:00Z</dc:date>
    </item>
    <item>
      <title>However doing this fixed the</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081502#M4656</link>
      <description>&lt;P&gt;However doing this fixed the problem:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#ifndef LOCAL_SIZE
#define LOCAL_SIZE 8
#endif // LOCAL_SIZE

kernel void test( global int* in, global int* out )
{   
  int id = get_local_id(0);

  local int indx[LOCAL_SIZE];

  int temp = id;
  for (int i = id; i &amp;lt; 1024; i += LOCAL_SIZE)
  {
    temp = in[i] &amp;lt; in[temp] ? i : temp;
  }
  indx[id] = temp;
  barrier(CLK_LOCAL_MEM_FENCE);

  for(int i = LOCAL_SIZE / 2; i!= 0; i&amp;gt;&amp;gt;=1)
  {
    int tmp = indx[id];
    int val = in[tmp];
    barrier(CLK_LOCAL_MEM_FENCE);
    int tmpi = indx[id + i];
    int vali = in[tmpi];
    if(id &amp;lt; i)
    {
      printf( "%4d: %3d, %4d: %3d\n", tmp, val, tmpi, vali );
      indx[id] = val &amp;lt; vali ? tmp : tmpi;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if(id == 0) printf("\n");
  }
  out[0] = indx[0];
}&lt;/PRE&gt;

&lt;P&gt;I just don't quite understand why the program behaves the way it does.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jun 2016 10:48:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081502#M4656</guid>
      <dc:creator>Joose_S_</dc:creator>
      <dc:date>2016-06-03T10:48:09Z</dc:date>
    </item>
    <item>
      <title>Ok, looks like a compiler</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081503#M4657</link>
      <description>&lt;P&gt;OK, looks like the compiler team needs to take a look at this one :)&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2016 01:30:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081503#M4657</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2016-06-07T01:30:39Z</dc:date>
    </item>
    <item>
      <title>Hi Robert,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081504#M4658</link>
      <description>Hi Robert,

Have you found out anything regarding the issue?</description>
      <pubDate>Mon, 20 Jun 2016 06:59:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081504#M4658</guid>
      <dc:creator>Joose_S_</dc:creator>
      <dc:date>2016-06-20T06:59:46Z</dc:date>
    </item>
    <item>
      <title>Hi Joose,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081505#M4659</link>
      <description>&lt;P&gt;Hi Joose,&lt;/P&gt;

&lt;P&gt;I showed this to the compiler folks and they think this is a compiler bug. Will file a bug. Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jun 2016 14:42:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Using-lookup-table-on-reduction-kernel-produces-incorrect/m-p/1081505#M4659</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2016-06-21T14:42:06Z</dc:date>
    </item>
  </channel>
</rss>

