<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic What's wrong with this kernel? in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036152#M3702</link>
    <description>&lt;PRE class="brush:cpp;"&gt;struct RGB16
{
  ushort R, G, B;
};

__kernel void scale2D_3u16(__global ushort* restrict in, __global ushort* restrict out)
{
  int xPos = get_global_id(0);
  int yPos = get_global_id(1);
  int p = yPos*get_global_size(0)+xPos;
  
  __global struct RGB16* pRGBIn = (__global struct RGB16*)in;
  __global struct RGB16* pRGBOut = (__global struct RGB16*)out;
  pRGBOut[p] = pRGBIn[p];
}
&lt;/PRE&gt;

&lt;P&gt;Input data (memory view) is:&lt;/P&gt;

&lt;P&gt;0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4 ...&lt;/P&gt;

&lt;P&gt;On NVIDIA and AMD, the output is identical to the input. But with Intel OpenCL on an HD 4600, I get:&lt;/P&gt;

&lt;P&gt;0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 3, 4, 4, 4, ...&lt;/P&gt;

&lt;P&gt;I'm totally confused!&lt;/P&gt;</description>
    <pubDate>Sat, 25 Oct 2014 13:03:11 GMT</pubDate>
    <dc:creator>renegr</dc:creator>
    <dc:date>2014-10-25T13:03:11Z</dc:date>
    <item>
      <title>What's wrong with this kernel?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036152#M3702</link>
      <description>&lt;PRE class="brush:cpp;"&gt;struct RGB16
{
  ushort R, G, B;
};

__kernel void scale2D_3u16(__global ushort* restrict in, __global ushort* restrict out)
{
  int xPos = get_global_id(0);
  int yPos = get_global_id(1);
  int p = yPos*get_global_size(0)+xPos;
  
  __global struct RGB16* pRGBIn = (__global struct RGB16*)in;
  __global struct RGB16* pRGBOut = (__global struct RGB16*)out;
  pRGBOut[p] = pRGBIn[p];
}
&lt;/PRE&gt;

&lt;P&gt;Input data (memory view) is:&lt;/P&gt;

&lt;P&gt;0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4 ...&lt;/P&gt;

&lt;P&gt;On NVIDIA and AMD, the output is identical to the input. But with Intel OpenCL on an HD 4600, I get:&lt;/P&gt;

&lt;P&gt;0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 3, 4, 4, 4, ...&lt;/P&gt;
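
&lt;P&gt;To see exactly which pixels are affected, the two dumps can be compared on the host. The following is a minimal C sketch, not code from this thread: find_bad_pixels is a hypothetical helper, and it assumes both buffers are read back as arrays of unsigned short holding packed 3-component pixels.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;/* Hypothetical host-side checker (not from the thread).
   Compares an RGB16 output dump with the input one pixel at a time
   (3 ushorts = 6 bytes per pixel). For each pixel that differs, it
   stores the pixel's byte offset from the start of the buffer in
   bad_offsets. Returns the number of corrupted pixels. */
int find_bad_pixels(const unsigned short *in, const unsigned short *out,
                    int pixels, int *bad_offsets)
{
  int bad = 0;
  for (int p = 0; p != pixels; ++p) {
    const unsigned short *a = in + 3 * p;   /* pixel p in the input  */
    const unsigned short *b = out + 3 * p;  /* pixel p in the output */
    if (a[0] != b[0] || a[1] != b[1] || a[2] != b[2])
      bad_offsets[bad++] = 6 * p;           /* sizeof(struct RGB16) == 6 */
  }
  return bad;
}
&lt;/PRE&gt;

&lt;P&gt;Given the five pixels above, it reports two corrupted pixels, at byte offsets 6 and 18. In both, R and G (the first 4 bytes of the pixel) are wrong while B is intact, and both offsets leave a remainder of 2 when divided by 4; the intact pixels start at offsets 0, 12 and 24.&lt;/P&gt;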

&lt;P&gt;I'm totally confused!&lt;/P&gt;</description>
      <pubDate>Sat, 25 Oct 2014 13:03:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036152#M3702</guid>
      <dc:creator>renegr</dc:creator>
      <dc:date>2014-10-25T13:03:11Z</dc:date>
    </item>
    <item>
      <title>Additional note, using</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036153#M3703</link>
      <description>&lt;P&gt;Additional note: the following code works as expected.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;  pRGBOut&lt;P&gt;.R = pRGBIn&lt;/P&gt;&lt;P&gt;.R;
  pRGBOut&lt;/P&gt;&lt;P&gt;.G = pRGBIn&lt;/P&gt;&lt;P&gt;.G;
  pRGBOut&lt;/P&gt;&lt;P&gt;.B = pRGBIn&lt;/P&gt;&lt;P&gt;.B;
&lt;/P&gt;&lt;/PRE&gt;
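
&lt;P&gt;In plain C the two forms are equivalent: assigning a whole struct copies every member, so both should leave the same values in the output. The following minimal host-side C sketch lets you confirm this on the CPU; copy_struct and copy_fields are hypothetical names, and unsigned short stands in for OpenCL's 16-bit ushort.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;/* Host-side C sketch, not OpenCL: RGB16 mirrors the kernel's struct,
   with unsigned short standing in for OpenCL's 16-bit ushort. */
struct RGB16 {
  unsigned short R, G, B;
};

/* Whole-struct assignment, as in the original kernel. */
void copy_struct(const struct RGB16 *in, struct RGB16 *out, int p)
{
  out[p] = in[p];
}

/* Field-by-field assignment, as in the workaround. */
void copy_fields(const struct RGB16 *in, struct RGB16 *out, int p)
{
  out[p].R = in[p].R;
  out[p].G = in[p].G;
  out[p].B = in[p].B;
}
&lt;/PRE&gt;

&lt;P&gt;If a device produces different results for the two forms, that suggests a code-generation problem rather than a bug in the kernel source.&lt;/P&gt;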

</description>
      <pubDate>Sat, 25 Oct 2014 18:48:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036153#M3703</guid>
      <dc:creator>renegr</dc:creator>
      <dc:date>2014-10-25T18:48:04Z</dc:date>
    </item>
    <item>
      <title>The problem is that the way</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036154#M3704</link>
      <description>&lt;P&gt;The problem is that, the way you wrote your code in the first example, two adjacent work items read and write overlapping data, and there is no guarantee of the order in which the work items write their data out. In the second case you just got lucky :) Technically, you would need barriers between those instructions in the second case to guarantee the results.&lt;/P&gt;

&lt;P&gt;To fix, try to make sure that work items read and write non-overlapping pieces of memory.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 00:02:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036154#M3704</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2014-10-27T00:02:08Z</dc:date>
    </item>
    <item>
      <title>Thank you Robert,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036155#M3705</link>
      <description>&lt;P&gt;Thank you Robert,&lt;/P&gt;

&lt;P&gt;Unfortunately, I don't see any overlap in this code. Maybe I'm totally blind. It is nearly the same code as I use within a "parallel_for()". I could rewrite the code as:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;__kernel void scale2D_3u16(__global ushort* restrict in, __global ushort* restrict out)
{
&amp;nbsp; int xPos = get_global_id(0);
&amp;nbsp; int yPos = get_global_id(1);
&amp;nbsp; int p = (yPos*get_global_size(0)+xPos)*3;

&amp;nbsp; out[p+0] = in[p+0];
&amp;nbsp; out[p+1] = in[p+1];
&amp;nbsp; out[p+2] = in[p+2];
}&lt;/PRE&gt;
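
&lt;P&gt;The alignment question can be checked with a little arithmetic. Pixel p of a packed RGB16 buffer starts at byte 6p, so, assuming the buffer itself starts on a 16-byte boundary, a pixel lies on a 4-byte boundary only when p is even, on an 8-byte boundary only when p is a multiple of 4, and on a 16-byte boundary only when p is a multiple of 8. A minimal host-side C sketch; rgb16_byte_offset and is_aligned are hypothetical helper names.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;/* Hypothetical helpers (not from the thread) for the packed RGB16 layout:
   3 ushorts = 6 bytes per pixel, matching p = (yPos*width + xPos)*3 above. */
unsigned long rgb16_byte_offset(unsigned long x, unsigned long y,
                                unsigned long width)
{
  return (y * width + x) * 6;   /* 3 components x 2 bytes each */
}

/* Returns 1 if offset is a multiple of align bytes, otherwise 0. */
int is_aligned(unsigned long offset, unsigned long align)
{
  return offset % align == 0;
}
&lt;/PRE&gt;

&lt;P&gt;This only shows which pixels are misaligned; it does not by itself show that misalignment is what triggers the wrong results.&lt;/P&gt;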

&lt;P&gt;I think this overlap occurs because the addresses accessed through pRGBOut/pRGBIn are not multiples of 4, 8 or 16 bytes. Is that correct?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 09:53:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036155#M3705</guid>
      <dc:creator>renegr</dc:creator>
      <dc:date>2014-10-27T09:53:00Z</dc:date>
    </item>
    <item>
      <title>Hi Rene,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036156#M3706</link>
      <description>&lt;P&gt;Hi Rene,&lt;/P&gt;

&lt;P&gt;My bad: there is no issue with your original code. The problem is with the Intel compiler and how it handles structs, so the workaround is indeed:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;  pRGBOut[p].R = pRGBIn[p].R;
  pRGBOut[p].G = pRGBIn[p].G;
  pRGBOut[p].B = pRGBIn[p].B;
&lt;/PRE&gt;</description>
      <pubDate>Mon, 27 Oct 2014 14:25:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036156#M3706</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2014-10-27T14:25:00Z</dc:date>
    </item>
    <item>
      <title>BTW, your last proposal</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036157#M3707</link>
      <description>&lt;P&gt;BTW, your last proposal should work as well.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 14:33:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/What-s-wrong-with-this-kernel/m-p/1036157#M3707</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2014-10-27T14:33:15Z</dc:date>
    </item>
  </channel>
</rss>

