<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Kernel returns wrong results in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781079#M384</link>
    <description>Hello,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I created a kernel for summing up some small matrices. The operation is the same for a large set of such matrices. When compiling the kernel, the compiler generates a kernel object. The compiler says that the kernel was not vectorized. When I execute the kernel, the results are just wrong.&lt;/DIV&gt;&lt;DIV&gt;Running the same code using the AMD OpenCL SDK gives correct results.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The kernel looks like this:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;__kernel void calcAxA(&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int n,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int n0,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int m,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int nm,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const __global int*  nmMask,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const __global double* nmJ,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const __global double* nmE,&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; __global double* AxA,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; __global double* AxE)&lt;/DIV&gt;&lt;DIV&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;int j  = get_global_id(0);&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;int j0  = j - n0;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;if (j0 &amp;lt; 0)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		
&lt;/SPAN&gt;return;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;double axeT[6];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;double axaT[6*6];&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; 6 * 6; ++i) axaT[i] = 0.0;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; 6; ++i) axeT[i] = 0.0;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;// Sum up in local variables&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; m; ++i)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int ij = nmMask[i * n + j];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;if (ij == -1) continue;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int r0 = ij * nParams;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int r1 = (nm + ij) * nParams;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;for (int r = 0; r &amp;lt; 6; ++r) {&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;for (int c = 0; c &amp;lt; 6; ++c) {&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;				&lt;/SPAN&gt;axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r] + nmJ[r1 + c] * nmJ[r1 + r];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;axeT[r] += nmJ[r0 + r] * e[2 * ij + 0] + nmJ[r1 + r] * nmE[2 * ij + 1];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		
&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;// Assign sums to global arrays&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; 6; ++i)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;for (int k = 0; k &amp;lt; 6; ++k)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;AxE[6 * j + i] = axeT[i];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;}&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV id="_mcePaste"&gt;Other topic:&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;When compiling the cl code, the Intel OpenCL SDK returns the message:&lt;/DIV&gt;&lt;DIV&gt;:1:26: warning: expected identifier in '#pragma OPENCL' - ignored&lt;/DIV&gt;&lt;DIV&gt;for the line&lt;/DIV&gt;&lt;DIV&gt;#pragma OPENCL EXTENSION cl_khr_fp64 : enable.&lt;/DIV&gt;&lt;DIV&gt;But I can't find the problem causing the error message. But looking at other posts, the message seems to be pretty common.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Any ideas?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks,&lt;/DIV&gt;&lt;DIV&gt;Rasmus&lt;/DIV&gt;</description>
    <pubDate>Thu, 29 Mar 2012 20:14:25 GMT</pubDate>
    <dc:creator>Rasmus_Debitsch</dc:creator>
    <dc:date>2012-03-29T20:14:25Z</dc:date>
    <item>
      <title>Kernel returns wrong results</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781079#M384</link>
      <description>Hello,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I created a kernel for summing up some small matrices. The operation is the same for a large set of such matrices. When compiling the kernel, the compiler generates a kernel object. The compiler says that the kernel was not vectorized. When I execute the kernel, the results are just wrong.&lt;/DIV&gt;&lt;DIV&gt;Running the same code using the AMD OpenCL SDK gives correct results.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The kernel looks like this:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;__kernel void calcAxA(&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int n,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int n0,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int m,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const int nm,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const __global int*  nmMask,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const __global double* nmJ,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; const __global double* nmE,&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; __global double* AxA,&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt; __global double* AxE)&lt;/DIV&gt;&lt;DIV&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;int j  = get_global_id(0);&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;int j0  = j - n0;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;if (j0 &amp;lt; 0)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		
&lt;/SPAN&gt;return;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;double axeT[6];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;double axaT[6*6];&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; 6 * 6; ++i) axaT[i] = 0.0;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; 6; ++i) axeT[i] = 0.0;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;// Sum up in local variables&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; m; ++i)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int ij = nmMask[i * n + j];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;if (ij == -1) continue;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int r0 = ij * nParams;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int r1 = (nm + ij) * nParams;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;for (int r = 0; r &amp;lt; 6; ++r) {&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;for (int c = 0; c &amp;lt; 6; ++c) {&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;				&lt;/SPAN&gt;axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r] + nmJ[r1 + c] * nmJ[r1 + r];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;axeT[r] += nmJ[r0 + r] * e[2 * ij + 0] + nmJ[r1 + r] * nmE[2 * ij + 1];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		
&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;// Assign sums to global arrays&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;for (int i = 0; i &amp;lt; 6; ++i)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;for (int k = 0; k &amp;lt; 6; ++k)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;AxE[6 * j + i] = axeT[i];&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;}&lt;/DIV&gt;&lt;DIV&gt;}&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV id="_mcePaste"&gt;Other topic:&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;When compiling the cl code, the Intel OpenCL SDK returns the message:&lt;/DIV&gt;&lt;DIV&gt;:1:26: warning: expected identifier in '#pragma OPENCL' - ignored&lt;/DIV&gt;&lt;DIV&gt;for the line&lt;/DIV&gt;&lt;DIV&gt;#pragma OPENCL EXTENSION cl_khr_fp64 : enable.&lt;/DIV&gt;&lt;DIV&gt;But I can't find the problem causing the error message. But looking at other posts, the message seems to be pretty common.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Any ideas?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks,&lt;/DIV&gt;&lt;DIV&gt;Rasmus&lt;/DIV&gt;</description>
      <pubDate>Thu, 29 Mar 2012 20:14:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781079#M384</guid>
      <dc:creator>Rasmus_Debitsch</dc:creator>
      <dc:date>2012-03-29T20:14:25Z</dc:date>
    </item>
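    <!--
    Editorial sketch: one way to tell a kernel bug from a host-side buffer problem is to recompute a single work-item on the host and compare it with what the device wrote back. The plain C below mirrors the posted kernel under two assumptions that are not confirmed by the post: nParams is taken to be 6 (the hypothetical N_PARAMS), and the undeclared `e[...]` is read as `nmE[...]`. The names calc_axa_reference and max_abs_diff are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the post never defines nParams; 6 is assumed here because
   the kernel reads six consecutive nmJ entries per row. */
#define N_PARAMS 6

/* Host-side reference for one work-item j of the posted calcAxA kernel. */
static void calc_axa_reference(int j, int n, int n0, int m, int nm,
                               const int *nmMask, const double *nmJ,
                               const double *nmE, double *AxA, double *AxE)
{
    assert(j >= 0 && j < n);

    int j0 = j - n0;
    if (j0 < 0)
        return;

    double axeT[6] = {0.0};
    double axaT[6 * 6] = {0.0};

    /* Sum up in local variables, skipping masked entries. */
    for (int i = 0; i < m; ++i) {
        int ij = nmMask[i * n + j];
        if (ij == -1)
            continue;

        int r0 = ij * N_PARAMS;
        int r1 = (nm + ij) * N_PARAMS;

        for (int r = 0; r < 6; ++r) {
            for (int c = 0; c < 6; ++c)
                axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r]
                                 + nmJ[r1 + c] * nmJ[r1 + r];
            /* Assumption: the post's `e[2 * ij + 0]` is read as nmE. */
            axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0]
                     + nmJ[r1 + r] * nmE[2 * ij + 1];
        }
    }

    /* Write the sums using the same index layout as the posted kernel. */
    for (int i = 0; i < 6; ++i) {
        for (int k = 0; k < 6; ++k)
            AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
        AxE[6 * j + i] = axeT[i];
    }
}

/* Largest absolute difference between two arrays of equal length. */
static double max_abs_diff(const double *a, const double *b, size_t len)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

    If the host reference agrees with one OpenCL runtime but not another, the kernel arithmetic is probably fine and the buffer setup on the host is the next place to look.
    -->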
    <item>
      <title>Kernel returns wrong results</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781080#M385</link>
      <description>I am sure you did this, but just to confirm: you do have "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" at the top of your ".cl" file, right? This is required to enable double precision support in conformance with the extension spec:&lt;BR /&gt;&lt;BR /&gt;"OpenCL 1.0 adds support for double precision floating-point as an optional extension. An application that wants to use double will need to include the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive before any double precision data type is declared in the kernel code."&lt;BR /&gt;&lt;BR /&gt;I am guessing you did this, but the compiler seems to be saying it didn't vectorize your code because you are using double precision support without enabling it.</description>
      <pubDate>Fri, 06 Apr 2012 17:04:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781080#M385</guid>
      <dc:creator>Jim_Vaughn</dc:creator>
      <dc:date>2012-04-06T17:04:17Z</dc:date>
    </item>
    <item>
      <title>Kernel returns wrong results</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781081#M386</link>
      <description>Yes, the&lt;DIV id="_mcePaste"&gt;#pragma OPENCL EXTENSION cl_khr_fp64 : enable&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;is at the top of my .cl file. The .cl file contains some more kernels using double data. The compiler vectorizes the other kernels and executing them gives the expected results. But the kernel shown above is not vectorized and returns wrong results.&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Rasmus&lt;/DIV&gt;</description>
      <pubDate>Sat, 07 Apr 2012 20:21:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781081#M386</guid>
      <dc:creator>Rasmus_Debitsch</dc:creator>
      <dc:date>2012-04-07T20:21:27Z</dc:date>
    </item>
    <item>
      <title>Kernel returns wrong results</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781082#M387</link>
      <description>OK - finally found it. It was improper usage of CL_MEM_USE_HOST_PTR. If used correctly, everything works as expected.&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Rasmus&lt;/DIV&gt;</description>
      <pubDate>Fri, 27 Apr 2012 17:35:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Kernel-returns-wrong-results/m-p/781082#M387</guid>
      <dc:creator>Rasmus_Debitsch</dc:creator>
      <dc:date>2012-04-27T17:35:18Z</dc:date>
    </item>
  </channel>
</rss>

