OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Kernel returns wrong results

I created a kernel that sums up some small matrices. The same operation is applied to a large set of such matrices. When I compile the kernel, the compiler generates a kernel object but reports that the kernel was not vectorized. When I execute the kernel, the results are simply wrong.
Running the same code with the AMD OpenCL SDK gives correct results.
The kernel looks like this:
__kernel void calcAxA(
    const int n,
    const int n0,
    const int m,
    const int nm,
    const __global int* nmMask,
    const __global double* nmJ,
    const __global double* nmE,
    __global double* AxA,
    __global double* AxE)
{
    int j = get_global_id(0);
    int j0 = j - n0;
    if (j0 < 0)
        return; // assumed: the statement guarded by this test was lost in the post

    double axeT[6];
    double axaT[6*6];
    for (int i = 0; i < 6 * 6; ++i) axaT[i] = 0.0;
    for (int i = 0; i < 6; ++i) axeT[i] = 0.0;

    // Sum up in local variables
    for (int i = 0; i < m; ++i)
    {
        int ij = nmMask[i * n + j];
        if (ij == -1) continue;
        // nParams is not declared in this post
        int r0 = ij * nParams;
        int r1 = (nm + ij) * nParams;
        for (int r = 0; r < 6; ++r) {
            for (int c = 0; c < 6; ++c) {
                axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r] + nmJ[r1 + c] * nmJ[r1 + r];
            }
            // the post reads e[2 * ij + 0] here; e is not declared, so nmE is assumed
            axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0] + nmJ[r1 + r] * nmE[2 * ij + 1];
        }
    }

    // Assign sums to global arrays
    for (int i = 0; i < 6; ++i)
    {
        for (int k = 0; k < 6; ++k)
            AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
        AxE[6 * j + i] = axeT[i];
    }
}
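When two SDKs disagree, a plain C reference on the host can help decide which result is right. What follows is only a minimal sketch of the same summation for a single work-item: it assumes nParams is 6 (the post does not give its value), that both error terms are read from nmE, and that work-items with j0 < 0 simply return. The names calc_axa_ref, max_abs_diff and N_PARAMS are hypothetical helpers, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

#define N_PARAMS 6 /* assumption: the value of nParams is not given in the post */

/* Host-side reference for one work-item j of calcAxA.
 * Array layouts and index arithmetic mirror the posted kernel. */
void calc_axa_ref(int j, int n, int n0, int m, int nm,
                  const int *nmMask, const double *nmJ, const double *nmE,
                  double *AxA, double *AxE)
{
    assert(j >= 0 && j < n);

    int j0 = j - n0;
    if (j0 < 0)
        return; /* assumed behaviour of the guard in the kernel */

    double axeT[6] = {0.0};
    double axaT[6 * 6] = {0.0};

    /* Sum up in local variables, skipping masked entries. */
    for (int i = 0; i < m; ++i) {
        int ij = nmMask[i * n + j];
        if (ij == -1)
            continue;
        int r0 = ij * N_PARAMS;
        int r1 = (nm + ij) * N_PARAMS;
        for (int r = 0; r < 6; ++r) {
            for (int c = 0; c < 6; ++c)
                axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r]
                                 + nmJ[r1 + c] * nmJ[r1 + r];
            axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0]
                     + nmJ[r1 + r] * nmE[2 * ij + 1];
        }
    }

    /* Write the sums to the output arrays. */
    for (int i = 0; i < 6; ++i) {
        for (int k = 0; k < 6; ++k)
            AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
        AxE[6 * j + i] = axeT[i];
    }
}

/* Largest absolute difference between two arrays of equal length. */
double max_abs_diff(const double *a, const double *b, size_t len)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling calc_axa_ref for every j in a loop and then comparing its output with the buffers read back from the device via max_abs_diff should show whether the kernel itself or the data transfer around it is at fault.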
Other topic:
When compiling the cl code, the Intel OpenCL SDK returns the message:
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
for the line
#pragma OPENCL EXTENSION cl_khr_fp64 : enable.
I can't find what causes this warning, although judging by other posts it seems to be pretty common.
Any ideas?
3 Replies
I am sure you did this, but just to confirm: you do have "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" at the top of your ".cl" file, right? This is required to enable double precision support, as stated in the extension spec:

"OpenCL 1.0 adds support for double precision floating-point as an optional extension. An application that wants to use double will need to include the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive before any double precision data type is declared in the kernel code."

I am guessing you did this, but the compiler seems to be saying that it didn't vectorize your code because you are using double precision support without enabling it.
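Independently of the pragma, it may be worth checking on the host that the device reports cl_khr_fp64 at all. The sketch below only covers the string check: it assumes the space-separated extension list has already been obtained, for example from clGetDeviceInfo with CL_DEVICE_EXTENSIONS, and has_extension is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` occurs as a whole space-separated token in
 * `extensions`, 0 otherwise. A plain substring test is not enough,
 * because one extension name can be a prefix of another. */
int has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        ++p; /* keep searching after this partial match */
    }
    return 0;
}
```

If has_extension(ext, "cl_khr_fp64") returns 0 for the device in use, kernels that use double cannot be expected to work on it, whatever the pragma says.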
Yes, the
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
is at the top of my .cl file. The .cl file contains some more kernels using double data. The compiler vectorizes the other kernels and executing them gives the expected results. But the kernel shown above is not vectorized and returns wrong results.
OK - found it finally. It was the improper usage of CL_MEM_USE_HOST_PTR. If used correctly, everything works as expected.