I created a kernel that sums up small matrices; the same operation is applied to each matrix in a large set. When I compile the kernel, the compiler generates a kernel object but reports that the kernel was not vectorized. When I execute the kernel, the results are simply wrong.

Running the same code with the AMD OpenCL SDK gives correct results.

The kernel looks like this:

__kernel void calcAxA(

const int n,

const int n0,

const int m,

const int nm,

const __global int* nmMask,

const __global double* nmJ,

const __global double* nmE,

__global double* AxA,

__global double* AxE)

{

int j = get_global_id(0);

int j0 = j - n0;

if (j0 < 0)

return;

double axeT[6];

double axaT[6*6];

for (int i = 0; i < 6 * 6; ++i) axaT[i] = 0.0;

for (int i = 0; i < 6; ++i) axeT[i] = 0.0;

// Sum up in local variables

for (int i = 0; i < m; ++i)

{

int ij = nmMask[i * n + j];

if (ij == -1) continue;

int r0 = ij * nParams;

int r1 = (nm + ij) * nParams;

for (int r = 0; r < 6; ++r) {

for (int c = 0; c < 6; ++c) {

axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r] + nmJ[r1 + c] * nmJ[r1 + r];

}

axeT[r] += nmJ[r0 + r] * e[2 * ij + 0] + nmJ[r1 + r] * nmE[2 * ij + 1];

}

}

// Assign sums to global arrays

for (int i = 0; i < 6; ++i)

{

for (int k = 0; k < 6; ++k)

{

AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];

}

AxE[6 * j + i] = axeT[i];

}

}
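Since the two SDKs disagree, a plain host-side loop can serve as a third opinion on which output is right. The following is only a sketch in C that mirrors the kernel's arithmetic one work-item at a time. It rests on several assumptions that the post does not confirm: `n_params` stands in for `nParams`, which the kernel uses but never declares; the undeclared array `e` is taken to be `nmE`; and the kernel is assumed to be launched with a global size of `n`. The helper names `calc_axa_reference` and `max_abs_diff` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference that mirrors the posted kernel, one work-item at a time.
   Assumptions (not confirmed by the post): n_params replaces the undeclared
   nParams, the undeclared array e is read as nmE, and the global size is n. */
static void calc_axa_reference(int n, int n0, int m, int nm, int n_params,
                               const int *nmMask, const double *nmJ,
                               const double *nmE, double *AxA, double *AxE)
{
    assert(n_params >= 6); /* each row of nmJ must hold at least 6 values */

    for (int j = 0; j < n; ++j) {
        int j0 = j - n0;
        if (j0 < 0)
            continue; /* the kernel returns early for these work-items */

        double axeT[6] = {0.0};
        double axaT[6 * 6] = {0.0};

        /* Sum up in local variables, exactly as the kernel does */
        for (int i = 0; i < m; ++i) {
            int ij = nmMask[i * n + j];
            if (ij == -1)
                continue;
            int r0 = ij * n_params;
            int r1 = (nm + ij) * n_params;
            for (int r = 0; r < 6; ++r) {
                for (int c = 0; c < 6; ++c)
                    axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r]
                                     + nmJ[r1 + c] * nmJ[r1 + r];
                axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0]
                         + nmJ[r1 + r] * nmE[2 * ij + 1];
            }
        }

        /* Assign sums to the output arrays with the kernel's own indexing */
        for (int i = 0; i < 6; ++i) {
            for (int k = 0; k < 6; ++k)
                AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
            AxE[6 * j + i] = axeT[i];
        }
    }
}

/* Largest absolute element-wise difference between two result buffers. */
static double max_abs_diff(const double *a, const double *b, size_t count)
{
    double worst = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

After reading `AxA` and `AxE` back from the device, passing them to `max_abs_diff` together with the reference buffers shows how far each SDK's output is from the host result. Because the order of the floating-point additions is the same as in the kernel, the difference should be zero or very close to it for a correct run.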

Other topic:

When compiling the cl code, the Intel OpenCL SDK returns the message:

:1:26: warning: expected identifier in '#pragma OPENCL' - ignored

for the line

#pragma OPENCL EXTENSION cl_khr_fp64 : enable.

I can't find what causes this warning, although, judging by other posts, the message seems to be quite common.

Any ideas?

Thanks,

Rasmus

3 Replies

"OpenCL 1.0 adds support for double precision floating-point as an optional extension. An application that wants to use double will need to include the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive before any double precision data type is declared in the kernel code."

I am guessing you did this, but the compiler seems to be saying that it didn't vectorize your code because you are using double precision support without enabling it.
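One way to rule this out from the host side is to check whether the device actually reports `cl_khr_fp64`. The device's extension list is a space-separated string, which can be queried with `clGetDeviceInfo` and `CL_DEVICE_EXTENSIONS`. The sketch below covers only the string check, so it does not need the OpenCL headers; the helper name `has_extension` is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
   `ext_list`, the format of the string reported for CL_DEVICE_EXTENSIONS.
   A plain substring search is not enough, because one extension name can
   be a prefix of another. */
static bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```

If `has_extension(extensions, "cl_khr_fp64")` is false for the string read from the device, the kernel cannot rely on double precision on that device, regardless of the pragma.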

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

is at the top of my .cl file. The .cl file contains several more kernels that use double data. The compiler vectorizes those kernels, and executing them gives the expected results. But the kernel shown above is not vectorized and returns wrong results.

Rasmus

Rasmus
