Hello,
I created a kernel that sums up some small matrices. The operation is the same for a large set of such matrices. When I compile the kernel, the compiler generates a kernel object but reports that the kernel was not vectorized. When I execute the kernel, the results are simply wrong.
Running the same code with the AMD OpenCL SDK gives correct results.
The kernel looks like this:
__kernel void calcAxA(
    const int n,
    const int n0,
    const int m,
    const int nm,
    const __global int* nmMask,
    const __global double* nmJ,
    const __global double* nmE,
    __global double* AxA,
    __global double* AxE)
{
    int j = get_global_id(0);
    int j0 = j - n0;
    if (j0 < 0)
        return;
    double axeT[6];
    double axaT[6 * 6];
    for (int i = 0; i < 6 * 6; ++i) axaT[i] = 0.0;
    for (int i = 0; i < 6; ++i) axeT[i] = 0.0;
    // Sum up in local variables
    for (int i = 0; i < m; ++i)
    {
        int ij = nmMask[i * n + j];
        if (ij == -1) continue;  // masked-out entry
        // nParams is not declared in this snippet; presumably defined elsewhere in the .cl file
        int r0 = ij * nParams;
        int r1 = (nm + ij) * nParams;
        for (int r = 0; r < 6; ++r) {
            for (int c = 0; c < 6; ++c) {
                axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r] + nmJ[r1 + c] * nmJ[r1 + r];
            }
            axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0] + nmJ[r1 + r] * nmE[2 * ij + 1];
        }
    }
    // Assign sums to global arrays
    for (int i = 0; i < 6; ++i)
    {
        for (int k = 0; k < 6; ++k)
        {
            AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
        }
        AxE[6 * j + i] = axeT[i];
    }
}
Other topic:
When compiling the cl code, the Intel OpenCL SDK returns the message:
:1:26: warning: expected identifier in '#pragma OPENCL' - ignored
for the line
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
I can't find what causes this warning, although judging by other posts it seems to be fairly common.
Any ideas?
Thanks,
Rasmus
3 Replies
I am sure you did this, but just to confirm: you do have "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" at the top of your ".cl" file, right? This is required to enable double precision support, in conformance with the extension spec:
"OpenCL 1.0 adds support for double precision floating-point as an optional extension. An application that wants to use double will need to include the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive before any double precision data type is declared in the kernel code."
I am guessing you did this, but the compiler seems to be saying that it didn't vectorize your code because you are using double precision without enabling it.
Yes, the
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
is at the top of my .cl file. The .cl file contains several more kernels that use double data. The compiler vectorizes those kernels, and executing them gives the expected results. Only the kernel shown above is not vectorized and returns wrong results.
Rasmus
OK - finally found it. It was improper usage of CL_MEM_USE_HOST_PTR. If it is used correctly, everything works as expected.
Rasmus
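For anyone chasing a similar mismatch between two OpenCL implementations: since the cause here turned out to be on the host side rather than in the kernel, a plain CPU reference of the same accumulation can help tell the two apart. Below is a minimal sketch in C that mirrors the kernel's indexing for a single work-item j. The name calc_axa_ref and the macro N_PARAMS are made up for illustration, and N_PARAMS = 6 is only an assumption, because the thread never shows how nParams is defined.

```c
#include <assert.h>

/* Assumption: the kernel's nParams is 6 (the thread does not show its definition). */
#define N_PARAMS 6

/* Hypothetical CPU reference for one work-item j of calcAxA.
 * Uses the same indexing as the kernel, so the output arrays can be
 * compared element by element with what the device wrote. */
void calc_axa_ref(int j, int n, int n0, int m, int nm,
                  const int *nmMask, const double *nmJ, const double *nmE,
                  double *AxA, double *AxE)
{
    assert(j >= 0 && j < n);

    int j0 = j - n0;
    if (j0 < 0)
        return;

    double axeT[6] = {0.0};
    double axaT[6 * 6] = {0.0};

    /* Sum up in local variables, skipping masked-out entries. */
    for (int i = 0; i < m; ++i) {
        int ij = nmMask[i * n + j];
        if (ij == -1)
            continue;
        int r0 = ij * N_PARAMS;
        int r1 = (nm + ij) * N_PARAMS;
        for (int r = 0; r < 6; ++r) {
            for (int c = 0; c < 6; ++c) {
                axaT[6 * r + c] += nmJ[r0 + c] * nmJ[r0 + r]
                                 + nmJ[r1 + c] * nmJ[r1 + r];
            }
            axeT[r] += nmJ[r0 + r] * nmE[2 * ij + 0]
                     + nmJ[r1 + r] * nmE[2 * ij + 1];
        }
    }

    /* Write the sums using the kernel's output layout
     * (AxE is indexed with j, not j0, exactly as in the posted kernel). */
    for (int i = 0; i < 6; ++i) {
        for (int k = 0; k < 6; ++k)
            AxA[6 * j0 + (n - n0) * i * 6 + k] = axaT[6 * i + k];
        AxE[6 * j + i] = axeT[i];
    }
}
```

Calling this for every j from n0 to n - 1 on the host input arrays and comparing against the buffers read back from the device (with a small tolerance for floating-point differences) shows whether the kernel arithmetic or the buffer setup is at fault.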