Solved: Thanks for your reply.

tooguni · ‎12-26-2016

The following kernel (my_kernel()), which is based on my_function(), produces wrong results on Intel GPUs.
The same code works correctly on Intel CPU and AMD GPU platforms.
If you change the type of the array index (idx) from unsigned long to unsigned int, the kernel produces correct results, but I think both versions should be correct.
What causes this problem: my code or the Intel OpenCL SDK?

const char* kernel_str =
"__kernel \n"
"void my_kernel(__global const unsigned char* src, \n"
"               __global unsigned char*       dst, \n"
"               const unsigned long           elements) \n"
"{ \n"
"  const unsigned long gid = get_global_id(0); \n"
"  const unsigned long idx = 3 * gid; // NG\n"
"  //const unsigned int idx = 3 * gid; // OK\n"
"  if (! (gid < elements)) { \n"
"    return; \n"
"  } \n"
"  float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;\n"
"  float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;\n"
"  float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;\n"
"  r = clamp(r, 0.0f, 255.0f); \n"
"  g = clamp(g, 0.0f, 255.0f); \n"
"  b = clamp(b, 0.0f, 255.0f); \n"
"  dst[idx + 0] = convert_uchar(r); \n"
"  dst[idx + 1] = convert_uchar(g); \n"
"  dst[idx + 2] = convert_uchar(b); \n"
"}";
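
A side note on types, offered as something to rule out rather than as the cause: in OpenCL C, unsigned long is a 64-bit type, whereas on 64-bit Windows the host-side unsigned long is 32 bits. The host code is not shown in this post, so the following is only a sketch, assuming the elements argument is set with clSetKernelArg. The names kernel_ulong and to_kernel_ulong are hypothetical, and std::uint64_t stands in for cl_ulong so that the snippet compiles without the OpenCL headers.

```cpp
#include <cstdint>

// Stand-in for cl_ulong (assumption: avoids needing <CL/cl.h> here).
// OpenCL C's `unsigned long` is always 64 bits wide.
using kernel_ulong = std::uint64_t;
static_assert(sizeof(kernel_ulong) == 8, "kernel-side unsigned long is 64-bit");

// Widen the host-side count to the kernel's 64-bit type before passing it.
inline kernel_ulong to_kernel_ulong(unsigned long elements) {
  return static_cast<kernel_ulong>(elements);
}

// Intended use (not compiled here; `kernel` is a hypothetical cl_kernel):
//   const cl_ulong n = to_kernel_ulong(elements);
//   clSetKernelArg(kernel, 2, sizeof(n), &n);
```

If the argument size passed to clSetKernelArg does not match the kernel parameter's size, the call should fail with CL_INVALID_ARG_SIZE, so checking its return value is a cheap way to exclude this.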


#include <algorithm>  // std::min, std::max
#include <CL/cl.h>    // cl_uchar

// CPU reference for my_kernel(): one loop iteration per work-item.
void my_function(const cl_uchar*     src,
                 cl_uchar*           dst,
                 const unsigned long elements)
{
  for (unsigned long gid = 0; gid < elements; ++gid) {
    const unsigned long idx = 3 * gid; // three bytes per element
    float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;
    float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;
    float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;
    r = std::min(std::max(0.0f, r), 255.0f); // clamp
    g = std::min(std::max(0.0f, g), 255.0f);
    b = std::min(std::max(0.0f, b), 255.0f);
    dst[idx + 0] = (cl_uchar)r;
    dst[idx + 1] = (cl_uchar)g;
    dst[idx + 2] = (cl_uchar)b;
  }
}
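
To illustrate the expectation that the index type should not change the output, here is a minimal CPU-side sketch. It repeats the arithmetic of my_function() with the index type as a template parameter and reports the first byte at which two buffers differ. The names convert_with_index and first_mismatch are hypothetical and are not part of the attached project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: same arithmetic as my_function(), but the
// index type is a template parameter so 32-bit and 64-bit can be compared.
template <typename Index>
std::vector<std::uint8_t> convert_with_index(const std::vector<std::uint8_t>& src) {
  assert(src.size() % 3 == 0);  // three bytes per element
  std::vector<std::uint8_t> dst(src.size());
  const std::size_t elements = src.size() / 3;
  for (std::size_t gid = 0; gid < elements; ++gid) {
    const Index idx = static_cast<Index>(3 * gid);
    float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;
    float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;
    float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;
    r = std::min(std::max(0.0f, r), 255.0f);  // clamp to [0, 255]
    g = std::min(std::max(0.0f, g), 255.0f);
    b = std::min(std::max(0.0f, b), 255.0f);
    dst[idx + 0] = static_cast<std::uint8_t>(r);
    dst[idx + 1] = static_cast<std::uint8_t>(g);
    dst[idx + 2] = static_cast<std::uint8_t>(b);
  }
  return dst;
}

// Returns the position of the first differing byte, or -1 if the buffers match.
long long first_mismatch(const std::vector<std::uint8_t>& a,
                         const std::vector<std::uint8_t>& b) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) return static_cast<long long>(i);
  }
  return -1;
}
```

Comparing convert_with_index<std::uint32_t>(src) against convert_with_index<std::uint64_t>(src) with first_mismatch should report no difference for any buffer whose size fits in 32 bits. The same helper could be pointed at the buffer read back from the device to locate where the GPU output first diverges.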

I am using the following environments.
OS : Windows 10 Pro 64bit
Device Name : Intel(R) Iris(TM) Pro Graphics 580
Device Driver Version : 21.20.16.4542
Intel OpenCL SDK : 2016 R3

OS : Windows 10 Pro 64bit
Device Name : Intel(R) HD Graphics 530
Device Driver Version : 20.19.15.4501
Intel OpenCL SDK : 2016 R2

I have attached the source code and a VC project.

Best regards,

Jeffrey_M_Intel1 · ‎12-30-2016

Thanks for your great reproducer! We're processing this as a bug.

tooguni · ‎01-10-2017

Thanks for your reply.

Why do small changes in the kernel code produce wrong results?