OpenCL* for CPU

Why do small changes in kernel code produce wrong results?

tooguni
Beginner

The kernel below, my_kernel(), is a port of my_function() (also below) and produces wrong results on Intel GPUs.
The same code works correctly on Intel CPU and AMD GPU platforms.
If I change the type of the array index (idx) from unsigned long to unsigned int, the kernel produces correct results, but I think both versions should be correct.
What causes this problem: my code or the Intel OpenCL SDK?
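To make the "both should be correct" claim concrete, here is a minimal host-side sketch (not part of the original reproducer) that checks when a 32-bit idx and a 64-bit idx address the same bytes. The helper names index_fits_in_32_bits and idx_as_uint are hypothetical. It assumes std::uint64_t stands in for the OpenCL C unsigned long, which is 64 bits wide, and std::uint32_t for unsigned int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the largest index touched by work-item
// gid (3 * gid + 2) fits in 32 bits, so a 32-bit idx and a 64-bit idx
// refer to the same element.
bool index_fits_in_32_bits(std::uint64_t gid)
{
  // Guard the 64-bit arithmetic itself against wrap-around.
  assert(gid <= (UINT64_MAX - 2) / 3);
  const std::uint64_t last = 3 * gid + 2;
  return last <= UINT32_MAX;
}

// Hypothetical helper: mirrors "const unsigned int idx = 3 * gid;" in the
// kernel, where the product is formed in 64 bits and then truncated.
std::uint32_t idx_as_uint(std::uint64_t gid)
{
  return static_cast<std::uint32_t>(3 * gid);
}
```

If index_fits_in_32_bits(elements - 1) holds for the buffer being processed, truncating idx to 32 bits loses nothing, so the two variants of the kernel would be expected to write identical output.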

const char* kernel_str =
"__kernel \n"
"void my_kernel(__global const unsigned char* src, \n"
"               __global unsigned char*       dst, \n"
"               const unsigned long           elements) \n"
"{ \n"
"  const unsigned long gid = get_global_id(0); \n"
"  const unsigned long idx = 3 * gid; // NG\n"
"  //const unsigned int idx = 3 * gid; // OK\n"
"  if (! (gid < elements)) { \n"
"    return; \n"
"  } \n"
"  float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;\n"
"  float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;\n"
"  float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;\n"
"  r = clamp(r, 0.0f, 255.0f); \n"
"  g = clamp(g, 0.0f, 255.0f); \n"
"  b = clamp(b, 0.0f, 255.0f); \n"
"  dst[idx + 0] = convert_uchar(r); \n"
"  dst[idx + 1] = convert_uchar(g); \n"
"  dst[idx + 2] = convert_uchar(b); \n"
"}";


#include <algorithm>  // std::min, std::max
#include <CL/cl.h>    // cl_uchar

void my_function(const cl_uchar*     src,
                 cl_uchar*           dst,
                 const unsigned long elements)
{
  for (unsigned long gid = 0; gid < elements; ++gid) {
    const unsigned long idx = 3 * gid;
    float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;
    float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;
    float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;
    r = std::min(std::max(0.0f, r), 255.0f); // clamp
    g = std::min(std::max(0.0f, g), 255.0f);
    b = std::min(std::max(0.0f, b), 255.0f);
    dst[idx + 0] = (cl_uchar)r;
    dst[idx + 1] = (cl_uchar)g;
    dst[idx + 2] = (cl_uchar)b;
  }
}
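
For anyone reproducing this, one way to locate where the GPU output diverges from my_function() is to compare the two destination buffers byte by byte. The sketch below is an assumption about how such a check could look, not the code from the attached project. The names first_mismatch and PixelPos are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of the first byte at which the
// two buffers differ. If the buffers have the same size and identical
// contents, the return value equals that size. If one buffer is a strict
// prefix of the other, the return value is the shorter size.
std::size_t first_mismatch(const std::vector<unsigned char>& expected,
                           const std::vector<unsigned char>& actual)
{
  const std::size_t n = std::min(expected.size(), actual.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (expected[i] != actual[i]) {
      return i;
    }
  }
  return n;
}

// Hypothetical helper: maps a byte index back to the pixel (gid) and the
// channel offset (0, 1 or 2) used by the kernel, since idx = 3 * gid.
struct PixelPos {
  std::size_t gid;
  std::size_t channel;
};

PixelPos to_pixel(std::size_t byte_index)
{
  return PixelPos{byte_index / 3, byte_index % 3};
}
```

Calling first_mismatch with the output of my_function() as expected and the buffer read back from the device as actual, then passing the result to to_pixel, would show which work-item and channel first went wrong.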

I see the problem in both of these environments.
OS : Windows 10 Pro 64bit
Device Name : Intel(R) Iris(TM) Pro Graphics 580
Device Driver Version : 21.20.16.4542
Intel OpenCL SDK : 2016 R3

OS : Windows 10 Pro 64bit
Device Name : Intel(R) HD Graphics 530
Device Driver Version : 20.19.15.4501
Intel OpenCL SDK : 2016 R2

I have attached the source code and a Visual C++ project.

Best regards,

1 Solution
Jeffrey_M_Intel1
Employee

Thanks for your great reproducer!  We're processing this as a bug.

 


tooguni
Beginner

Thanks for your reply.
