Setting work_group_size crashes OpenCL on Intel CPU

SDyul · 08-16-2015

Hi

I am porting the reduction kernel from the AMD APP SDK.

It requires setting the work-group size when calling clEnqueueNDRangeKernel. With any local_work_size other than 8, it crashes directly inside TBB in Intel's OpenCL runtime for the Intel CPU, even though clEnqueueNDRangeKernel itself launches the kernel successfully.

When I query the work-group size from the device, it returns 8192 (it should be 8 in this case), and the kernel work-group size is 2048. It crashes with both settings.

It only works when the local size equals the number of cores.

My CPU is an Intel Haswell Core i7-4770K.

My global size is 4096 (global_size = 4096).

On the Intel HD Graphics 4600 GPU, the kernel works fine with all the different work-group sizes the spec allows.

The project is located here:

https://github.com/kingofthebongo2008/dare12_opencl

The file that launches the kernel is located here:

https://github.com/kingofthebongo2008/dare12_opencl/blob/master/src/freeform_converged.h

Robert_I_Intel · 08-17-2015

Hi Stefan,

The CPU OpenCL implementation is very unforgiving of out-of-bounds accesses to global or local data, and I suspect that is exactly what's going on here. For example, your kernel contains the following line:

sdata[tid] += sdata[tid + s];

where tid is defined as unsigned int tid = get_local_id(0);

and the initial value of s is unsigned int s = localSize >> 1;

So if your local size is 8, then for the last work-item in a work-group, tid is 7 and s is 4, so tid + s is 11, but sdata only has 8 elements.

The same problem exists in this line:

sdata[tid] = input[stride] + input[stride + 1];

I believe your global memory accesses go out of bounds too. Please size your global and local memory so that you never access data out of bounds; otherwise, things will crash on the CPU.