OpenCL* for CPU

setting work_group_size crashes OpenCL on Intel CPU



I am porting the reduction kernel from the AMD APP SDK.

It requires setting work_group_size. When you execute clEnqueueNDRangeKernel with a local_work_size other than 8, it crashes inside TBB in the Intel OpenCL runtime for Intel CPU. The clEnqueueNDRangeKernel call itself succeeds and launches the kernel.

When you query the device for its work-group size it returns 8192 (it should be 8 in this case), and the kernel work-group size is 2048. It crashes with both settings.

It works only when the local size equals the number of cores.

I have an Intel Haswell 4770K.

I have global_size = 4096;

The Intel HD 4600 GPU works fine for all the different sizes, as the spec requires.

The project is located here:

The file that launches the kernel is located here:







1 Reply

Hi Stefan,

The CPU OpenCL implementation is very unforgiving if you access global or local data out of bounds. I suspect that this is exactly what is going on. For example, your kernel has the following line:

 sdata[tid] += sdata[tid + s];

tid is unsigned int tid = get_local_id(0);

initial value of s is unsigned int s = localSize >> 1;

So, if your local size is 8, then for the last work item in a work-group tid is 7 and s is 4, so tid + s is 11, but sdata has only 8 elements (valid indices 0 to 7).
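The arithmetic above can be checked on the host with a few lines of plain C. This is only a sketch: max_sdata_index is a hypothetical helper, not part of the project or of OpenCL, and the guarded variant assumes the usual fix of wrapping the line in if (tid < s), which the thread does not confirm is present in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Highest sdata index touched by the first reduction step
 *     sdata[tid] += sdata[tid + s];   with s = local_size >> 1
 * guarded != 0 models an `if (tid < s)` guard around that line
 * (an assumption, not code quoted from the kernel). */
size_t max_sdata_index(size_t local_size, int guarded)
{
    assert(local_size > 0);
    size_t s = local_size >> 1;
    size_t max_index = 0;

    for (size_t tid = 0; tid < local_size; ++tid) {
        if (guarded && tid >= s)
            continue;             /* this work item skips the step */
        if (tid + s > max_index)
            max_index = tid + s;  /* the read index is never below the write index tid */
    }
    return max_index;
}
```

For a local size of 8, max_sdata_index(8, 0) gives 11, which is past the end of an 8-element sdata, while max_sdata_index(8, 1) gives 7, which is the last valid index.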

Same problem with the following line:

sdata[tid] = input[stride] + input[stride + 1];

I believe your global memory accesses go out of bounds too. Please size your global and local memory so that you never access data out of bounds. Otherwise, things will crash on the CPU.
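One way to find the buffer sizes the kernel really needs is to emulate a single work-group on the host with explicit bounds checks. The sketch below is not the kernel from the project. It assumes that stride is 2 * get_global_id(0), that the local size is a power of two, and that the reduction step is guarded with tid < s. The function name reduce_group_checked and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential host emulation of one work-group of a tree reduction.
 * Assumptions (not confirmed by the thread): stride = 2 * global id,
 * local_size is a power of two, the step is guarded with tid < s.
 * Returns 1 and stores the group's partial sum in *result on success,
 * or 0 if any access would fall outside input or sdata. */
int reduce_group_checked(const int *input, size_t input_len,
                         long *sdata, size_t sdata_len,
                         size_t group_id, size_t local_size,
                         long *result)
{
    if (local_size == 0 || sdata_len < local_size)
        return 0;                          /* local buffer too small */

    /* Load phase: sdata[tid] = input[stride] + input[stride + 1]; */
    for (size_t tid = 0; tid < local_size; ++tid) {
        size_t gid = group_id * local_size + tid;
        size_t stride = gid * 2;
        if (stride + 1 >= input_len)
            return 0;                      /* global read out of bounds */
        sdata[tid] = (long)input[stride] + input[stride + 1];
    }

    /* Reduction phase: each pass of the outer loop is one barrier interval. */
    for (size_t s = local_size >> 1; s > 0; s >>= 1) {
        for (size_t tid = 0; tid < s; ++tid) {   /* guard: tid < s */
            assert(tid + s < local_size);        /* stays inside sdata */
            sdata[tid] += sdata[tid + s];
        }
    }

    *result = sdata[0];
    return 1;
}
```

Under those assumptions, a launch with global_size = 4096 needs an input buffer of at least 8192 elements, and a local buffer of at least local_work_size elements for each work-group.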