Hi,
I have a kernel defined as below:
__kernel
__attribute__((max_work_group_size(LRN_MAX_LOCAL_SIZE, 1, 1)))
void lrn(
// Params Ports
uchar data_dim1,
uchar data_dim2,
char frac_dout,
// Data Ports
__global lane_data *restrict bottom,
__global lane_data *restrict top) {
..........
}
In this code, LRN_MAX_LOCAL_SIZE is equal to 64. On the host side, I enqueue the kernel like this:
status = clEnqueueNDRangeKernel(que_memWr[i], knl_lrn[i], 3, NULL, knl_lrn_global_size, knl_lrn_local_size, 0, NULL, &lrn_event[i]);
checkError(status, "Failed to launch kernel lrn");
Both `knl_lrn_global_size` and `knl_lrn_local_size` are arrays of size 3. Here is the content of both arrays:
Global: 27, 27, 24. Local: 1, 1, 24.
To me, everything looks fine, so I'm not sure why I'm getting CL_INVALID_WORK_GROUP_SIZE. Let me mention that everything works in emulation mode, but it fails when running on the actual hardware.
I'm using OpenCL compiler version 18.1-pro with a Nallatech p385a.
Does anyone have any idea what I'm doing wrong?
Thanks,
Saman
The max_work_group_size attribute on your kernel limits the third dimension of the work-group to 1, but on the host you are supplying a local size of 24 for that dimension. I believe this is the reason for the error you are getting. The emulator probably ignores these attributes altogether, which is why you don't get any error there.
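To make the mismatch concrete, here is a minimal host-side sketch in plain C that compares a local size against per-dimension limits before the kernel is enqueued. The helper `local_size_fits` is hypothetical and not part of the OpenCL API. It assumes the three arguments of max_work_group_size map to dimensions 0, 1 and 2 in order, and that each global size must be evenly divisible by the matching local size, as in OpenCL 1.x.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenCL call): returns true when every
 * local size is non-zero, does not exceed the per-dimension limit,
 * and evenly divides the corresponding global size.
 */
static bool local_size_fits(const size_t global[3],
                            const size_t local[3],
                            const size_t max_local[3])
{
    for (int d = 0; d < 3; ++d) {
        /* A zero or over-limit local size is rejected outright. */
        if (local[d] == 0 || local[d] > max_local[d]) {
            return false;
        }
        /* Assumption: global size must be a multiple of local size. */
        if (global[d] % local[d] != 0) {
            return false;
        }
    }
    return true;
}
```

With the limits {64, 1, 1} from the kernel attribute, the sizes in the question (global 27, 27, 24 and local 1, 1, 24) fail this check in dimension 2. Two possible ways out, both untested here: change the attribute so that the large limit sits on the third dimension, or reorder the NDRange so that the 24 goes into dimension 0 (global 24, 27, 27 and local 24, 1, 1) and adjust the work-item ID indices inside the kernel to match.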
