1. I have a global work size of 1024 by 1024.
2. I set the local work size to 16 by 16.
3. My CPU OpenCL device has a maximum work-group size of 8192.
4. I call clEnqueueNDRangeKernel with the desired local-work-size (along with all other necessary parameters)
5. I call:
a. clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), (void*)&workGroupSizeUsed, NULL);
b. clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), (void*)&workGroupSizeUsed, NULL);
6. Both calls return 8192. How is this possible?
My expectation is 16 - the value that I passed to it.
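Neither query reports the local work size you passed to clEnqueueNDRangeKernel. Querying clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE returns the maximum work-group size supported for that kernel on that device, as determined by its resource utilization (e.g., private and local memory) or by a kernel attribute such as max_work_group_size. It is an upper bound on what you may request, not the size in use. CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, on the other hand, is a performance hint that is often chosen to match the device's underlying SIMD architecture. The two may return the same value, as they do here, and CPUs seem like they would benefit more from large work-groups than from small ones, so 8192 for both is plausible. To see the size actually used for a given launch, call get_local_size() inside the kernel; with your settings it returns 16 in each dimension.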
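To make the relationship between the numbers concrete, here is a minimal sketch in plain C (no OpenCL calls) that checks a requested 2D local size against a per-kernel limit. The helper names are hypothetical and not part of the OpenCL API; kernel_max stands in for the value that CL_KERNEL_WORK_GROUP_SIZE reported, and the divisibility check assumes OpenCL 1.x rules.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-items in one 2D work-group. */
static size_t work_items_per_group(const size_t local[2])
{
    return local[0] * local[1];
}

/* Hypothetical helper: number of work-groups the 2D global range is split into.
   Assumes each global dimension is a multiple of the local dimension. */
static size_t work_group_count(const size_t global[2], const size_t local[2])
{
    return (global[0] / local[0]) * (global[1] / local[1]);
}

/* Hypothetical helper: returns 1 if the requested local size is acceptable,
   0 otherwise. kernel_max stands in for the CL_KERNEL_WORK_GROUP_SIZE result. */
static int local_size_is_valid(const size_t global[2], const size_t local[2],
                               size_t kernel_max)
{
    if (local[0] == 0 || local[1] == 0)
        return 0;
    /* Assumption: OpenCL 1.x rule that the global size must be evenly
       divisible by the local size in every dimension. */
    if (global[0] % local[0] != 0 || global[1] % local[1] != 0)
        return 0;
    /* The total number of work-items per group must not exceed the limit. */
    return work_items_per_group(local) <= kernel_max;
}
```

With the numbers from the question (global 1024 by 1024, local 16 by 16, limit 8192), each work-group holds 256 work-items, there are 4096 work-groups, and the check passes because 256 is well below 8192. The limit only caps what you may request; it says nothing about what you did request.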