I'm having a problem with work-group size definitions during a kernel analysis session. I'd like to know whether it is possible to benchmark every possible combination of local work sizes.
For example, to test the combinations of local sizes 1, 25, 50 and 100, I entered these values:
And I get :
I don't get any analysis for (25,25), for example, or for (50,100). How should I enter the values? Is this possible?
If the product of the local sizes exceeds CL_KERNEL_WORK_GROUP_SIZE, then that configuration is omitted.
25 x 25 = 625 and 50 x 100 = 5000, and both are larger than 512, which is the CL_KERNEL_WORK_GROUP_SIZE on the GPU, for example.
On the CPU it is 8192, so those configurations should appear.
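To make the arithmetic concrete, here is a minimal Python sketch of that filtering rule. It is only an illustration, not the analysis tool's actual code: the function name `valid_local_sizes` is hypothetical, and the limits 512 and 8192 are simply the example values quoted above. It checks only the product rule; a real device can impose further constraints (per-dimension limits, for instance).

```python
from itertools import product


def valid_local_sizes(candidates, max_work_group_size):
    """Return the (x, y) local-size pairs whose product fits the limit.

    Hypothetical helper: max_work_group_size stands in for the value
    reported as CL_KERNEL_WORK_GROUP_SIZE for the kernel on a device.
    """
    return [
        (x, y)
        for x, y in product(candidates, repeat=2)
        if x * y <= max_work_group_size
    ]


candidates = [1, 25, 50, 100]

# Example limits taken from the discussion (assumed, device-dependent).
gpu_ok = valid_local_sizes(candidates, 512)
cpu_ok = valid_local_sizes(candidates, 8192)

print("GPU (limit 512): ", gpu_ok)
print("CPU (limit 8192):", cpu_ok)
```

With a limit of 512, (25,25) and (50,100) are filtered out; with 8192 they both pass, and only (100,100) = 10000 is dropped.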
You can check the spec at https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf, page 222.
Are you running on the CPU or GPU?