I've found a bug in the 2013 beta Linux build: calling clSetKernelArg for an int3 kernel argument fails when the size passed is sizeof(cl_int3). The call succeeds with 3 * sizeof(cl_int), but the CL spec is clear that the API type corresponding to int3 is cl_int3, not cl_int (see the table at the start of section 6.1.2 in the CL 1.1 spec). The two sizes differ because cl_int3 is padded to 4 * sizeof(cl_int).
Both the NVIDIA and AMD OpenCL implementations expect the size to be sizeof(cl_int3).
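To make the size difference concrete without needing an OpenCL install, here is a minimal sketch. The my_cl_* types and the two helper functions are hypothetical stand-ins I made up for illustration. They assume what the Khronos cl_platform.h does as far as I know, namely that cl_int is a 32-bit integer and cl_int3 is a typedef of cl_int4. The real header also adds alignment attributes, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cl_int (assumed to be a 32-bit signed integer). */
typedef int32_t my_cl_int;

/* Hypothetical stand-in for cl_int4: four 32-bit components. */
typedef union {
    my_cl_int s[4];
    struct { my_cl_int x, y, z, w; } c;
} my_cl_int4;

/* Assumption: mirrors the header, where the 3-component type is a
 * typedef of the 4-component type, so it carries one padding component. */
typedef my_cl_int4 my_cl_int3;

/* Size the spec describes for int3: 4 * size of the scalar component. */
size_t padded_int3_size(void) { return sizeof(my_cl_int3); }

/* Size the beta build accepts, per the report above: 3 * scalar size. */
size_t packed_int3_size(void) { return 3 * sizeof(my_cl_int); }

/* With the real headers, the call I would expect to work looks like this
 * (kernel and value are hypothetical variables):
 *
 *     cl_int3 value = {{1, 2, 3}};
 *     cl_int err = clSetKernelArg(kernel, 0, sizeof(cl_int3), &value);
 */
```

With these stand-ins, padded_int3_size() gives 16 and packed_int3_size() gives 12, which is exactly the mismatch described above.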
Raghupathi Muthyalampalli (Intel) wrote:
"When applied to an operand that is a vector type, the result is number of components * size of each scalar component (Except for 3-component vectors whose size is defined as 4 * size of each scalar component.)"

Actually, I think the part in parentheses supports my point: the size of an int3 is 16, not 12, yet clSetKernelArg only works when I pass a size of 12.