Does autovectorization work on the kernel below?
__kernel void vec_add(__global const float* in1, __global const float* in2, __global float* out)
{
    int i = get_global_id(0);
    int j = (i << 2);   // each work-item handles 4 consecutive floats
    out[j]   = in1[j]   + in2[j];
    out[j+1] = in1[j+1] + in2[j+1];
    out[j+2] = in1[j+2] + in2[j+2];
    out[j+3] = in1[j+3] + in2[j+3];
}
thanks,
Jeffrey
Hi Jeffrey,
Yes, but you may want to use the float4 data type instead. Please see my reply to your other post.
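As a rough sketch of what the float4 suggestion could look like (an illustration only, not necessarily the code from the other post): each work-item adds one float4 instead of four separate floats. The name vec_add4 is hypothetical, and the sketch assumes the element count is a multiple of 4 and the buffers are 16-byte aligned. The block at the top is only a host-side stand-in so the file can also be trial-compiled as plain C with GCC or Clang; an OpenCL compiler skips it.

```c
/* Host-side stand-ins (assumptions for illustration only).
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this block. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
typedef float float4 __attribute__((vector_size(16)));
static size_t fake_gid;  /* hypothetical: work-item id set by a host test */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_gid; }
#endif

/* vec_add4 is a hypothetical name. Each work-item adds one float4,
   i.e. the same four floats the scalar kernel handled via j..j+3. */
__kernel void vec_add4(__global const float4* in1,
                       __global const float4* in2,
                       __global float4* out)
{
    size_t i = get_global_id(0);
    out[i] = in1[i] + in2[i];  /* component-wise add of 4 floats */
}
```

With this layout the global work size would be the number of floats divided by 4, the same as for the original kernel, since that one also covered four elements per work-item.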