
Irregular behaviour of vectors in OpenCL(1.2) kernels


I am trying to perform an operation inside an OpenCL kernel. I have a buffer named filter, which is a 3x3 matrix with every element initialized to 1.

I pass this as an argument to the OpenCL kernel from the host side. The issue appears when I try to read this buffer on the device side as float3 vectors. For example:


__kernel void apply_filter(constant float3* restrict filter)  // kernel name is a placeholder
{
        float3 temp1 = filter[0];
        float3 temp2 = filter[1];
        float3 temp3 = filter[2];
}


The first two temp variables behave as expected: all of their components are 1. The third one (temp3), however, has only its x component set to 1; its y and z components are 0. When I read the buffer as plain float values instead, everything behaves as expected. Am I doing something wrong? I don't want to use the vload functions because they add overhead.


Hi Abhi Verma,

Thanks for the interest.

Can you attach a minimally representative reproducer for your observed issue? I took a look at the relevant OCL standard documentation and didn't see anything immediately obvious.

I'd like to see how the kernel is ingesting the constant data. Can you send the host side and the device side program? Can you send build options as well? How are you displaying the data?

What hardware target is this executing on? Which OpenCL implementation is in use (from Intel Graphics Driver version (Windows* OS) or NEO/CPU RT version (Linux* OS))?




Vec3s in OpenCL C are a bit strange - they are three component vectors, but the size of a vec3 is the same as the size of a vec4, and the alignment requirements for a vec3 are the same as alignment requirements for a vec4.  See:
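To make the size rule concrete, here is a small host-side C sketch of the byte arithmetic. It assumes 4-byte floats and a tightly packed 3x3 buffer of 9 floats (36 bytes); the names FLOAT3_STRIDE and float3_bytes_in_bounds are made up for this illustration and are not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a float3 element occupies 4 * sizeof(float) = 16 bytes,
   the same as a float4. FLOAT3_STRIDE is a hypothetical name. */
enum { FLOAT3_STRIDE = 4 * sizeof(float) };

/* Returns how many of the 12 bytes holding x, y, z of filter[index]
   fall inside a buffer that is buffer_bytes long. */
size_t float3_bytes_in_bounds(size_t index, size_t buffer_bytes)
{
    assert(sizeof(float) == 4); /* the arithmetic below assumes 4-byte floats */

    size_t start = index * FLOAT3_STRIDE;     /* first byte of the x component */
    size_t end = start + 3 * sizeof(float);   /* one past the z component */

    if (start >= buffer_bytes)
        return 0;
    return (end <= buffer_bytes ? end : buffer_bytes) - start;
}
```

With a 36-byte buffer, filter[0] and filter[1] lie fully inside it, but filter[2] starts at byte 32, so only its x component (4 bytes) is in bounds. That would be consistent with the reported symptom, although what an out-of-bounds read actually returns is not something to rely on.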

If your matrix data is tightly packed, then you'll want to use the vload3 and vstore3 built-ins to load and store your data instead. Can you say anything more about the overhead you are seeing? Off the top of my head, I can't think of any reason why there should be overhead when loading float data with vload/vstore vs. regular pointer-based loads and stores. Thanks!
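As a rough illustration of the packed addressing, the host-side C sketch below mirrors what vload3(offset, p) does in OpenCL C: it reads three consecutive floats starting at p + offset * 3, with no padding between rows. The row3 struct and load_packed_row function are hypothetical stand-ins for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for the three lanes of an OpenCL float3. */
typedef struct { float x, y, z; } row3;

/* Emulates the addressing of vload3(offset, p): three consecutive floats
   are read starting at p + offset * 3, so rows are 12 bytes apart. */
row3 load_packed_row(size_t offset, const float *p)
{
    assert(p != NULL);

    row3 r = { p[offset * 3 + 0], p[offset * 3 + 1], p[offset * 3 + 2] };
    return r;
}
```

In the kernel itself, the corresponding change would presumably be to declare the argument as constant float* restrict filter and read the third row with float3 temp3 = vload3(2, filter);, assuming the host buffer really holds 9 packed floats.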


Hi AbhiV,

I'm hoping to hear about any next steps you've tried, as this question is very useful.