OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU

CL_DEVICE_PREFERRED_VECTOR_WIDTH for Intel devices

Alexander_Karsakov

Hello.

I am working on OpenCL performance optimizations for Intel devices. I want to use vectorization and vector data types with the optimal length for a given device. I called clGetDeviceInfo(.., CL_DEVICE_PREFERRED_VECTOR_WIDTH, ..), but the values it returns are not really optimal:

uchar - 1
short - 1
int - 1
float - 1

I checked this on an Intel HD 4600 GPU and an Intel Core i5-4570 CPU.

I then searched experimentally for the optimal vector length for my problem and got the following values:

uchar - 16
short - 8
int - 1
float - 1

If I use uchar16 instead of uchar, I get a 3x speedup.

I have two questions:

1. Why does clGetDeviceInfo(.., CL_DEVICE_PREFERRED_VECTOR_WIDTH, ..) return these values?

2. Could these values be changed in future releases? That would make cross-platform optimization possible.

Thanks,

Alexander.

Dmitry_K_Intel
Employee

Hi Alexander,

You are right: Intel OpenCL devices report a preference for scalar types because they assume that the compiler's internal autovectorization will produce better results in most cases. And you are right once more: there are cases where internal autovectorization fails and manual tuning produces better results.

Please check https://software.intel.com/sites/products/documentation/ioclsdk/2013/Intel_SDK_for_OpenCL_Applicatio... for more info

 

Alexander_Karsakov

Hi Dmitry,

Thanks for the clarification!

Maxim_S_Intel
Employee

"Intel OpenCL devices prefer scalar values as they assume that internal autovectorization [...]"

The caveat: the compiler does its best job when vectorizing 32-bit types (like int and float). In contrast, for char/uchar, explicitly using short vectors like uchar4 might be more performant, as it better coalesces the memory accesses (since with uchar4/uchar8/etc. you operate on aligned data chunks) and also better amortizes the work-item scheduling costs (since you process multiple pixels simultaneously).
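
To make the amortization argument concrete, here is a minimal sketch in plain C. It is a host-side emulation, not OpenCL kernel code, and it does not model the hardware or predict any speedup. The pixel operation (inverting a byte) and all function names are hypothetical, chosen only for illustration. Each `*_item` function stands in for one work-item, and each `run_*` driver stands in for the NDRange launch and returns how many work-items it needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-pixel operation: invert an 8-bit pixel. */
static uint8_t invert(uint8_t p) { return (uint8_t)(255u - p); }

/* Emulates one work-item of a scalar kernel: one uchar in, one uchar out. */
static void invert_scalar_item(const uint8_t *src, uint8_t *dst, size_t gid)
{
    dst[gid] = invert(src[gid]);
}

/* Emulates one work-item of a uchar4-style kernel: one 4-byte chunk
 * is loaded, processed, and stored per work-item. */
static void invert_vec4_item(const uint8_t *src, uint8_t *dst, size_t gid)
{
    uint8_t px[4];
    memcpy(px, src + 4 * gid, 4);      /* single 4-byte load */
    for (int i = 0; i < 4; ++i)
        px[i] = invert(px[i]);
    memcpy(dst + 4 * gid, px, 4);      /* single 4-byte store */
}

/* Stand-in for launching the scalar kernel over n pixels.
 * Returns the number of work-items used. */
size_t run_scalar(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        invert_scalar_item(src, dst, gid);
    return n;
}

/* Stand-in for launching the 4-wide kernel over n pixels.
 * Assumes n is a multiple of 4 (no tail handling in this sketch).
 * Returns the number of work-items used. */
size_t run_vec4(const uint8_t *src, uint8_t *dst, size_t n)
{
    assert(n % 4 == 0);
    size_t items = n / 4;
    for (size_t gid = 0; gid < items; ++gid)
        invert_vec4_item(src, dst, gid);
    return items;
}
```

For the same buffer, `run_vec4` produces the same output as `run_scalar` with a quarter of the work-items, which is the scheduling cost being amortized. Whether that translates into a real gain on a given device still has to be measured, as in the uchar16 experiment earlier in the thread.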
