Hello.
I have been trying to optimize OpenCL performance on Intel devices. I want to use vectorization and vector data types with the optimal length for a given device. I called clGetDeviceInfo(.., CL_DEVICE_
uchar - 1, short - 1, int - 1, float - 1
I checked this on an Intel HD4600 GPU and an Intel Core i5-4570 CPU.
I then tried to find the optimal vector length for my problem experimentally and got the following values:
uchar - 16, short - 8, int - 1, float - 1
If I use uchar16 instead of uchar, I get a 3x speedup.
I have two questions:
1. Why does clGetDeviceInfo(.., CL_DEVICE_PREFERRED_VECTOR_WIDTH, ..) return these values?
2. Is it possible to change these values in future releases? This would make cross-platform optimization possible.
Thanks,
Alexander.
Hi Alexander,
You are right - Intel OpenCL devices prefer scalar values as they assume that internal autovectorization will produce better results in most cases. And you are right once more - there are cases where internal autovectorization fails and manual tuning produces better results.
Please check https://software.intel.com/sites/products/documentation/ioclsdk/2013/Intel_SDK_for_OpenCL_Applications_2013_Optimization_Guide.pdf for more info.
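As a rough illustration of combining the reported preference with manual tuning, here is a minimal host-side sketch in plain C. The helper name `candidate_vector_width` is hypothetical and not part of OpenCL. The rule it encodes is only an assumption drawn from the measurements in this thread (uchar - 16, short - 8, int - 1, float - 1), so treat the result as a first width to benchmark, not as a guaranteed optimum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): pick a vector width to try first.
 *
 * reported_width: the value clGetDeviceInfo returned for the matching query,
 *                 e.g. CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR for uchar.
 * elem_size:      size of the scalar element type in bytes.
 *
 * Assumption taken from the numbers measured in this thread: elements
 * narrower than 32 bits did best when packed into a 16-byte vector, while
 * 32-bit elements did best with the width the device reported. */
unsigned candidate_vector_width(unsigned reported_width, size_t elem_size)
{
    assert(reported_width >= 1);
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);

    if (elem_size >= 4) {
        /* 32-bit and wider types: trust the device and rely on autovectorization. */
        return reported_width;
    }

    /* Narrow types: fill a 16-byte vector (16 for uchar, 8 for short). */
    unsigned packed = (unsigned)(16 / elem_size);
    return packed > reported_width ? packed : reported_width;
}
```

With the scalar preference reported in this thread, `candidate_vector_width(1, sizeof(unsigned char))` suggests starting with uchar16, while `candidate_vector_width(1, sizeof(float))` keeps the kernel scalar. Other devices or kernels may behave differently, so the chosen width should still be confirmed by timing.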
Hi Dmitry,
Thanks for the clarification!
"Intel OpenCL devices prefer scalar values as they assume that internal autovectorization..."
The caveat: the compiler does its best job when vectorizing 32-bit types (like int and float). In contrast, for char/uchar, explicitly using short vectors like uchar4 might be more performant, as it better coalesces the memory accesses (since with uchar4/uchar8/etc. you operate on aligned data chunks) and also better amortizes the work-item scheduling costs (since you process multiple pixels simultaneously).
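To make the "multiple pixels per work-item" idea concrete, here is a small plain C emulation of the indexing. It is not OpenCL kernel code: the loop index stands in for get_global_id(0), and the function names and the pixel inversion operation are hypothetical choices made for illustration. In an actual kernel, the four bytes handled by one work-item would be read and written as a single uchar4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Work done by one emulated work-item: invert 4 consecutive pixels.
 * In an OpenCL kernel, gid would come from get_global_id(0) and the four
 * bytes would be loaded and stored as one uchar4 instead of in a loop. */
void invert4_work_item(const uint8_t *src, uint8_t *dst, size_t gid)
{
    for (size_t k = 0; k < 4; ++k) {
        dst[4 * gid + k] = (uint8_t)(255 - src[4 * gid + k]);
    }
}

/* Host-side emulation of the launch: n / 4 work-items instead of n,
 * followed by a scalar loop for the n % 4 leftover pixels.
 * Returns the number of emulated work-items. */
size_t invert_pixels(const uint8_t *src, uint8_t *dst, size_t n)
{
    size_t items = n / 4;

    for (size_t gid = 0; gid < items; ++gid) {
        invert4_work_item(src, dst, gid);
    }
    for (size_t i = 4 * items; i < n; ++i) {
        dst[i] = (uint8_t)(255 - src[i]);
    }
    return items;
}
```

For a buffer of n pixels, this schedules roughly a quarter as many work-items as the scalar version, which is where the amortized scheduling cost comes from. Each work-item also touches one 4-byte chunk that is aligned relative to the start of the buffer.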