Does Intel OpenCL on CPU require consecutive memory accesses of neighboring threads for vectorization?



Hello everyone,

Does Intel OpenCL on CPU require neighboring threads (i.e. work-items in the same work-group) to access consecutive memory addresses in order to vectorize a kernel?

I have a hashing-based OpenCL kernel whose memory accesses are necessarily non-consecutive: each thread uses a computed hash value as a memory index, so the access pattern is unpredictable. So far, the OpenCL build log always reports

"Kernel <kernel_name> was not vectorized"

I suspect this is because adjacent threads do not access consecutive memory addresses. Is that correct? Or can I get the Intel OpenCL platform to generate gather/scatter instructions (or intermittent scalar loops)?
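
To make the access pattern concrete, here is a minimal sketch in plain C, not the actual kernel. Each loop iteration stands in for one work-item, with `gid` playing the role of `get_global_id(0)`. The names `hash_u32`, `lookup_all`, `keys`, `table` and `table_mask` are made up for illustration, and the hash is just a generic integer mixer, assumed here only to show that the index is data-dependent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real hash: a generic 32-bit integer mixer. */
static uint32_t hash_u32(uint32_t x) {
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* One loop iteration corresponds to one work-item.
 * table_mask is assumed to be (table size - 1), with a power-of-two table size. */
void lookup_all(const uint32_t *keys, const uint32_t *table,
                uint32_t table_mask, uint32_t *out, size_t n) {
    for (size_t gid = 0; gid < n; ++gid) {
        /* Neighboring work-items read neighboring keys: consecutive access. */
        uint32_t idx = hash_u32(keys[gid]) & table_mask;
        /* Neighboring work-items read unrelated table slots: this load would
         * have to become a gather (or a scalar loop) in a vectorized kernel. */
        out[gid] = table[idx];
    }
}
```

Only the `table[idx]` load is non-consecutive; the `keys[gid]` load and the `out[gid]` store are consecutive across neighboring work-items.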

A clarification on whether the Intel OpenCL platform can handle this kind of memory access pattern in general would be greatly appreciated.

