Does Intel OpenCL on CPU require consecutive memory accesses of neighboring threads for vectorization?
Does Intel OpenCL on the CPU require consecutive memory accesses by neighboring threads (i.e., threads in the same work group) for vectorization?
I have a hashing-based OpenCL kernel whose memory accesses are necessarily non-consecutive: each thread uses a computed hash value as a memory index, and the hashing makes that index unpredictable. So far, the OpenCL build log always reports
"Kernel <kernel_name> was not vectorized"
in the OpenCL build log. I suspect this is because adjacent threads do not access consecutive memory addresses. Is that correct? Or can I get the Intel OpenCL platform to generate gather/scatter instructions (or intermittent scalar loops) instead?
A clarification on whether the Intel OpenCL platform can handle this kind of memory access pattern in general would be greatly appreciated.