A few of my kernels misbehave on an i3-5010U CPU when they run on the GPU device. When I specify an X offset of 2 or greater when enqueuing a kernel, every 16th row is not calculated. I have attached a test project as a zip file. It needs Visual Studio 2012 with the Intel OpenCL SDK integration, and the Image Watch extension is useful for visualizing the problem. The issue appears only on the GPU device of this machine; I cannot reproduce it on HD 4400 (4th-generation Core) or HD 520 (6th-generation Core) GPUs.
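For anyone who wants to confirm the pattern without Image Watch, here is a rough host-side check. It is only a sketch, not code from the attached project. It assumes the output is an 8-bit single-channel buffer that is filled with a sentinel value before the kernel runs (for example uploaded with clEnqueueWriteBuffer) and read back afterwards. The names fill_sentinel, row_was_skipped and report_skipped_rows and the SENTINEL value are made up for illustration. It is written in C89 style so that it should also build with Visual Studio 2012.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sentinel: choose a value the kernel can never produce. */
#define SENTINEL 0xA5u

/* Fill the host copy of the output with the sentinel before the kernel runs. */
static void fill_sentinel(unsigned char *img, size_t width, size_t height)
{
    size_t i;
    assert(img != NULL);
    for (i = 0; i < width * height; ++i)
        img[i] = SENTINEL;
}

/* Return 1 if any pixel of row y in columns [x_offset, width) still holds
   the sentinel, meaning the kernel never wrote it; otherwise return 0. */
static int row_was_skipped(const unsigned char *img, size_t width,
                           size_t y, size_t x_offset)
{
    size_t x;
    assert(img != NULL);
    for (x = x_offset; x < width; ++x) {
        if (img[y * width + x] == SENTINEL)
            return 1;
    }
    return 0;
}

/* Print the index of every skipped row and return how many there were. */
static size_t report_skipped_rows(const unsigned char *img, size_t width,
                                  size_t height, size_t x_offset)
{
    size_t y;
    size_t skipped = 0;
    for (y = 0; y < height; ++y) {
        if (row_was_skipped(img, width, y, x_offset)) {
            printf("row %lu not written\n", (unsigned long)y);
            ++skipped;
        }
    }
    return skipped;
}
```

Call fill_sentinel on the host buffer before uploading it, run the kernel with the X offset, read the buffer back, and pass it to report_skipped_rows with the same offset. If the printed row indices are 16 apart, that matches what I am seeing.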
My configuration is:
- Intel i3-5010U
- Windows 7 Embedded
- Graphics driver version 184.108.40.20654
Any help would be appreciated. I have found a workaround, but as far as I can tell my code is correct, and I would rather not apply a fix when I don't understand the cause of the problem.