We've recently run across an issue in our embedded product where some of our work items are not being executed completely (the kernel bails out partway through but does not report an error). This is happening on our 5th-gen i3 NUC with HD 5500 graphics. The behavior is consistent but makes no sense. We are running a 2D ND range with offsets (basically a 5x5 filter that skips a 2-pixel border around the image). Our "image" is stored in a buffer, not an OpenCL image, as we do not need sub-pixel sampling. What we've found is that for certain rows of our data the kernel simply stops executing before it writes the output. We even tried setting the output buffer to a fixed value as the last line of the kernel, and that line is never executed. If we run the same software on a Gen6 NUC we do not see this behavior. I'm looking for suggestions on how to debug the issue. It seems the latest version of the Intel Code Builder lacks debugging capabilities, and we are unable to configure the Visual Studio plug-in to debug our kernels. I should also mention that the exact same kernel, when run on the CPU device, does not exhibit any problems.
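For anyone following along, here is a rough host-side model in plain C of the indexing described above. It is only a sketch under assumptions: the names (`Range2D`, `interior_range`, `linear_index`, `count_out_of_bounds`) are made up for illustration and are not from our actual code, and it assumes a row-major buffer with a global work offset of (2, 2), so that the global IDs seen by the kernel start at the offset.

```c
#include <assert.h>
#include <stddef.h>

enum { BORDER = 2 };  /* radius of the 5x5 filter (assumed) */

/* Hypothetical holder for the two arrays that would be passed to
   clEnqueueNDRangeKernel as global_work_offset and global_work_size. */
typedef struct {
    size_t offset[2];
    size_t size[2];
} Range2D;

/* Range covering every pixel except the 2-pixel border. */
Range2D interior_range(size_t width, size_t height)
{
    assert(width > 2 * BORDER && height > 2 * BORDER);
    Range2D r = { { BORDER, BORDER },
                  { width - 2 * BORDER, height - 2 * BORDER } };
    return r;
}

/* Row-major index into the buffer for global ID (gx, gy). */
size_t linear_index(size_t gx, size_t gy, size_t width)
{
    return gy * width + gx;
}

/* Walks the same global IDs the kernel would see and counts work items
   whose 5x5 window would touch memory outside the buffer. */
size_t count_out_of_bounds(size_t width, size_t height)
{
    Range2D r = interior_range(width, height);
    size_t bad = 0;
    for (size_t gy = r.offset[1]; gy < r.offset[1] + r.size[1]; ++gy) {
        for (size_t gx = r.offset[0]; gx < r.offset[0] + r.size[0]; ++gx) {
            int inside = gx >= BORDER && gy >= BORDER &&
                         gx + BORDER < width && gy + BORDER < height;
            if (!inside ||
                linear_index(gx + BORDER, gy + BORDER, width) >= width * height)
                ++bad;
        }
    }
    return bad;
}
```

Running `count_out_of_bounds` with the real image dimensions is a cheap way to rule out the range setup itself before looking at the device.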
Looks like we have this issue in Intel Premier Support, so we can proceed there. In general, reproducer code is always very helpful.
Not to say this is the only way to think about it, but in similar situations in the past I've started with a known good implementation (perhaps the "Create kernel with buffers" example code from the Code Builder wizard) as a simplified model and tweaked it closer and closer to the desired memory transfer scheme. Since OpenCL memory management is fairly complex, it's often the calculations of where data is read from or written to that are the issue. If you can write some sort of intermediate result to the output to tell you about these basic I/O address calculations, it can often lead to additional insights.
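As a rough illustration of the "write an intermediate result to the output" idea, here is a minimal sketch in plain C that emulates the kernel body on the host. Everything in it is an assumption for illustration (the `TraceEntry` layout, the stage markers, the function names, a row-major buffer, a 2-pixel border); in a real kernel the same fields would be written to an extra debug buffer indexed by the global ID, and the host would read that buffer back and scan it.

```c
#include <assert.h>
#include <stddef.h>

enum { BORDER = 2, STAGE_NONE = 0, STAGE_ENTERED = 1, STAGE_DONE = 2 };

/* Hypothetical per-work-item debug record. */
typedef struct {
    int    stage;        /* how far the work item got */
    size_t first_read;   /* lowest input index it intends to read */
    size_t last_read;    /* highest input index it intends to read */
    size_t write_index;  /* output index it intends to write */
} TraceEntry;

/* Stand-in for the kernel body of one work item at global ID (gx, gy). */
void trace_work_item(TraceEntry *trace, size_t gx, size_t gy, size_t width)
{
    TraceEntry *t = &trace[gy * width + gx];
    t->stage = STAGE_ENTERED;                  /* first statement of the kernel */
    t->first_read  = (gy - BORDER) * width + (gx - BORDER);
    t->last_read   = (gy + BORDER) * width + (gx + BORDER);
    t->write_index = gy * width + gx;
    /* ... the 5x5 filter computation would go here ... */
    t->stage = STAGE_DONE;                     /* last statement of the kernel */
}

/* Emulates the offset 2D range: every pixel except the border. */
void trace_all(TraceEntry *trace, size_t width, size_t height)
{
    assert(width > 2 * BORDER && height > 2 * BORDER);
    for (size_t gy = BORDER; gy < height - BORDER; ++gy)
        for (size_t gx = BORDER; gx < width - BORDER; ++gx)
            trace_work_item(trace, gx, gy, width);
}

/* Returns the linear index of the first interior work item that did not
   reach STAGE_DONE, or width * height if all of them finished. */
size_t first_incomplete(const TraceEntry *trace, size_t width, size_t height)
{
    for (size_t gy = BORDER; gy < height - BORDER; ++gy)
        for (size_t gx = BORDER; gx < width - BORDER; ++gx)
            if (trace[gy * width + gx].stage != STAGE_DONE)
                return gy * width + gx;
    return width * height;
}
```

If the device version of the trace shows rows stuck at the "entered" stage, the recorded read and write indices for those rows are the first thing to compare against the values this host-side model produces.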