Reading an image using CLK_FILTER_LINEAR interpolation mode produces zeros instead of the correct image values.
Driver: OpenCL Runtime 15.1
Version: 5.0.0.57
OS: Debian Jessie
Language: Python (PyOpenCL)
The following kernel is a minimal test case. It is supposed to copy the input image into the output image. It works fine under the Beignet GPU driver, but under Intel's CPU driver the output image consists entirely of zeros. When CLK_FILTER_LINEAR is replaced by CLK_FILTER_NEAREST, it works.
The problem persists when using normalized coordinates.
constant sampler_t S = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_LINEAR;

kernel void tube(
    read_only image2d_t img_in,
    write_only image2d_t img_out)
{
    int2 xy = {get_global_id(0), get_global_id(1)};
    uint4 rgb_;
    rgb_ = read_imageui(img_in, S, convert_float2(xy) + 0.5f);
    write_imageui(img_out, xy, rgb_);
}
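For reference, here is a plain-Python sketch of the nearest and linear filter formulas for unnormalized coordinates, as I understand them from the OpenCL 1.2 specification, applied the way the kernel above samples: at pixel centers (x + 0.5, y + 0.5). At a pixel center the linear weights collapse to (1, 0, 0, 0), which is why the kernel is expected to act as a copy with either filter. The helper names (frac, texel, sample_nearest, sample_linear) are hypothetical, and the sketch assumes edge clamping for out-of-range texels; with CLK_ADDRESS_NONE, as in the kernel, such reads are undefined, but at pixel centers they only ever receive a weight of zero.

```python
import math


def frac(x):
    # Same definition as OpenCL's frac(): x - floor(x)
    return x - math.floor(x)


def texel(img, i, j):
    # Assumption: clamp out-of-range indices to the nearest edge texel
    # (CLK_ADDRESS_CLAMP_TO_EDGE-like), not the CLK_ADDRESS_NONE behavior.
    h, w = len(img), len(img[0])
    i = min(max(i, 0), w - 1)
    j = min(max(j, 0), h - 1)
    return img[j][i]


def sample_nearest(img, u, v):
    # Nearest filter, unnormalized coordinates: i = floor(u), j = floor(v)
    return texel(img, math.floor(u), math.floor(v))


def sample_linear(img, u, v):
    # Bilinear filter: blend the 2x2 block of texels around (u - 0.5, v - 0.5)
    i0 = math.floor(u - 0.5)
    j0 = math.floor(v - 0.5)
    a = frac(u - 0.5)
    b = frac(v - 0.5)
    return ((1 - a) * (1 - b) * texel(img, i0, j0)
            + a * (1 - b) * texel(img, i0 + 1, j0)
            + (1 - a) * b * texel(img, i0, j0 + 1)
            + a * b * texel(img, i0 + 1, j0 + 1))


img = [[10, 20, 30],
       [40, 50, 60]]

# Emulate the kernel: sample every pixel at its center and compare.
for y, row in enumerate(img):
    for x, value in enumerate(row):
        assert sample_nearest(img, x + 0.5, y + 0.5) == value
        assert sample_linear(img, x + 0.5, y + 0.5) == value

# Off-center, linear filtering blends neighbours into a fractional value.
print(sample_linear(img, 1.25, 0.5))  # 17.5
```

Away from pixel centers the linear filter generally produces non-integer blends (17.5 above), whereas nearest filtering always returns one stored texel.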
I have attached a small test script.
Hi Juan Ignacio,
Sorry for the delay: I crashed my Linux box after installing a bunch of dependencies required by your Python script, and it will take me some time to reinstall everything and try again.
Could you reproduce the issue? I can make a C example if you are having problems with Python.
I wasn't able to reproduce this with Python, so yes, a C example would help.
I talked with the CPU runtime lead, and he acknowledged that there are problems with the sampler implementation on the CPU, but he was not sure whether this is that specific issue or something else. Having a C example would allow me to file a bug and track its fix.
Thanks!
Hi Juan Ignacio,
Yes, I can reproduce it on a CPU device. Thank you for providing such a beautiful reproducer! I will notify the CPU runtime team.
Thanks!
Hi Juan Ignacio,
A member of the CPU team pointed me to the following:
From https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf, p. 298:
The read_image{i|ui} calls support a nearest filter only. The filter_mode specified in sampler must be set to CLK_FILTER_NEAREST; otherwise the values returned are undefined.
The CPU device is more sensitive to this kind of bug because it is undefined behavior. On the CPU it may even lead to a crash caused by an access violation, since the OS handles those. On the GPU, access violations and sampler misuse are tolerated more, and the application is not terminated.
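For anyone landing here, a minimal sketch of the reproducer brought in line with the quoted rule, written as a PyOpenCL-style source string. The integer path keeps read_imageui but switches the sampler to CLK_FILTER_NEAREST; the linear path uses read_imagef/write_imagef instead, which assumes both images are created with a normalized or float channel data type (for example CL_UNORM_INT8 or CL_FLOAT). The names KERNEL_SRC, S_NEAREST, S_LINEAR, tube_linear and build_program are hypothetical, and using CLK_ADDRESS_CLAMP_TO_EDGE in the linear sampler is my own choice, so that the neighbouring texel that linear filtering touches at the last row and column stays inside the image.

```python
# Hypothetical names throughout: KERNEL_SRC, S_NEAREST, S_LINEAR,
# tube_linear and build_program are illustrative, not from the thread.
KERNEL_SRC = """
// Integer reads: read_imageui only supports nearest filtering.
constant sampler_t S_NEAREST = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;

kernel void tube(read_only image2d_t img_in, write_only image2d_t img_out)
{
    int2 xy = (int2)((int)get_global_id(0), (int)get_global_id(1));
    uint4 rgb = read_imageui(img_in, S_NEAREST, convert_float2(xy) + 0.5f);
    write_imageui(img_out, xy, rgb);
}

// Linear filtering goes through read_imagef, so both images must use a
// normalized or float channel type. Clamp-to-edge keeps the neighbouring
// texel read by the linear filter inside the image at the borders.
constant sampler_t S_LINEAR = CLK_NORMALIZED_COORDS_FALSE |
                              CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;

kernel void tube_linear(read_only image2d_t img_in, write_only image2d_t img_out)
{
    int2 xy = (int2)((int)get_global_id(0), (int)get_global_id(1));
    float4 rgb = read_imagef(img_in, S_LINEAR, convert_float2(xy) + 0.5f);
    write_imagef(img_out, xy, rgb);
}
"""


def build_program(ctx):
    """Compile KERNEL_SRC for the given pyopencl.Context."""
    # Imported lazily so the kernel source can be inspected without PyOpenCL.
    import pyopencl as cl
    return cl.Program(ctx, KERNEL_SRC).build()
```

With PyOpenCL installed, something like build_program(cl.create_some_context()) would compile both kernels; tube can then be enqueued on the same unsigned-integer images as before, while tube_linear needs images with a normalized or float format.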