I am experiencing strange behaviour when measuring the execution time of an OpenCL kernel. The kernel expects three buffers as input. I create those buffers in the host code and initialize them by using CL_MEM_COPY_HOST_PTR. I then measure the kernel execution time via OpenCL events. However, when I omit CL_MEM_COPY_HOST_PTR, the kernel execution time drops to a third.
So far I have found that the problem seems to be related to optimizations performed by the OpenCL compiler. It looks as if the compiler notices that the buffers are never initialized and optimizes the kernel accordingly. If I pass the build option "-cl-opt-disable", there is no difference in execution time between initializing and not initializing the buffers. But disabling all optimizations is obviously not what I want.
Is there a way to stop the compiler from noticing that the buffers have not been initialized, without disabling all optimizations? Unfortunately, writing just one byte into the buffer did not do the trick.
Thanks in advance!
Sorry for the delayed reply. Could you tell us more about the system you're seeing this behavior on?
That information will be a great start, and it would also help if you could provide a short reproducer.