I am confused about the kernel execution timeline. I created an in-order command queue and two sets of kernels that should execute one after another. After enqueuing these kernels, I kept doing other work on the CPU that is unrelated to the GPU-side work. After 0.33 ms, I called clFinish() to get the GPU's output data. It took 0.29 ms for the call to return, which is weird, because I timed each kernel set and the total kernel execution time should be within 0.31 ms. Based on these timings, I expected clFinish() to return immediately. It almost seems as if the GPU did no work during the 0.33 ms window and only started once clFinish() was called. Could you give me some clues on this matter? Thank you.
When you submit work to the GPU, there is no guarantee that it will execute right away unless you force execution with clFinish. If you don't call clFinish, the kernels may well sit in the queue without executing. So what you observed is expected behavior.