One way I have thought of is to record the time before and after the clEnqueueNDRangeKernel call. I might have to use events to ensure that the kernel has finished executing. But I still feel that the loop will start the kernels sequentially. Can someone tell me if this is the right way to start concurrent kernels?
Are you talking about an NVIDIA GPU?
clEnqueueNDRangeKernel only enqueues the kernel to the command queue. You are enqueueing the commands sequentially in the loop. Then you'd submit your commands (including your kernels) to the GPU by calling clFlush, which is not a blocking call. But I am not too sure how you can make sure your kernels are running concurrently. I'd be curious to know too.
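For checking after the fact, one option might be to compare the profiling timestamps of the kernel events. The sketch below is only the host-side arithmetic and makes no OpenCL calls itself. It assumes each start/end pair was read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) after clWaitForEvents, on a queue created with CL_QUEUE_PROFILING_ENABLE. The names kernel_span, span_ms, spans_overlap and count_overlapping_pairs are made up for illustration. Keep in mind that an in-order queue runs its commands one after another, so overlap could only show up with several queues or an out-of-order queue.

```c
#include <assert.h>
#include <stdint.h>

/* Device timestamps of one kernel execution, in nanoseconds.
   Assumption: filled from clGetEventProfilingInfo with
   CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END.
   kernel_span is a hypothetical name, not part of OpenCL. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* Execution time of one kernel in milliseconds. */
double span_ms(kernel_span s)
{
    assert(s.end_ns >= s.start_ns);
    return (double)(s.end_ns - s.start_ns) / 1e6;
}

/* Non-zero if the two kernels were executing at the same time,
   i.e. their [start, end) intervals intersect. */
int spans_overlap(kernel_span a, kernel_span b)
{
    return a.start_ns < b.end_ns && b.start_ns < a.end_ns;
}

/* Number of kernel pairs whose execution intervals overlap.
   Zero means the kernels ran strictly one after another. */
int count_overlapping_pairs(const kernel_span *spans, int n)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (spans_overlap(spans[i], spans[j]))
                ++count;
    return count;
}
```

If count_overlapping_pairs returns 0 for the events collected from the loop, the kernels were serialized on the device even though they were enqueued back to back.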
After much googling around, I found out that there is something called device fission, by which we can create subdevices on a single device and then execute different kernels on those subdevices. I am still reading up on it.
Yes, I am talking about an NVIDIA GPU.
Kindly comment if anyone knows more about device fission.
I am not sure if GPUs, particularly NVIDIA's, support device fission.