Enqueuing multiple Writes (Host to FPGA DMA transfer) to the same kernel - OpenCL

Altera_Forum · 07-20-2016

Hi,

I am trying to enqueue multiple writes to the same kernel by calling clEnqueueWriteBuffer several times on the same buffer. What sequence of calls should I follow so that I can read back the combined result in one go? For example, I have a kernel that loops its input back as the result. After launching this kernel, I'd like to send 4 writes and read back the results of all 4 writes with a single clEnqueueReadBuffer. I tried clEnqueueNDRangeKernel, but it didn't work.

Thanks in Advance

Altera_Forum · 07-26-2016

I've never tried this myself, but there are some inherent problems with communicating this way. How many command queues do you have? If you use the same command queue to launch the kernel and to do the reads and writes, you'll run into some issues. A default (in-order) command queue executes one command at a time, each to completion. For example, if you launch a kernel on a command queue and then enqueue a clEnqueueReadBuffer on that same queue, the read will not start until the kernel has finished executing, and a blocking read will stall the host until then.

You can make use of this by simply moving the loop from your device code into the host code. Maybe something like this:

Host Code:

for (int i = 0; i < SOME_ITERATIONS; i++) {
    clEnqueueWriteBuffer.....
    clEnqueueNDRangeKernel(or task kernel or whatever)...
    clEnqueueReadBuffer...
}

Hope this helps.

Altera_Forum · 08-09-2016

I might not understand your question, but you can do asynchronous (non-blocking) writes and get an event handle for each write.

Then you can issue a read call that takes the four events as its wait list, so the read essentially waits for the four writes to finish before it starts.