Intel® Quartus® Prime Software

clFlush is acting as a blocking call

Altera_Forum
Honored Contributor II

Hi All, 

 

 

I'm using a non-blocking data transfer in my program to transfer 4K-resolution (3840x2160) frame data. If clEnqueueWriteBuffer is blocking (CL_TRUE), the call takes close to 3.5 milliseconds; if I make it non-blocking (CL_FALSE), it takes a negligible amount of time (~104 microseconds). However, the time spent in clFlush then increases from 4 microseconds to 3.3 milliseconds.

Below is a snippet of my code:

 

 

/* Non-blocking write of one frame to the device */
err = clEnqueueWriteBuffer(commandQueue[0], srcBuffer, CL_FALSE, 0, 3840 * 2160, inputBuffer, 0, NULL, NULL);
if (CL_SUCCESS != err) {
    printf("Error in clEnqueueWriteBuffer srcBuffer %d\n", err);
    exit(-1);
}

/* setKernelArg */
...

/* Enqueue the kernel in the same queue, keeping an event for it */
err = clEnqueueTask(commandQueue[0], kernel, 0, NULL, &kernel_event[0]);
if (CL_SUCCESS != err) {
    printf("Error in clEnqueueTask kernel %d\n\n", err);
    exit(-1);
}

clFlush(commandQueue[0]);

 

 

I am not able to understand why clFlush is blocking for 3.5 milliseconds.

 

 

Thanks in advance
3 Replies
Altera_Forum
Honored Contributor II

clFlush() blocks the host until all commands in the queue have been issued (but not necessarily finished). You are issuing both the clEnqueueWriteBuffer command and the clEnqueueTask command into the same queue, so the clEnqueueTask command CANNOT be issued before clEnqueueWriteBuffer has COMPLETED. As a result, clFlush() effectively blocks the host until clEnqueueWriteBuffer has completed and clEnqueueTask has been issued, which takes at least as long as the blocking clEnqueueWriteBuffer call.

 

What are you trying to achieve by using a non-blocking clEnqueueWriteBuffer call? The kernel will never start before this call has finished anyway.
Altera_Forum
Honored Contributor II

Thanks for the reply. What I am trying to do is start the transfer and the kernel execution in the background, so that I can use that time to run my other functions instead of waiting for the results from the device. That's the reason I am using CL_FALSE. Can you please suggest a better way to accomplish this? I don't want the host to block on either the data transfer or the kernel execution.

Altera_Forum
Honored Contributor II

In that case, keep the memory transfer non-blocking as you have already done and enqueue the kernel, but attach an event to the kernel enqueue and avoid calling clFlush/clFinish after it. Then continue with your other processing on the host and, whenever you actually need the result, "wait" on the event from the kernel enqueue (clWaitForEvents) to make sure the kernel has finished processing. After that, transfer the output data from the FPGA to the host.
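
For reference, the sequence described above could look roughly like the sketch below. It is only an illustration, not tested on your board: process_frame, check, do_other_host_work, dstBuffer and outputBuffer are hypothetical names that do not appear in the original snippet, and the kernel arguments are assumed to have been set already. Whether the queued commands really start running before the wait is reached depends on the OpenCL runtime, so it is worth timing on your own setup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>  /* clEnqueueTask is deprecated in OpenCL 2.0+ headers but still declared */

/* Hypothetical helper: abort with a message if an OpenCL call failed. */
static void check(cl_int err, const char *what)
{
    if (err != CL_SUCCESS) {
        fprintf(stderr, "Error in %s: %d\n", what, err);
        exit(EXIT_FAILURE);
    }
}

/* Hypothetical placeholder for the host-side work you want to overlap. */
static void do_other_host_work(void)
{
    /* ... unrelated host processing ... */
}

/* Hypothetical wrapper: queue, kernel and buffers are created elsewhere,
   and the kernel arguments are assumed to be set before this is called. */
void process_frame(cl_command_queue queue, cl_kernel kernel,
                   cl_mem srcBuffer, cl_mem dstBuffer,
                   const void *inputBuffer, void *outputBuffer,
                   size_t frameBytes)
{
    cl_event kernel_event;
    cl_int err;

    /* Non-blocking write: returns without waiting for the copy to finish. */
    err = clEnqueueWriteBuffer(queue, srcBuffer, CL_FALSE, 0, frameBytes,
                               inputBuffer, 0, NULL, NULL);
    check(err, "clEnqueueWriteBuffer");

    /* Enqueue the kernel and keep its event; no clFlush/clFinish afterwards. */
    err = clEnqueueTask(queue, kernel, 0, NULL, &kernel_event);
    check(err, "clEnqueueTask");

    /* Host work that does not depend on the kernel result. */
    do_other_host_work();

    /* Block only at the point where the result is actually needed. */
    err = clWaitForEvents(1, &kernel_event);
    check(err, "clWaitForEvents");

    /* Blocking read of the output once the kernel is known to be done. */
    err = clEnqueueReadBuffer(queue, dstBuffer, CL_TRUE, 0, frameBytes,
                              outputBuffer, 0, NULL, NULL);
    check(err, "clEnqueueReadBuffer");

    clReleaseEvent(kernel_event);
}
```

Note that inputBuffer must stay valid and unmodified until the non-blocking write has completed, which in this sketch is guaranteed once clWaitForEvents returns, because the kernel in the same in-order queue cannot finish before the write does.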
