Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

clFlush or any other OpenCL API call takes longer once kernel execution has completed

Altera_Forum
Honored Contributor II

In function1() we launch a kernel but do not wait for it to complete. After the launch, the CPU does some processing that takes around 300 ms. During that processing we poll the kernel's execution status and time each poll. For the first few iterations the status is Running and the time taken by clFlush is negligible. Once kernel execution has completed, clFlush takes 6711 µs (if we call clEnqueueReadBuffer there instead of clFinish, it takes the same time). Why does the call take so much longer once execution has completed? Is there an alternative method that would reduce this time?

 

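/* commandQueue, kernel, kernel_event, err and the buffers are used without
   declarations here, so they are presumably declared elsewhere (not shown). */
void function1()
{
    printf("In function-1\n");
    ...
    /* Launch the kernel without waiting for it to complete. */
    err = clEnqueueTask(commandQueue[0], kernel, 0, NULL, &kernel_event[0]);
    if (CL_SUCCESS != err) {
        printf("Error in clEnqueueTask kernel-1 %d\n\n", err);
        exit(-1);
    }
}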

void function2()
{
    printf("In function-2\n");

    struct timeval start_timer, end_timer;

    /* Time the first blocking read of the previous kernel's output. */
    gettimeofday(&start_timer, NULL);
    err = clEnqueueReadBuffer(commandQueue[0], dstOut1, CL_TRUE, 0, 120 * 23 * sizeof(cl_short2), output1, 0, NULL, NULL);
    if (CL_SUCCESS != err) {
        printf("Error in clEnqueueReadBuffer dstOut1 %d\n", err);
        exit(-1);
    }
    gettimeofday(&end_timer, NULL);
    time_taken = ((end_timer.tv_sec * 1000000 + end_timer.tv_usec) - (start_timer.tv_sec * 1000000 + start_timer.tv_usec));
    printf("Time taken by clEnqueueReadBuffer-1 %ld\n", time_taken);

    /* Time the second blocking read. */
    gettimeofday(&start_timer, NULL);
    err = clEnqueueReadBuffer(commandQueue[0], dstOut2, CL_TRUE, 0, 120 * 23 * sizeof(cl_short2), output2, 0, NULL, NULL);
    if (CL_SUCCESS != err) {
        printf("Error in clEnqueueReadBuffer dstOut2 %d\n", err);
        exit(-1);
    }
    gettimeofday(&end_timer, NULL);
    time_taken = ((end_timer.tv_sec * 1000000 + end_timer.tv_usec) - (start_timer.tv_sec * 1000000 + start_timer.tv_usec));
    printf("Time taken by clEnqueueReadBuffer-2 %ld\n", time_taken);

    //launching another kernel
    err = clEnqueueTask(commandQueue[0], kernel, 0, NULL, &kernel_event[0]);
    if (CL_SUCCESS != err) {
        printf("Error in clEnqueueTask kernel-1 %d\n\n", err);
        exit(-1);
    }
}


int main()
{
    for (int i = 0; i < 100; i++) {
        if (i == 0)
            function1();
        else
            function2();

        /* processing on cpu */

        for (int id = 0; id < 1000; id++) {
            /* processing on cpu */

            struct timeval begin_cq, end_cq;
            gettimeofday(&begin_cq, NULL);

            /* Poll the execution status of the most recently launched kernel. */
            cl_int res, status;
            res = clGetEventInfo(kernel_event[0], CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &status, NULL);
            switch (status) {
            case CL_QUEUED:
                printf("Execution Status: Queued\n");
                break;
            case CL_SUBMITTED:
                printf("Execution Status: Submitted\n");
                break;
            case CL_RUNNING:
                printf("Execution Status: Running\n");
                break;
            case CL_COMPLETE:
                printf("Execution Status: Completed\n");
                break;
            default:
                printf("Execution Status: Error (%d)\n", status);
                break;
            }
            clFlush(commandQueue[0]);

            gettimeofday(&end_cq, NULL);
            long time_taken_cq = ((end_cq.tv_sec * 1000000 + end_cq.tv_usec) - (begin_cq.tv_sec * 1000000 + begin_cq.tv_usec));
            printf("Time taken by clFlush %ld micro sec\n", time_taken_cq);
        }//for(id)
    }//for(i)

    return 0;
}


Output:

In function-1
Execution Status: Running
Time taken by clFlush 5 micro sec
Execution Status: Running
Time taken by clFlush 4 micro sec
Execution Status: Running
Time taken by clFlush 4 micro sec
Execution Status: Running
Time taken by clFlush 4 micro sec
Execution Status: Running
Time taken by clFlush 4 micro sec
Execution Status: Running
Time taken by clFlush 4 micro sec
Execution Status: Running
Time taken by clFlush 4 micro sec
Execution Status: Completed
Time taken by clFlush 6711 micro sec
Execution Status: Completed
Time taken by clFlush 1 micro sec
Execution Status: Completed
Time taken by clFlush 1 micro sec
Execution Status: Completed
Time taken by clFlush 2 micro sec
Execution Status: Completed
Time taken by clFlush 2 micro sec
Execution Status: Completed
Time taken by clFlush 2 micro sec
Execution Status: Completed
Time taken by clFlush 1 micro sec
Execution Status: Completed
Time taken by clFlush 2 micro sec
In function-2


Thanks in advance.
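
A note on the measurement in the polling loop above: the timed region starts before clGetEventInfo and ends after clFlush, so it also covers the status query and the printf inside the switch. From this output alone, the 6711 micro sec cannot be attributed to clFlush specifically. Below is a minimal sketch of a helper for timing each call on its own, assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC. The names now_monotonic and elapsed_us are hypothetical and not part of the original code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* request clock_gettime from <time.h> */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the monotonic clock, which is not affected by
 * wall-clock adjustments the way gettimeofday can be. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Hypothetical helper: whole microseconds elapsed from start to end.
 * The difference is computed in 64-bit nanoseconds first, so a tv_nsec
 * wrap between the two readings is handled correctly. */
static int64_t elapsed_us(struct timespec start, struct timespec end)
{
    int64_t ns = ((int64_t)end.tv_sec - (int64_t)start.tv_sec) * 1000000000LL
               + ((int64_t)end.tv_nsec - (int64_t)start.tv_nsec);
    return ns / 1000;
}

/* Intended use inside the polling loop (OpenCL calls as in the post):
 *
 *     struct timespec t0 = now_monotonic();
 *     clGetEventInfo(kernel_event[0], CL_EVENT_COMMAND_EXECUTION_STATUS,
 *                    sizeof(cl_int), &status, NULL);   // status query only
 *     struct timespec t1 = now_monotonic();
 *     clFlush(commandQueue[0]);                        // flush only
 *     struct timespec t2 = now_monotonic();
 *     int64_t query_us = elapsed_us(t0, t1);
 *     int64_t flush_us = elapsed_us(t1, t2);
 */
```

Storing the two durations and printing them after the timed section keeps printf out of the measurement and shows which of the two calls absorbs the one-time cost.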
2 Replies
Altera_Forum
Honored Contributor II

Hi All, 

I would like to know whether anybody has faced a similar issue while pipelining host processing to overlap with the FPGA execution time. Any help from Altera would be greatly appreciated.

Thanks in Advance.
Altera_Forum
Honored Contributor II

 


 

 

I am afraid this forum is not monitored by Altera. You can go to Altera.com, create an account, and open a service request.