Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
15674 Discussions

Using Profiling to measure the Kernel Execution Time

JBorr6
Novice
422 Views

Dear All,

I am using clGetEventProfilingInfo of OpenCL to measure the Kernel Execution time of the IFFT implementation. I tried getting the difference of QUEUED to END time and the START to END time using the code snippet below.

status = clEnqueueTask(queue, fft_kernel, 0, NULL, &timing_event);
checkError(status, "Failed to launch kernel");

status = clFinish(queue);
checkError(status, "failed to finish");

status= clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &time_start, NULL);
status= clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &time_end, NULL);

status= clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &time_queued, NULL);

total_time += time_end - time_start;
queued_time += time_end - time_queued;

 

For the total_time, I got around 20us result, while on the queued_time, I have around 90us result. Which means that the QUEUED to START experience more latency than START to END. Can someone explain to me what causes this delay? 

I implemented this on Quartus 19.1 using a DE10_pro FPGA board.

Thank you in advance,

Justine

 

0 Kudos
1 Solution
HRZ
Valued Contributor II
399 Views

"CL_PROFILING_COMMAND_QUEUED" returns the time at which the command was added to the queue. There is no guarantee that the command is executed as soon as it is added to the queue since there might be some other commands already in the queue (e.g. non-blocking data transfers or clSetKernelArg() commands that were queued before the kernel launch). At the same time, even if there are no other commands in the queue, there is some queuing/launch overhead associated with all OpenCL commands, including transfer of metadata via PCI-E. General consensus is that if your kernel execution time is less than at least some hundreds of us, the command queue/launch overhead could very well become the performance bottleneck and there is probably no point in accelerating such application on a PCI-E-attached accelerator.

View solution in original post

2 Replies
HRZ
Valued Contributor II
400 Views

"CL_PROFILING_COMMAND_QUEUED" returns the time at which the command was added to the queue. There is no guarantee that the command is executed as soon as it is added to the queue since there might be some other commands already in the queue (e.g. non-blocking data transfers or clSetKernelArg() commands that were queued before the kernel launch). At the same time, even if there are no other commands in the queue, there is some queuing/launch overhead associated with all OpenCL commands, including transfer of metadata via PCI-E. General consensus is that if your kernel execution time is less than at least some hundreds of us, the command queue/launch overhead could very well become the performance bottleneck and there is probably no point in accelerating such application on a PCI-E-attached accelerator.

AnilErinch_A_Intel
351 Views

Hi ,

Please let us know whether the solution mentioned in the discussion was adequate and any further support is required on this issue.

Thanks and Regards

Anil


Reply