Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Different Kernel Executions times

Altera_Forum
Honored Contributor II
843 Views

Hi! 

 

I'm seeing something strange when executing a kernel multiple times (inside a loop). The first execution always takes much longer than the others. 

 

Example: 

Number of calls: 10 

1st call: 3.2 seconds 

Other calls: around 0.013 seconds each 

 

The buffers and their sizes are always the same. 

 

What could be happening?
7 Replies
Altera_Forum
Honored Contributor II
94 Views

Please post the section of your host code that measures the kernel execution times.

Altera_Forum
Honored Contributor II
94 Views

 

--- Quote Start ---  

Please post the section of your host code that measures the kernel execution times. 

--- Quote End ---  

 

 

Sorry HRZ, here it is: 

#include "timing.h"
#include <Windows.h>

double get_wall_time(){
    LARGE_INTEGER time, freq;
    if (!QueryPerformanceFrequency(&freq)){
        // Handle error
        return 0;
    }
    if (!QueryPerformanceCounter(&time)){
        // Handle error
        return 0;
    }
    return (double)time.QuadPart / freq.QuadPart;
}
-----------------------------------------------------------------------------
runKernel(...){
    /* Set kernel arguments */
    for(i=0; i < num_arguments; i++)
        status = clSetKernelArg(kernel, i, sizeof(cl_mem), &buffer[i]);

    /* Run the kernel */
    status = clEnqueueTask(cmdqueue, kernel, 0, NULL, NULL);
    checkError(status, "Failed to launch kernel");

    /* Wait for command queue to complete pending events */
    status = clFinish(cmdqueue);
    checkError(status, "Failed to finish");

    /* Read the device output buffer to the host output array */
}
-----------------------------------------------------------------------------
ini_kernel_bi = get_wall_time();
runKernel(context, cluster_kernel, cmd_queue, 6, 0, NULL, buffers, NULL, NULL);
end_kernel_bi = get_wall_time();
printf("Time:%f", end_kernel_bi - ini_kernel_bi);
Altera_Forum
Honored Contributor II
94 Views

Try moving clSetKernelArg and checkError outside of the timing region and only time clEnqueueTask and clFinish. 

 

You can also use OpenCL's built-in profiler that allows you to accurately measure kernel execution time, and see if you would still see any variance in the run time.
Altera_Forum
Honored Contributor II
94 Views

 

--- Quote Start ---  

Try moving clSetKernelArg and checkError outside of the timing region and only time clEnqueueTask and clFinish. 

 

You can also use OpenCL's built-in profiler that allows you to accurately measure kernel execution time, and see if you would still see any variance in the run time. 

--- Quote End ---  

 

 

Do you know of any profiling tool for MS VS2012, or any reliable function for measuring execution times? I ask because I'm not confident in the timing function I found.
Altera_Forum
Honored Contributor II
94 Views

The function you are using is a high-precision timer. I personally use the same function on Windows. It provides accurate time measurement with a resolution of a few microseconds, or maybe even finer. 

 

The documentation for OpenCL's built-in profiler is here: 

 

https://www.khronos.org/registry/opencl/sdk/1.0/docs/man/xhtml/clgeteventprofilinginfo.html
Altera_Forum
Honored Contributor II
94 Views

Interesting, did you figure out the problem? Which board are you using? Try to use the profiler to check the actual kernel runtime. 

 

This is actually quite common in GPU programming: we usually run a warm-up kernel first to get the device out of its power-saving state so that we can measure the correct runtime. I haven't encountered this on my Arria 10 FPGA, though.
Altera_Forum
Honored Contributor II
94 Views

 

--- Quote Start ---  

This is actually quite common in GPU programming: we usually run a warm-up kernel first to get the device out of its power-saving state so that we can measure the correct runtime. I haven't encountered this on my Arria 10 FPGA, though. 

--- Quote End ---  

 

 

GPUs usually run at a low clock frequency when idle to save power, which is why a warm-up run is required to force the GPU out of idle mode and get correct timing. However, this does not apply to FPGAs, and I have certainly never encountered such behavior either.
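
For reference, the warm-up pattern described for GPUs can be sketched in plain C roughly as follows. This is only an illustration under a few assumptions: the names wall_seconds, run_fn, count_call and mean_time_after_warmup are invented for this example, and C11 timespec_get is used as a portable stand-in for QueryPerformanceCounter. In a real host program the callback would wrap clEnqueueTask followed by clFinish.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Callback type for the work being measured (hypothetical). */
typedef void (*run_fn)(void *ctx);

/* Stand-in workload for this sketch: just counts how often it is called. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}

/*
 * Call `run` `warmup` times without timing, then `timed` times with timing.
 * Returns the mean wall-clock time per timed call, in seconds.
 */
double mean_time_after_warmup(run_fn run, void *ctx, int warmup, int timed)
{
    assert(run != NULL && warmup >= 0 && timed > 0);

    /* Warm-up calls: executed but excluded from the measurement. */
    for (int i = 0; i < warmup; i++)
        run(ctx);

    double start = wall_seconds();
    for (int i = 0; i < timed; i++)
        run(ctx);
    double end = wall_seconds();

    return (end - start) / timed;
}
```

With one warm-up call and nine timed calls, the slow first execution from the original question would be excluded from the reported mean, which makes it easier to compare the steady-state time against the first call measured on its own.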