Kernel Launch Overhead

Altera_Forum · 01-06-2015

Hi everyone,

I am currently trying to measure and minimize the kernel launch overhead, because the kernels that I wrote have to be launched repeatedly.

I am wondering which of the following ways is the most accurate way to measure kernel launch overhead:

1. Use the Kernel Execution tab of the profiler report and measure the "blank space" between consecutive kernel launches.

2. Call clGetEventProfilingInfo() on the host and calculate CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_QUEUED.

3. Call clock_gettime() on the host to get the wall-clock time between enqueuing the kernel launches and clFinish(), and then subtract the kernel execution time (CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START) from it.
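As a rough sketch of the arithmetic behind options 2 and 3, assuming the four profiling timestamps (nanoseconds) have already been read with clGetEventProfilingInfo() on a queue created with CL_QUEUE_PROFILING_ENABLE. The names event_times, queued_to_start_ns, exec_ns, wall_minus_exec_ns and now_ns are hypothetical helpers for illustration, not part of any SDK. Note that the kernel's execution time is END - START, and that option 3 also counts host-side enqueue and clFinish() cost, not only device launch overhead.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical holder for one kernel event's profiling timestamps (ns).
 * Each field would be filled on the host with a call such as:
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_QUEUED,
 *                           sizeof(cl_ulong), &t.queued, NULL);
 * (likewise for _SUBMIT, _START and _END). */
typedef struct {
    uint64_t queued, submit, start, end;
} event_times;

/* Option 2: time from the enqueue call until the kernel starts running. */
uint64_t queued_to_start_ns(const event_times *t) {
    return t->start - t->queued;
}

/* Kernel execution time: END - START (START - END would be negative). */
uint64_t exec_ns(const event_times *t) {
    return t->end - t->start;
}

/* Option 3: host wall-clock span (first enqueue through clFinish) minus the
 * summed device execution time of the n kernels. Signed, because clock
 * mismatch or overlapping kernels could make the difference negative. */
int64_t wall_minus_exec_ns(uint64_t wall_ns, const event_times *ev, size_t n) {
    uint64_t busy = 0;
    for (size_t i = 0; i < n; i++) {
        busy += exec_ns(&ev[i]);
    }
    return (int64_t)wall_ns - (int64_t)busy;
}

/* Monotonic host clock in nanoseconds via POSIX clock_gettime(); call it
 * before the first enqueue and after clFinish() to get wall_ns. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
```

For example, with a wall-clock span of 30000 ns and two kernels that each ran for 10000 ns, wall_minus_exec_ns() returns 10000 ns of time not spent executing kernels.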

Also, I am wondering whether launch overhead depends on how the kernel is written. In some cases I applied SIMD vectorization when compiling the kernel, and the profiler report seems to indicate that while the kernel execution time decreased, the launch overhead increased, offsetting the performance gain I expected.

Altera_Forum · 01-07-2015

I think it's more correct in your second option to measure CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_SUBMIT. Unless you call clFlush on the command queue immediately after an enqueue, the runtime may postpone the transition from queued to submitted until it deems it necessary, which inflates START - QUEUED.

I think you'll find that the overhead is minuscule if the kernels are in the same aocx file. I prefer measuring all of these times with one or more cl_events and their QUEUED, SUBMIT, START, and END timestamps.
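A minimal sketch of summarizing those four timestamps over many back-to-back launches, assuming they were collected per event with clGetEventProfilingInfo() on a profiling-enabled, in-order queue. The struct and function names (event_times, launch_summary, summarize_launches) are hypothetical. It reports the SUBMIT to START latency suggested above, the QUEUED to SUBMIT delay (large values hint that submission was deferred, e.g. no clFlush), and the idle gap from one kernel's END to the next kernel's START, which roughly corresponds to the "blank space" in the profiler timeline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for one kernel event's profiling timestamps (ns),
 * read with CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END. */
typedef struct {
    uint64_t queued, submit, start, end;
} event_times;

/* Averages over n launches enqueued back to back. */
typedef struct {
    double mean_submit_to_start_ns;  /* device-side launch latency */
    double mean_queued_to_submit_ns; /* delay before the runtime submits */
    double mean_gap_ns;              /* idle time END[i-1] -> START[i] */
} launch_summary;

launch_summary summarize_launches(const event_times *ev, size_t n) {
    launch_summary s = {0.0, 0.0, 0.0};
    if (n == 0) {
        return s;
    }
    for (size_t i = 0; i < n; i++) {
        s.mean_submit_to_start_ns += (double)(ev[i].start - ev[i].submit);
        s.mean_queued_to_submit_ns += (double)(ev[i].submit - ev[i].queued);
        /* Gap between consecutive kernels; counted as zero if the next
         * kernel started before the previous one ended (overlap). */
        if (i > 0 && ev[i].start > ev[i - 1].end) {
            s.mean_gap_ns += (double)(ev[i].start - ev[i - 1].end);
        }
    }
    s.mean_submit_to_start_ns /= (double)n;
    s.mean_queued_to_submit_ns /= (double)n;
    if (n > 1) {
        s.mean_gap_ns /= (double)(n - 1); /* n launches have n - 1 gaps */
    }
    return s;
}
```

Averaging over many launches smooths out per-launch jitter; comparing the gap for a SIMD and a non-SIMD build of the same kernel would be one way to check whether vectorization really changes the launch overhead.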