In Code Builder we report four execution times:
1. Total - measured on the host (wall-clock time) from just before the clEnqueueNDRangeKernel call until after clFinish returns.
2. Queue - measured by querying the cl_event returned by clEnqueueNDRangeKernel (via clGetEventProfilingInfo) for CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT and taking the difference.
3. Submit - measured by querying the same cl_event for CL_PROFILING_COMMAND_SUBMIT and CL_PROFILING_COMMAND_START and taking the difference.
4. Run - measured by querying the same cl_event for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END and taking the difference.
We do this for every iteration and every configuration we run, and then calculate the average, median, minimum, maximum, and standard deviation.
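To make the arithmetic concrete, here is a minimal sketch of how the three event-based timings and the summary statistics are derived. It assumes the four OpenCL timestamps (CL_PROFILING_COMMAND_QUEUED/SUBMIT/START/END, as returned by clGetEventProfilingInfo) have already been read as nanosecond counters; the sample values below are made up for illustration and are not real measurements:

```python
import statistics

def kernel_timings(queued, submit, start, end):
    """Derive the Queue/Submit/Run durations (ns) from the four
    OpenCL profiling timestamps of a single kernel execution."""
    return {
        "queue":  submit - queued,   # QUEUED -> SUBMIT
        "submit": start - submit,    # SUBMIT -> START
        "run":    end - start,       # START  -> END
    }

# Hypothetical nanosecond timestamps for three iterations
# of one configuration: (queued, submit, start, end).
iterations = [
    (100, 150, 400, 1400),
    (100, 160, 420, 1380),
    (100, 140, 410, 1450),
]

# Collect the Run time of each iteration and summarize,
# mirroring the statistics listed above.
run_times = [kernel_timings(*ts)["run"] for ts in iterations]
summary = {
    "average": statistics.mean(run_times),
    "median":  statistics.median(run_times),
    "min":     min(run_times),
    "max":     max(run_times),
    "stdev":   statistics.stdev(run_times),
}
```

The same summary would be computed per timing kind (Total, Queue, Submit, Run) across all iterations of each configuration.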
I don't think the time-measurement method is documented anywhere; do you think we should document it?
Hi ARIK Z.,
Thank you for your answer. What I actually wanted to know was the hardware/software mechanism by which the time is measured and retrieved.
I've read the entire Code Builder documentation, and I think it would be a good idea to briefly explain how the measurements are done. We have the time measurements, but we don't know how Intel's implementation obtains them.