In my OpenCL design I have multiple kernels that are executed one after the other. What I am seeing is that, although the second kernel uses the results of the first kernel and waits for it to finish, the time difference between the first kernel ending and the second kernel starting is very significant. Can anyone tell me what the reason could be? I am using clWaitForEvents so that the second kernel starts execution only after the first kernel ends.
I have attached an example.
It depends on how long the gap is. There is certainly some kernel launch overhead. Moreover, when profiling is enabled, the profile results are dumped to disk between kernel executions, which further increases the gap. If you are saving the profiling results to network-attached storage, the gap will get even larger.
I am definitely profiling, so by network-attached storage do you mean the storing of data to the profile.mon file? Also, if I do not profile and instead use "clGetEventProfilingInfo(event1,CL_PROFILING_COMMAND_START,sizeof(time_start),&time_start,NULL);
clGetEventProfilingInfo(event1,CL_PROFILING_COMMAND_END,sizeof(time_end),&time_end,NULL); " to get the time (time_end - time_start) for each of the two kernels, should adding the two times t1 and t2 give the total run time?
>so by network attach storage does it mean that it storing of data to profile.mon file
Yes, if you are saving that file to network-attached storage or a slow hard disk, it will increase the gap between kernel executions. This issue is mentioned in Intel's documentation.
And yes, extracting the start and end time of each kernel from its associated event and summing the run times will give you the total run time of the kernel executions, but that will not include the gap between them. You can also use the gettimeofday or clock_gettime functions on Linux to measure, from the host, the total run time including the gap. Something like this:
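A minimal sketch of host-side timing with clock_gettime, assuming Linux and a monotonic clock. The helper name now_ns and the queue name in the comment are placeholders, not taken from the original design, and the OpenCL calls are only indicated in comments so that the timing part compiles on its own.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime and CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);  /* CLOCK_MONOTONIC is assumed to be available */
    (void)rc;         /* avoid an unused-variable warning under NDEBUG */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Intended use around the two kernel launches (placeholder names):
 *
 *   uint64_t t0 = now_ns();
 *   // enqueue kernel 1, then kernel 2 waiting on kernel 1's event
 *   // clFinish(queue);   // block until both kernels have completed
 *   uint64_t host_total_ns = now_ns() - t0;
 */
```

The value host_total_ns covers everything the host sees between the two timestamps, so it includes launch overhead before the first kernel and after the last one as well as the gap between them.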
You can subtract the run time of each kernel execution measured with clGetEventProfilingInfo from the above value to get the length of the gap between kernel executions.
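The subtraction can be sketched as follows, assuming the start and end timestamps have already been read with clGetEventProfilingInfo (which reports them in nanoseconds) and the host total is also in nanoseconds. The function names kernels_total_ns and gap_total_ns are hypothetical, and plain uint64_t stands in for cl_ulong so the example does not need the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of (end - start) over n kernels; all timestamps in nanoseconds. */
static uint64_t kernels_total_ns(const uint64_t *start_ns,
                                 const uint64_t *end_ns, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(end_ns[i] >= start_ns[i]);  /* a kernel cannot end before it starts */
        total += end_ns[i] - start_ns[i];
    }
    return total;
}

/* Host-measured total minus kernel time: the time spent outside the kernels. */
static uint64_t gap_total_ns(uint64_t host_total_ns, uint64_t kernel_ns)
{
    /* Host and device clocks are different, so clamp instead of underflowing. */
    return host_total_ns > kernel_ns ? host_total_ns - kernel_ns : 0;
}
```

For example, two kernels that run for 2000 ns and 4000 ns inside a host-measured window of 10000 ns leave 4000 ns of non-kernel time. That figure is an upper bound on the gap between the kernels, since it also contains any overhead before the first launch and after the last kernel finishes.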