Possible Bug with clWaitForEvents

Altera_Forum · 09-25-2017

Hi,

I'm doing performance testing on an FPGA and timing a kernel in order to calculate its GFlops. To make sure I wait for the kernel to complete, I call clWaitForEvents on an event that is created before the kernel is enqueued. Here is the code I use to enqueue the kernel:

 Event evKernel (algorithm->getKernelName ());
 err = clEnqueueNDRangeKernel (queue, kernel, algorithm->getWorkDim(),
                               NULL,
                               globalWorkSize,
                               localMemSize,
                               0, NULL, &evKernel.CLEvent());
 CL_CHECK_ERROR (err);
 err = clWaitForEvents (1, &evKernel.CLEvent());
 evKernel.FillTimingInfo ();

evKernel is a wrapper around a cl_event object, and its FillTimingInfo function works as follows:

void Event::FillTimingInfo(const int idx)
{
    int sidx, eidx;
    if (idx == ALL_EVENTS) {
        sidx = 0; eidx = count-1;
    } else
        sidx = eidx = idx;
    for (int i=sidx ; i<=eidx ; ++i) {
        cl_int err;
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED,
                                      sizeof(cl_ulong), &queuedTime, NULL);
        CL_CHECK_ERROR(err);
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT,
                                      sizeof(cl_ulong), &submitTime, NULL);
        CL_CHECK_ERROR(err);
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                                      sizeof(cl_ulong), &startTime, NULL);
        CL_CHECK_ERROR(err);
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                                      sizeof(cl_ulong), &endTime, NULL);
        CL_CHECK_ERROR(err);
    }
}

As you can see, I'm using clGetEventProfilingInfo to get the timing numbers.
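For reference, here is a minimal sketch of how the four timestamps could be turned into durations and a GFlops figure. It relies on the OpenCL convention that profiling counters are 64-bit values in nanoseconds. The `ProfilingTimes` struct and the helper names are hypothetical and not part of the wrapper shown above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the four values read by clGetEventProfilingInfo.
// OpenCL reports them as 64-bit device timestamps in nanoseconds, and only
// when the command queue was created with CL_QUEUE_PROFILING_ENABLE.
struct ProfilingTimes {
    std::uint64_t queued;
    std::uint64_t submit;
    std::uint64_t start;
    std::uint64_t end;
};

// Kernel execution time only (START to END), in nanoseconds.
inline std::uint64_t executionNs(const ProfilingTimes& t) {
    assert(t.end >= t.start);
    return t.end - t.start;
}

// Time from enqueue to completion (QUEUED to END), in nanoseconds.
// This includes queueing and launch overhead.
inline std::uint64_t totalNs(const ProfilingTimes& t) {
    assert(t.end >= t.queued);
    return t.end - t.queued;
}

// Floating-point operations per nanosecond is numerically equal to GFlops.
inline double gflops(double flopCount, std::uint64_t ns) {
    assert(ns > 0);
    return flopCount / static_cast<double>(ns);
}
```

For example, a kernel that performs 2e9 floating-point operations and has END minus START equal to 1,000,000 ns would come out at 2000 GFlops. Treating the nanosecond values as any other unit changes the result by orders of magnitude.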

For a long time, this method gave me unreasonable performance numbers, above the theoretical peak performance of the FPGA (1.5 TFlops). So I became suspicious of the way we do timing and decided to do my own, by taking timestamps before clEnqueueNDRangeKernel and after clWaitForEvents. When I calculate the difference, it is around 1 millisecond, while evKernel gives a much higher value. As a result, I feel there is something wrong with event management in the Altera OpenCL host API. I suspect two issues:

1) clWaitForEvents does not really wait for kernel completion.

2) clGetEventProfilingInfo does not necessarily report the correct value.

Any thoughts on this? For your information, I had been working on this for a long time, always wondering whether something was wrong with my GFlops computation, but it turns out the problem is with the timing.

Thanks,

Saman

Altera_Forum · 09-27-2017

I have previously used clGetEventProfilingInfo and compared the reported timing with that of a high-precision timer running on the host. The reported values matched. However, I used clFinish() in that case.

I have also used clWaitForEvents to synchronize kernels running in separate queues; it seems to work fine for that purpose.

You can try replacing clWaitForEvents() with clFinish() and see if you get different results.

Note that there is one common mistake when using OpenCL's standard profiler. If you call a kernel in a for loop and do not use a separate event for each kernel call, the event gets overwritten on every iteration, and the timing value you read at the end reflects only the last kernel execution.
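A minimal sketch of the per-call event pattern is shown below. It is written generically so that it does not depend on a particular OpenCL installation: `profileLaunches`, `EnqueueFn` and `DurationFn` are hypothetical names, and in a real host program `Event` would be cl_event, the enqueue callable would pass its event pointer as the last argument of clEnqueueNDRangeKernel, and the duration callable would query CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END with clGetEventProfilingInfo.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Launches `iterations` kernel calls, giving each call its own event slot,
// then sums the per-call durations.
//   enqueue(i, &ev): launches iteration i and stores its event in ev
//   durationNs(ev) : returns END minus START for that event, in nanoseconds
template <typename Event, typename EnqueueFn, typename DurationFn>
std::uint64_t profileLaunches(std::size_t iterations,
                              EnqueueFn enqueue,
                              DurationFn durationNs) {
    std::vector<Event> events(iterations);  // one event per call, never reused
    for (std::size_t i = 0; i < iterations; ++i) {
        enqueue(i, &events[i]);
    }
    // In real code, wait here (clFinish on the queue, or clWaitForEvents on
    // all events) before reading any profiling information.
    std::uint64_t total = 0;
    for (const Event& ev : events) {
        total += durationNs(ev);
    }
    return total;
}
```

With a single shared event, only the last iteration's duration would survive; keeping one event per call lets the host sum or average over all launches.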