Iterating EnqueueNDRangeKernel

Kilinc__Gorkem · 01-23-2019

Hello,

I am trying to run a simulation on an FPGA. Depending on the simplifications I make, I can either run the whole simulation on the FPGA (in which case it reduces to a series of computations) or offload only the compute-intensive part. Since multiple agents run the same computation in each iteration, I chose an NDRange kernel instead of a single work-item kernel, though I could also use a single work-item kernel with a loop. In short, I want to run N computations at a time, and the next N computations depend on the results of the previous ones. The important part of the code looks like this:

for(int i = 0; i < iteration; i++){
    iteration_start = getCurrentTimestamp();
    ret |= clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, &kernel_execution);
    clFinish(command_queue);
    clEnqueueReadBuffer(command_queue, memObjects[2], CL_TRUE, 0, agentCount * sizeof(float), position, 0, NULL, NULL);
    writeArray<float>(positions, position, agentCount); // Save the result of iteration i
    iteration_end = getCurrentTimestamp();
    std::cout << i << " " << (iteration_end - iteration_start) * 1e6 << std::endl; // Required time for iteration i in microseconds
}

What I observed after running this code is that the time required per iteration increases as i increases. As far as I can see, it does not converge to a stable value; it grows without bound. Is this behavior normal? What is the explanation?

Kilinc__Gorkem · 01-24-2019

To anyone who is interested,

I found my mistake and would like to share it. The answer is on page 96 of the FPGA SDK Standard Edition Programming Guide (for Quartus 18.1):

"Intel recommends that you release an event object when it is not in use. The SDK keeps an event object live until you explicitly instruct it to release the event object. Keeping an unused event object live causes unnecessary memory usage."

What I previously thought was that whenever I reuse the same cl_event variable, it is overwritten with the new information. Maybe that is the case and the cost of this operation accumulates over time. Or maybe cl_event works more like a pointer: each enqueue creates a new event object and leaves no way to reach the previous ones.
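
For what it's worth, the second guess matches how the OpenCL headers define things: cl_event is a pointer-typed handle, and each clEnqueueNDRangeKernel call that receives a non-NULL event argument returns a new event object, so reusing the variable only overwrites the handle. The sketch below is a minimal model of that lifetime behavior, not real OpenCL code. MockEvent, mockEnqueue, mockRelease, and runLoop are hypothetical names invented for illustration, and the counter only stands in for whatever the runtime would keep alive per event.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the OpenCL runtime. They only model handle
// lifetime; none of these names are part of OpenCL or the Intel SDK.
struct MockEvent { int refcount; };
using mock_event = MockEvent*;        // like cl_event, the handle is a pointer

static std::size_t g_live_events = 0; // event objects the "runtime" still holds

// Models an enqueue call with a non-null event argument: a NEW event object
// is created on every call and the caller's handle is overwritten.
void mockEnqueue(mock_event* event_out) {
    if (event_out != nullptr) {
        *event_out = new MockEvent{1};
        ++g_live_events;
    }
}

// Models a release call: drop one reference, free the object at zero.
void mockRelease(mock_event event) {
    assert(event != nullptr && event->refcount > 0);
    if (--event->refcount == 0) {
        delete event;
        --g_live_events;
    }
}

// Runs the loop and returns how many event objects are still alive afterwards.
// Without a release, the objects are deliberately leaked to show the problem.
std::size_t runLoop(int iterations, bool release_each_iteration) {
    g_live_events = 0;
    mock_event kernel_execution = nullptr;
    for (int i = 0; i < iterations; ++i) {
        mockEnqueue(&kernel_execution);     // overwrites the previous handle
        if (release_each_iteration) {
            mockRelease(kernel_execution);  // the fix: release before reuse
        }
    }
    return g_live_events;
}
```

In this model, runLoop(1000, false) leaves 1000 event objects alive, while runLoop(1000, true) leaves none. In the loop from the first post, the corresponding change would presumably be a call to clReleaseEvent(kernel_execution) after clFinish(command_queue), or passing NULL as the last argument of clEnqueueNDRangeKernel if the event is not needed for anything.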