Behavior when the same kernel is called multiple times

Altera_Forum · 07-17-2013

What is the expected behavior when the same kernel is called multiple times on the FPGA and the same buffer serves as both its input and its output?

Suppose I have a vector increment kernel that I call K consecutive times, and each call launches only a single work-group of size N. How does the FPGA board behave?
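To make the in-place setup concrete, here is a minimal CPU emulation in Python. It assumes each launch finishes before the next one starts, and it is not the Altera SDK or OpenCL host API: `vec_increment_kernel`, `launch` and `run_k_times` are hypothetical names, and each work-item is modeled as one loop iteration.

```python
def vec_increment_kernel(buf, gid):
    """Body of one work-item: increment the element owned by this work-item."""
    buf[gid] += 1


def launch(kernel, buf, n):
    """Emulate one launch of a single work-group of n work-items.

    Assumption: every work-item runs to completion before this function
    returns, so consecutive launches never overlap.
    """
    for gid in range(n):
        kernel(buf, gid)


def run_k_times(buf, n, k):
    """Call the same kernel k times; buf is both input and output."""
    for _ in range(k):
        launch(vec_increment_kernel, buf, n)
    return buf


data = list(range(8))               # hypothetical N = 8
print(run_k_times(data, n=8, k=5))  # [5, 6, 7, 8, 9, 10, 11, 12]
```

Under this model, K launches of the increment kernel give the same result as a single launch that adds K to every element.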

A - Are the calls fully pipelined, i.e. will the first work-item of the (i+1)-th call enter the pipeline right behind the last work-item of the i-th call?

B - Will the i-th call finish completely before the (i+1)-th call starts?
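A back-of-the-envelope way to compare A and B, assuming an idealized pipeline of depth D that accepts one work-item per cycle and ignoring host launch overhead (both are assumptions, not measured SDK behavior): under A the K*N work-items stream back to back through a single pipeline fill, while under B the pipeline fills and drains once per call. The function names and sizes below are hypothetical.

```python
def cycles_pipelined(n, k, depth):
    """Hypothesis A: all k*n work-items stream through one pipeline fill."""
    return k * n + depth - 1


def cycles_serialized(n, k, depth):
    """Hypothesis B: each call fills and drains the pipeline on its own."""
    return k * (n + depth - 1)


# Hypothetical sizes: N = 256 work-items, K = 8 calls, pipeline depth D = 100.
n, k, depth = 256, 8, 100
a = cycles_pipelined(n, k, depth)
b = cycles_serialized(n, k, depth)
print(a, b, b - a)  # 2147 2840 693
```

In this model the gap is (K - 1)(D - 1) cycles, so it matters most when N is small compared with the pipeline depth D.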

This case is trivial: I can always add K to the vector instead of calling the increment kernel K times. But consider an FFT, where I must choose between unrolling all the stages in a single kernel, which requires several barrier(CLK_LOCAL_MEM_FENCE) calls that reduce kernel performance, and calling several radix-n kernels in sequence. If hypothesis B holds, the former strategy might be better; if A holds, the latter should deliver better performance.
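The multi-kernel strategy relies on each stage seeing the complete output of the previous one, which is what B provides. The Python emulation below sketches that structure for a radix-2 FFT, assuming each call to the hypothetical `radix2_stage` stands in for one kernel launch over the whole in-place buffer. It models ordering only, not FPGA timing, and `dft` is a naive reference used to check the result.

```python
import cmath


def bit_reverse_permute(buf):
    """Reorder buf in place into bit-reversed index order (length is a power of two)."""
    n = len(buf)
    bits = n.bit_length() - 1
    for i in range(n):
        r, v = 0, i
        for _ in range(bits):
            r = (r << 1) | (v & 1)
            v >>= 1
        if r > i:
            buf[i], buf[r] = buf[r], buf[i]


def radix2_stage(buf, half):
    """One butterfly stage, standing in for one kernel launch (hypothetical).

    half is the distance between butterfly partners (1, 2, 4, ...).
    The same buffer is both input and output, as in the question.
    """
    n = len(buf)
    span = 2 * half
    for start in range(0, n, span):
        for m in range(half):
            w = cmath.exp(-2j * cmath.pi * m / span)  # twiddle factor
            top = buf[start + m]
            bot = buf[start + m + half] * w
            buf[start + m] = top + bot
            buf[start + m + half] = top - bot


def fft_multi_launch(x):
    """Radix-2 FFT as log2(n) stage 'launches', each finishing before the next."""
    buf = list(x)
    n = len(buf)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    bit_reverse_permute(buf)
    half = 1
    while half < n:
        radix2_stage(buf, half)  # completes fully before the next stage starts
        half *= 2
    return buf


def dft(x):
    """Naive O(n^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
result = fft_multi_launch(signal)
print(max(abs(a - b) for a, b in zip(result, dft(signal))) < 1e-9)  # True
```

In the single-kernel alternative, the loop over stages would live inside one kernel, with a barrier(CLK_LOCAL_MEM_FENCE) between stages playing the role that the launch boundaries play here.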

Which one should I expect?

Altera_Forum · 07-17-2013

Scenario B is expected.