Valued Contributor III

Multiple same kernel call behavior

What is the expected behavior when the same kernel is called multiple times on the FPGA and the input buffer is also the output buffer?



Suppose I have a vector increment kernel that I call K consecutive times. Moreover, the kernel call launches only a single work-group of some dimension N. What is the behavior of the FPGA board? 



A - Does it run the kernel calls fully pipelined, i.e. will the first work-item of the (i+1)-th call be pipelined with the last work-item of the i-th call?


B - Will the i-th call completely finish before the (i+1)-th call starts?



This case is trivial: I can always add K to the vector once instead of calling the increment kernel K times. But consider an FFT, where I have to choose between unrolling all the stages in a single kernel, which means several barrier(CLK_LOCAL_MEM_FENCE) calls that reduce kernel performance, or calling several radix-n kernels. If hypothesis B holds, the former strategy might be better; if A holds, the latter should deliver greater performance.
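To make the trivial case concrete, here is a minimal sketch in plain C, not OpenCL, that models each kernel call as a loop over the work-items. The function names (increment_call, increment_k_calls, add_k_call) are hypothetical and only illustrate the equivalence between K consecutive in-place increment calls and a single add-K call. It assumes every call finishes before the next one starts, and it says nothing about how the FPGA actually schedules the calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one launch of the increment kernel:
   each loop iteration plays the role of one work-item. */
void increment_call(int *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i) {
        buf[i] += 1;   /* input and output are the same buffer */
    }
}

/* K consecutive launches; in this model each one completes
   before the next one begins. */
void increment_k_calls(int *buf, size_t n, unsigned k)
{
    for (unsigned call = 0; call < k; ++call) {
        increment_call(buf, n);
    }
}

/* Single launch that adds K to every element directly. */
void add_k_call(int *buf, size_t n, int k)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i) {
        buf[i] += k;
    }
}
```

Running increment_k_calls and add_k_call on two copies of the same vector with the same K leaves both copies identical, which is why the increment case can be collapsed into one call while the FFT case cannot be collapsed so easily.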


Which behavior should I expect?
1 Reply
Valued Contributor III

Re: Multiple same kernel call behavior

Scenario B is expected.
