Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, DLA, Software Stack, and Reference Designs

Concurrent execution of two loops in OpenCL



I'm implementing a kernel in OpenCL which has two loops - Loop_a and Loop_b. Loop_a and Loop_b operations are totally independent of each other and hence can be executed concurrently. The code has been optimized as follows


int loop_limit = max(loop_a_num_iterations, loop_b_num_iterations);
for (int i = 0; i < loop_limit; i++) {
    if (i < loop_a_num_iterations) {
        // loop_a operation
    }
    if (i < loop_b_num_iterations) {
        // loop_b operation
    }
}


Both if statements execute concurrently. The loop_a operation has higher latency than the loop_b operation, but loop_a performs fewer iterations than loop_b. For the first loop_a_num_iterations iterations, both the loop_a and loop_b operations execute at the same high latency as loop_a. The remaining iterations, which perform only the loop_b operation, follow.

Is there a better way to overlap the execution of two loops?

Thanks in advance

2 Replies

The best way to overlap the execution of two different blocks of code in single work-item kernels is to put them into two different kernels, create two queues on the host, and enqueue the kernels concurrently. The compiler is also expected to implement two independent blocks of code within the same kernel in parallel anyway. May I ask why you care about the "latency" of the operations? As long as you have a fully pipelined loop with an initiation interval of 1 and your input size (loop trip count) is large enough, the latency of the loop will have a negligible effect on performance and run time.
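To make the latency point concrete, here is a minimal sketch of the usual cycle-count approximation for a pipelined loop: the first iteration takes the full pipeline latency, and each later iteration completes one initiation interval after the previous one. The function name `pipelined_cycles` and the numbers used below are hypothetical, chosen only for illustration; real hardware adds stalls and memory effects that this model ignores.

```c
#include <stdio.h>

/* Approximate cycle count of a fully pipelined loop (illustrative model only).
 * The first iteration needs `latency` cycles to leave the pipeline; after
 * that, one more iteration completes every `ii` cycles. */
static long pipelined_cycles(long latency, long ii, long trip_count)
{
    if (trip_count <= 0) {
        return 0; /* nothing to execute */
    }
    return latency + ii * (trip_count - 1);
}

/* Print the estimate for one hypothetical configuration. */
static void report_cycles(long latency, long ii, long trip_count)
{
    printf("latency=%ld ii=%ld trips=%ld -> %ld cycles\n",
           latency, ii, trip_count,
           pipelined_cycles(latency, ii, trip_count));
}
```

With an initiation interval of 1 and a trip count of 1,000,000, `pipelined_cycles(10, 1, 1000000)` gives 1000009 cycles and `pipelined_cycles(200, 1, 1000000)` gives 1000199 cycles. In this model, a 20x increase in latency changes the total by only 190 cycles, roughly 0.02%.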



Yes, you would need two different kernels, and you would enqueue them concurrently on the host.
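A rough sketch of the two-kernel split suggested in both replies might look like the following. The kernel names `loop_a_kernel` and `loop_b_kernel`, their arguments, and the loop bodies are hypothetical stand-ins, since the original operations are not shown. The small macro shim is an assumption added only so the sketch also compiles as plain C; an OpenCL compiler skips it.

```c
/* Shim so this sketch also compiles as plain C; an OpenCL compiler
 * defines __OPENCL_VERSION__ and therefore skips these definitions. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical stand-in for the loop_a operation (single work-item kernel). */
__kernel void loop_a_kernel(__global const int *in, __global int *out,
                            int loop_a_num_iterations)
{
    for (int i = 0; i < loop_a_num_iterations; i++) {
        out[i] = in[i] * in[i]; /* placeholder for the loop_a operation */
    }
}

/* Hypothetical stand-in for the loop_b operation (single work-item kernel). */
__kernel void loop_b_kernel(__global const int *in, __global int *out,
                            int loop_b_num_iterations)
{
    for (int i = 0; i < loop_b_num_iterations; i++) {
        out[i] = in[i] + 1; /* placeholder for the loop_b operation */
    }
}

/* Host side (outline only, not compiled here), using standard OpenCL 1.x
 * calls. One queue per kernel lets the runtime run both at the same time:
 *
 *   cl_command_queue queue_a = clCreateCommandQueue(context, device, 0, &err);
 *   cl_command_queue queue_b = clCreateCommandQueue(context, device, 0, &err);
 *   clEnqueueTask(queue_a, kernel_a, 0, NULL, NULL);
 *   clEnqueueTask(queue_b, kernel_b, 0, NULL, NULL);
 *   clFinish(queue_a);
 *   clFinish(queue_b);
 */
```

Each kernel now runs its own loop with its own trip count, so neither loop is tied to the other's latency or iteration count. The host variables `context`, `device`, `kernel_a`, and `kernel_b` in the comment are assumed to have been created in the usual way.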

Also, information on the initiation interval (II) can be found at the link below: