Concurrent execution of two loops in OpenCL

PRavi7 · 12-13-2019

Hi,

I'm implementing a kernel in OpenCL which has two loops, Loop_a and Loop_b. The Loop_a and Loop_b operations are totally independent of each other and hence can be executed concurrently. The code has been optimized as follows:

int loop_limit = max(loop_a_num_iterations, loop_b_num_iterations);
for(int i = 0; i < loop_limit; i++)
{
    if(i < loop_a_num_iterations)
    {
        // loop_a operation
    }
    if(i < loop_b_num_iterations)
    {
        // loop_b operation
    }
}

Both of these if statements are executed concurrently. The loop_a operation has higher latency than the loop_b operation, but loop_a performs fewer iterations than loop_b. For the first loop_a_num_iterations iterations, both the loop_a operation and the loop_b operation are executed at the higher latency of loop_a. After that, only the remaining iterations of the loop_b operation are executed.
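To make the schedule described above concrete, here is a minimal sketch in plain C (host-side C for illustration, not the actual kernel). It only counts how many iterations of the fused loop run both bodies and how many run a single body. The names count_fused and struct fused_counts are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper type: how the fused loop's iterations are split. */
struct fused_counts {
    int both;   /* iterations where loop_a and loop_b bodies both run */
    int only_a; /* iterations where only the loop_a body runs */
    int only_b; /* iterations where only the loop_b body runs */
};

/* Walks the same fused loop as in the post and counts which bodies
 * would execute on each iteration. No real work is performed. */
static struct fused_counts count_fused(int loop_a_num_iterations,
                                       int loop_b_num_iterations)
{
    assert(loop_a_num_iterations >= 0 && loop_b_num_iterations >= 0);

    struct fused_counts c = {0, 0, 0};
    int loop_limit = loop_a_num_iterations > loop_b_num_iterations
                         ? loop_a_num_iterations
                         : loop_b_num_iterations;

    for (int i = 0; i < loop_limit; i++) {
        int run_a = i < loop_a_num_iterations;
        int run_b = i < loop_b_num_iterations;

        if (run_a && run_b) {
            c.both++;
        } else if (run_a) {
            c.only_a++;
        } else {
            c.only_b++;
        }
    }
    return c;
}
```

For example, with 100 iterations for loop_a and 1000 for loop_b, the first 100 iterations run both bodies and the remaining 900 run only the loop_b body.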

Is there a better way to overlap the execution of two loops?

Thanks in advance

HRZ · 12-14-2019

The best way to overlap the execution of two different blocks of code in single work-item kernels is to put them into two different kernels, create two queues on the host, and enqueue the kernels concurrently. That said, the compiler is expected to implement two independent blocks of code within the same kernel in parallel anyway. May I ask why you care about the "latency" of the operations? As long as you have a fully pipelined loop with an initiation interval of 1 and your input size (loop trip count) is large enough, the latency of the loop will have a negligible effect on performance/run time.
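The point about latency can be illustrated with a simplified first-order model of a pipelined loop: total cycles are roughly the pipeline latency plus the initiation interval times the remaining iterations. The sketch below is plain C and assumes that model. It is not output from the compiler's report, and the function name pipelined_cycles is hypothetical.

```c
#include <assert.h>

/* Simplified model (an assumption, not a compiler guarantee):
 * the first iteration takes `latency` cycles to drain through the
 * pipeline, and each further iteration starts `ii` cycles after the
 * previous one, so
 *     cycles = latency + ii * (trip_count - 1)
 */
static unsigned long long pipelined_cycles(unsigned long long trip_count,
                                           unsigned long long latency,
                                           unsigned long long ii)
{
    assert(ii >= 1); /* the initiation interval is at least one cycle */

    if (trip_count == 0) {
        return 0; /* an empty loop costs nothing in this model */
    }
    return latency + ii * (trip_count - 1);
}
```

Under this model, a loop with 1,000,000 iterations and an II of 1 takes 1,000,009 cycles at a latency of 10 and 1,000,099 cycles at a latency of 100. A tenfold increase in latency changes the total by roughly 0.009%, which is why the trip count and the II dominate the run time.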

MEIYAN_L_Intel · 12-16-2019

Hi,

Yes, you would need two different kernels and two queues, and you would enqueue the kernels concurrently from the host.

Also, information on the initiation interval (II) can be found at the link below:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf

Thanks