How to deal with the Out-of-Order Loop Iterations in single work-item kernel?

Altera_Forum

Honored Contributor I

03-20-2018
10:59 AM

Hi,

Today I tried writing a single work-item kernel with a nested loop. In the loop report, I found that the outer loop is not pipelined because of out-of-order loop iterations, as in this example:

```
__kernel void order( __global unsigned* restrict input,
                     __global unsigned* restrict output, int N ) {
    unsigned sum = 0;
    for (unsigned i = 0; i < N; i++) {
        // the inner trip count depends on i
        for (unsigned j = 0; j < i; j++)
            sum += input[i + j];
    }
    output[0] = sum;
}
```

However, no solution is given. How can I pipeline this loop, or otherwise work around the problem? Would splitting the work across multiple kernels help?
Altera_Forum

Honored Contributor I

03-20-2018
11:15 AM

Sorry, I was just thinking about multiple kernels. Could they solve this problem?

Thanks in advance.
Altera_Forum

Honored Contributor I

03-20-2018
05:40 PM

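You can make the outer loop pipelineable by giving the inner loop a fixed trip count that does not depend on `i`, and moving the original bound into a condition: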

```
__kernel void order( __global unsigned* restrict input,
                     __global unsigned* restrict output, int N ) {
    unsigned sum = 0;
    for (unsigned i = 0; i < N; i++) {
        // fixed trip count N; the original bound j < i becomes a guard
        for (unsigned j = 0; j < N; j++)
            if (j < i) sum += input[i + j];
    }
    output[0] = sum;
}
```

However, because the inner loop now always runs N times, the total number of inner iterations grows from N(N-1)/2 to N*N, so depending on N this version may actually be slower than the original because of the wasted iterations. For loops like this that cannot be pipelined efficiently, NDRange kernels are usually the better choice.
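To make the cost of padding concrete, here is a sequential host-side C model (not OpenCL) of three versions: the original triangular nest, the padded nest, and an NDRange-style split in which each hypothetical work-item handles one value of `i` and the host reduces the partial sums. The element index `input[i + j]` is an assumption, since the indexing did not survive in the posted code, and `sum_original`, `sum_padded`, `work_item`, `sum_ndrange`, and `verify` are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>

/* Sequential host-side C models of the loops in this thread; these are not
 * OpenCL kernels. input must hold at least 2*n - 1 values because the
 * (assumed) index is i + j. */

/* Original nest: the inner trip count depends on i. */
unsigned sum_original(const unsigned *input, unsigned n, unsigned long long *iters) {
    unsigned sum = 0;
    *iters = 0;
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < i; j++) {
            sum += input[i + j];
            (*iters)++;
        }
    }
    return sum; /* *iters ends at n*(n-1)/2 */
}

/* Padded nest: fixed inner trip count n, original bound moved into a guard. */
unsigned sum_padded(const unsigned *input, unsigned n, unsigned long long *iters) {
    unsigned sum = 0;
    *iters = 0;
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            if (j < i)
                sum += input[i + j];
            (*iters)++; /* counted even when the guard is false */
        }
    }
    return sum; /* *iters ends at n*n */
}

/* Body of one hypothetical NDRange work-item. In a real kernel gid would
 * come from get_global_id(0) and partial would be a __global buffer. */
static void work_item(const unsigned *input, unsigned gid, unsigned *partial) {
    unsigned s = 0;
    for (unsigned j = 0; j < gid; j++)
        s += input[gid + j];
    partial[gid] = s;
}

/* Run n "work-items" one after another, then reduce on the host. */
unsigned sum_ndrange(const unsigned *input, unsigned n) {
    if (n == 0)
        return 0;
    unsigned *partial = malloc(n * sizeof *partial);
    if (partial == NULL)
        abort();
    for (unsigned gid = 0; gid < n; gid++)
        work_item(input, gid, partial);
    unsigned sum = 0;
    for (unsigned gid = 0; gid < n; gid++)
        sum += partial[gid];
    free(partial);
    return sum;
}

/* Check that all three versions agree and that the iteration counts match
 * n*(n-1)/2 (original) and n*n (padded). */
void verify(const unsigned *input, unsigned n) {
    unsigned long long a, b;
    unsigned s1 = sum_original(input, n, &a);
    unsigned s2 = sum_padded(input, n, &b);
    assert(s1 == s2 && s1 == sum_ndrange(input, n));
    assert(a == (unsigned long long)n * (n - 1) / 2);
    assert(b == (unsigned long long)n * n);
}
```

For example, `verify(in, 5)` on a 9-element array checks that all three versions give the same sum, with 10 inner iterations for the original nest versus 25 for the padded one.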
Altera_Forum

Honored Contributor I

03-21-2018
12:55 AM

Thanks very much. My code is more complex, so it is hard to give every inner loop the same number of iterations. I agree that NDRange kernels are the better choice here.
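When padding every inner loop to the same length is impractical, one alternative not discussed in this thread is to collapse the triangular nest into a single loop whose trip count, N(N-1)/2, is known before the loop starts, advancing `i` and `j` by hand. Below is a plain-C sketch of that idea, again assuming the element index is `input[i + j]`; the function names are hypothetical, and whether the offline compiler pipelines the kernel version of this loop well is not verified, so treat it as an idea to try rather than a confirmed fix.

```c
#include <assert.h>

/* Reference: the original triangular nest (inner trip count depends on i).
 * Assumes input holds at least 2*n - 1 values, since the index is i + j. */
unsigned sum_nested(const unsigned *input, unsigned n) {
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < i; j++)
            sum += input[i + j];
    return sum;
}

/* Sketch: the same nest collapsed into one loop with a precomputed trip
 * count of n*(n-1)/2. i and j are advanced by hand in the same order as
 * the nested loops, so no guard iterations are wasted. */
unsigned sum_flattened(const unsigned *input, unsigned n) {
    unsigned long long total = (unsigned long long)n * (n - 1) / 2;
    unsigned sum = 0;
    unsigned i = 1, j = 0; /* first visited pair is (i, j) = (1, 0) */
    for (unsigned long long k = 0; k < total; k++) {
        sum += input[i + j];
        if (++j == i) { /* row i finished: move to the start of row i + 1 */
            j = 0;
            i++;
        }
    }
    return sum;
}

/* Check that the flattened loop matches the nested reference for one n. */
void check_flattened(const unsigned *input, unsigned n) {
    assert(sum_flattened(input, n) == sum_nested(input, n));
}
```

The design trade-off is that the bookkeeping for `i` and `j` moves into the loop body, in exchange for a single loop that does exactly the N(N-1)/2 useful iterations.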
