- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

Today I tried to use single work-item kernel. I have a nested loop. In Loop Report, I found my outer loop not pipelined due to:*loop iteration ordering: iterations may get out of order with respect to the inner loop,*

*as the number of iterations of the inner loop may be different for different iterations of this loop.*I understood this problem.

**for different outer iterations of outer loop, actually i need different number of iterations of inner loop.**And in "

*out-of-order loop iterations*" section of the

*best practices guide*, I found an example, it is just similar to my code:

```
__kernel void order( __global unsigned* restrict input,
__global unsigned* restrict output, int N ) {
unsigned sum=0;
for (unsigned i = 0; i < N; i++) {
for (unsigned j = 0; j < i; j++)
sum += input;
}
output = sum;
}
```

But no solution is mentioned here. How can I pipeline the loop? Or how to deal with this problem? If I use multiple kernels, will it work?
Link Copied

3 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Sorry I just think about multiple kernels... Maybe it will solve this problem, is it right?

Thanks in advance.- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

You can pipeline the loop like this:

```
__kernel void order( __global unsigned* restrict input,
__global unsigned* restrict output, int N ) {
unsigned sum=0;
for (unsigned i = 0; i < N; i++) {
for (unsigned j = 0; j < N; j++)
if (j < i) sum += input;
}
output = sum;
}
```

However, since in this case both of the loops will run N times, depending on N, this code could actually be slower than the original case due to redundant computation. For such unpipelineable loops, it is actually preferred to use NDRange kernels.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

--- Quote Start --- You can pipeline the loop like this:

```
__kernel void order( __global unsigned* restrict input,
__global unsigned* restrict output, int N ) {
unsigned sum=0;
for (unsigned i = 0; i < N; i++) {
for (unsigned j = 0; j < N; j++)
if (j < i) sum += input;
}
output = sum;
}
```

However, since in this case both of the loops will run N times, depending on N, this code could actually be slower than the original case due to redundant computation. For such unpipelineable loops, it is actually preferred to use NDRange kernels. --- Quote End --- Thanks very much. My code is more complex then it is hard to make the same number of inner iterations... Yes, it is actually preferred to use NDRange kernels...
Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page