Altera_Forum

Honored Contributor I

03-20-2018
10:59 AM

How to deal with the Out-of-Order Loop Iterations in single work-item kernel?

Hi,

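Today I tried to use a single work-item kernel. I have a nested loop. In the Loop Report, I found that my outer loop is not pipelined due to out-of-order loop iterations, as in this example:

```
__kernel void order( __global unsigned* restrict input,
                     __global unsigned* restrict output, int N ) {
  unsigned sum = 0;
  for (unsigned i = 0; i < N; i++) {
    // The inner trip count depends on i.
    for (unsigned j = 0; j < i; j++)
      sum += input[i + j];  // index assumed; the forum formatting dropped the brackets
  }
  output[0] = sum;          // index assumed, as above
}
```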

But no solution is mentioned here. How can I pipeline the loop, or otherwise deal with this problem? Would it work if I used multiple kernels?

3 Replies

Altera_Forum

Honored Contributor I

03-20-2018
11:15 AM

Sorry, I just thought about using multiple kernels. Would that solve this problem?

Thanks in advance.

Altera_Forum

Honored Contributor I

03-20-2018
05:40 PM

You can pipeline the loop by giving the inner loop a fixed trip count and moving the condition into the loop body, like this:

```
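__kernel void order( __global unsigned* restrict input,
                     __global unsigned* restrict output, int N ) {
  unsigned sum = 0;
  for (unsigned i = 0; i < N; i++) {
    // The inner loop now always runs N times; the predicate replaces the bound j < i.
    for (unsigned j = 0; j < N; j++)
      if (j < i) sum += input[i + j];  // index assumed; the forum formatting dropped the brackets
  }
  output[0] = sum;                     // index assumed, as above
}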
```

However, since the inner loop now runs N times for every outer iteration, regardless of i, this code could actually be slower than the original, depending on N, because of the redundant iterations. For such unpipelineable loops, NDRange kernels are actually preferred.
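To make the trade-off concrete, here is a minimal host-side sketch in plain C (not OpenCL, and not compiled for an FPGA) that runs both loop forms and counts the inner iterations. The function names and the `iters` counter are hypothetical, added only for illustration, and the `input[i + j]` index is an assumption about the original kernel.

```c
/* Original form: the inner trip count depends on i (triangular iteration space). */
unsigned order_triangular(const unsigned *input, unsigned n, unsigned long *iters) {
    unsigned sum = 0;
    *iters = 0;
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < i; j++) {
            sum += input[i + j];
            (*iters)++;            /* count every inner iteration executed */
        }
    }
    return sum;
}

/* Rewritten form: fixed inner trip count, with a predicate guarding the add. */
unsigned order_rectangular(const unsigned *input, unsigned n, unsigned long *iters) {
    unsigned sum = 0;
    *iters = 0;
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            if (j < i) sum += input[i + j];
            (*iters)++;            /* counted even when the predicate is false */
        }
    }
    return sum;
}
```

Both functions return the same sum, but the triangular form executes N(N-1)/2 inner iterations while the rectangular form executes N². For N = 8 that is 28 versus 64, so roughly half of the rectangular iterations do no useful work. Whether the pipelined outer loop makes up for that depends on N and on the actual kernel.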

Altera_Forum

Honored Contributor I

03-21-2018
12:55 AM

Thanks very much. My code is more complex, so it is hard to give every inner loop the same number of iterations. Yes, it does seem preferable to use NDRange kernels.
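As a rough illustration of the NDRange idea, one possible decomposition gives each work-item one value of the outer index and combines the partial sums afterwards. The sketch below emulates that on the host in plain C. It is an assumption about how the work could be split, not code from this thread: `order_work_item`, `order_ndrange_emulated`, and the `partials` buffer are hypothetical names, and `gid` stands in for what `get_global_id(0)` would return in a real kernel.

```c
/* Hypothetical body of one work-item: gid plays the role of get_global_id(0). */
static unsigned order_work_item(const unsigned *input, unsigned gid) {
    unsigned partial = 0;
    for (unsigned j = 0; j < gid; j++)   /* trip count varies per work-item */
        partial += input[gid + j];
    return partial;
}

/* Host-side emulation: run n work-items one after another, then reduce. */
unsigned order_ndrange_emulated(const unsigned *input, unsigned *partials, unsigned n) {
    for (unsigned gid = 0; gid < n; gid++)
        partials[gid] = order_work_item(input, gid);  /* separate work-items on a device */

    unsigned sum = 0;
    for (unsigned gid = 0; gid < n; gid++)            /* final reduction of the partial sums */
        sum += partials[gid];
    return sum;
}
```

In this form no loop nest has an inner bound that depends on an outer loop variable inside a single work-item, and no work-item runs redundant iterations. The cost is an extra buffer for the partial sums and a reduction step at the end.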