I am a bit confused about this topic. In "aocl_programming_guide.pdf" (2016.10.31), section "1.6.4.4 Restrictions in the Implementation of Intel FPGA SDK for OpenCL Channels Extension", it says:
--- Quote Start ---
Because you can only assign a single call site per channel ID, you cannot unroll loops containing channels. ...
--- Quote End ---
However, in "aocl-best-practices-guide.pdf", section "1.6.1.3 Simplifying Loop-Carried Dependency", the optimized example puts an unroll pragma on the for-loop at line 18, and that loop contains a channel call:
12 ...
13 for (unsigned i = 0; i < N; i++) {
14
15     // Ensure that we have enough space if we read from ALL channels
16     if (num_bytes <= (8-NUM_CH)) {
17         #pragma unroll
18         for (unsigned j = 0; j < NUM_CH; j++) {
19             bool valid = false;
20             uchar data_in = read_channel_nb_altera(CH_DATA_IN[j], &valid);
21             if (valid) {
22                 storage <<= 8;
23                 storage |= data_in;
24                 num_bytes++;
25             }
26         }
27     }
28 ...
According to the corresponding report, this loop is successfully fully unrolled:
==================================================================================
Kernel: optimized
==================================================================================
The kernel is compiled for single work-item execution.
Loop Report:
+ Loop "Block1" (file optimized3.cl line 13)
| Pipelined well. Successive iterations are launched every cycle.
|
|
|-+ Fully unrolled loop (file optimized3.cl line 18)
Loop was fully unrolled due to "#pragma unroll" annotation.
So perhaps loops containing NON-blocking channel calls are exempt from this unrolling restriction?
The example you are looking at does NOT create multiple call sites per channel ID, because the unrolled loop variable indexes the channel ID, not the data being read from it: each unrolled copy reads from a different channel. In fact, in that example, if the loop is NOT unrolled you will get a compilation failure due to the variable channel ID. This has nothing to do with whether the channel call is blocking or non-blocking.
For the sake of clarification, the following is allowed and valid:
#pragma unroll
for (int i = 0; i < N; i++)
{
    data_in[i] = read_channel_altera(CH_DATA_IN[i]);
}
But this is not:
#pragma unroll
for (int i = 0; i < N; i++)
{
    data_in[i] = read_channel_altera(CH_DATA_IN);
}
Note that newer versions of the compiler (17+) seem to also support multiple call sites per channel, so the second example might work there (I haven't checked), but it will NOT work with older versions (v16.1.2 and below).
Understood.
Many thanks, HRZ!