
Unroll loops containing channels

Altera_Forum

I am a bit confused about this topic. In "aocl_programming_guide.pdf" (2016.10.31), section "1.6.4.4 Restrictions in the Implementation of Intel FPGA SDK for OpenCL Channels Extension", it says:

 

 

--- Quote Start ---

Because you can only assign a single call site per channel ID, you cannot unroll loops containing channels. ...

--- Quote End ---

 

 

However, in "aocl-best-practices-guide.pdf", section "1.6.1.3 Simplifying Loop-Carried Dependency", the optimized example applies an unroll pragma (line 17) to the for loop at line 18, which contains a channel call:

 

12 ...
13 for (unsigned i = 0; i < N; i++) {
14
15   // Ensure that we have enough space if we read from ALL channels
16   if (num_bytes <= (8-NUM_CH)) {
17     #pragma unroll
18     for (unsigned j = 0; j < NUM_CH; j++) {
19       bool valid = false;
20       uchar data_in = read_channel_nb_altera(CH_DATA_IN[j], &valid);
21       if (valid) {
22         storage <<= 8;
23         storage |= data_in;
24         num_bytes++;
25       }
26     }
27   }
28 ...

 

According to the corresponding report, this loop is successfully fully unrolled:

 

==================================================================================
Kernel: optimized
==================================================================================
The kernel is compiled for single work-item execution.

Loop Report:

+ Loop "Block1" (file optimized3.cl line 13)
| Pipelined well. Successive iterations are launched every cycle.
|
|
|-+ Fully unrolled loop (file optimized3.cl line 18)
    Loop was fully unrolled due to "#pragma unroll" annotation.

 

Perhaps loops containing NON-blocking channel calls are not a problem for loop-unrolling?
Altera_Forum

The example you are looking at does NOT create multiple call sites per channel ID, because the unrolled loop variable indexes the channel ID itself: each unrolled iteration reads from a different channel, rather than one channel being read several times. In fact, in that example, if the loop is NOT unrolled you will get a compilation failure due to the variable channel ID. This has nothing to do with whether the channel call is blocking or non-blocking.

 

For the sake of clarification, the following is allowed and valid, since each unrolled iteration reads from a different channel ID:

#pragma unroll
for (int i = 0; i < N; i++) {
    data_in = read_channel_altera(CH_DATA_IN[i]);
}

But this is not, since unrolling creates N call sites for the same channel ID:

#pragma unroll
for (int i = 0; i < N; i++) {
    data_in = read_channel_altera(CH_DATA_IN);
}

 

Note that the newer versions of the compiler (17+) seem to also support multiple call sites per channel, so the second example might now work (I haven't checked), but it will NOT work with older versions (v16.1.2 and below).
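
To make the call-site argument concrete, here is a minimal sketch in plain C (not OpenCL, and not how the compiler is actually implemented) that counts how many call sites end up referencing each channel ID once a loop is fully unrolled. NUM_CH, count_call_sites and max_call_sites are hypothetical names made up for this illustration; the only assumption is that full unrolling turns each iteration's channel read into a separate call site.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CH 4  /* hypothetical number of channels, also the unroll factor */

/*
 * Model a fully unrolled loop of NUM_CH iterations with one channel read each.
 *   indexed_by_loop_var != 0 models reading CH_DATA_IN[j] (channel ID varies)
 *   indexed_by_loop_var == 0 models reading CH_DATA_IN    (same channel ID)
 * On return, sites[ch] holds the number of call sites that reference channel ch.
 */
void count_call_sites(int indexed_by_loop_var, int sites[NUM_CH])
{
    assert(sites != NULL);

    for (int ch = 0; ch < NUM_CH; ch++)
        sites[ch] = 0;

    /* Each unrolled iteration contributes exactly one call site. */
    for (int j = 0; j < NUM_CH; j++) {
        int ch = indexed_by_loop_var ? j : 0;
        sites[ch]++;
    }
}

/* Largest number of call sites on any single channel ID. */
int max_call_sites(const int sites[NUM_CH])
{
    assert(sites != NULL);

    int max = 0;
    for (int ch = 0; ch < NUM_CH; ch++)
        if (sites[ch] > max)
            max = sites[ch];
    return max;
}
```

Calling count_call_sites(1, sites) leaves every channel with one call site, which satisfies the single-call-site restriction. Calling count_call_sites(0, sites) puts all NUM_CH call sites on channel 0, which is the case the older compilers reject.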
Altera_Forum

Understood. 

Many thanks, HRZ!