===========================================================================
|-+ Loop "Block43" (file conv.cl line 162)
|   Pipelined with successive iterations launched every cycle.
|
|   Iterations executed serially across the regions listed below.
|   Only a single loop iteration will execute inside the listed regions.
|   This will cause performance degradation unless the regions are pipelined well
|   (can process an iteration every cycle).
|
|     Loop "Block44" (file conv.cl line 163)
|     due to:
|       Memory dependency on Load Operation from: (file conv.cl line 188)
|         Store Operation (file conv.cl line 190)
|         Store Operation (file conv.cl line 190)
|         Load Operation (file conv.cl line 205)
|
|     Loop "Block44" (file conv.cl line 163)
|     due to:
|       Memory dependency on Store Operation from: (file conv.cl line 190)
|         Store Operation (file conv.cl line 190)
|
|
|-+ Loop "Block44" (file conv.cl line 163)
|   Pipelined with successive iterations launched every 2 cycles due to:
|
|       Pipeline structure: every terminating loop with subloops has iterations launched at least 2 cycles apart. Having successive iterations launched every two cycles should still lead to good performance
|       if the inner loops are pipelined well and have sufficiently high number of iterations.
===========================================================================

In the optimization report, there are some parts like this which say my blocks will be executed serially (For example block 44). And I assume it means there's no pipelining. But the report also says the block44 will be launched every 2 clock cycles. What does this mean?

Any advice would be greatly appreciated!!

This usually happens with nested loops where the outer loop is serialized because of a dependency in the inner loop. "Executed serially" does not mean "not pipelined": the outer loop is still pipelined, but its next iteration is stalled until the inner loop has fully finished. The initiation interval of two means that even if the inner loop finishes in a single clock, a new outer-loop iteration is launched only once every two clocks. Whenever the inner loop takes longer than that, the outer loop is effectively serialized. You can treat the two clocks as the "minimum" initiation interval of the outer loop.
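
To put rough numbers on this, here is a small back-of-the-envelope model in Python. It is only a sketch of the reasoning above, not how the compiler actually schedules anything: the function names, the default inner-loop latency of one cycle, and the formula for the inner-loop run time are all my own assumptions for illustration.

```python
# Toy timing model (assumption, not the compiler's real scheduler):
# a serialized outer loop can only launch its next iteration once the
# inner loop has finished, and never faster than its minimum II.

def inner_loop_cycles(inner_iters: int, inner_ii: int = 1, inner_latency: int = 1) -> int:
    """Estimated cycles for one full run of a pipelined inner loop.

    Assumes iterations launch every `inner_ii` cycles and the last one
    needs `inner_latency` cycles to drain (both values are hypothetical).
    """
    if inner_iters <= 0:
        return 0
    return (inner_iters - 1) * inner_ii + inner_latency


def outer_launch_interval(inner_cycles: int, min_ii: int = 2) -> int:
    """Effective cycles between outer iterations when serialized across the inner loop."""
    return max(min_ii, inner_cycles)


def serialized_outer_cycles(outer_iters: int, inner_iters: int,
                            inner_ii: int = 1, inner_latency: int = 1,
                            min_ii: int = 2) -> int:
    """Estimated total cycles for the whole loop nest under this model."""
    per_iter = outer_launch_interval(
        inner_loop_cycles(inner_iters, inner_ii, inner_latency), min_ii)
    return outer_iters * per_iter


if __name__ == "__main__":
    # Inner loop done in one clock: the outer loop still waits two clocks.
    print(outer_launch_interval(1))         # 2
    # Long inner loop: the outer loop waits for it, so it is effectively serialized.
    print(outer_launch_interval(100))       # 100
    print(serialized_outer_cycles(10, 64))  # 640
```

Under these assumptions the two-cycle interval only matters when the inner loop is very short. With a reasonably long, well-pipelined inner loop, the inner loop's own run time dominates, which matches the report's note that performance should still be good.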