Pipeline latency

Altera_Forum · 06-18-2013

Suppose I would like to know how many cycles deep my kernel pipeline is. Is this information readily available from any .log file generated by AOCL?

Thank you,

Altera_Forum · 06-18-2013

Unfortunately no, there is no reporting to show pipeline depths.

Altera_Forum · 06-19-2013

So, I should just stick to the report printed to stdout?

I can always browse through the Quartus project, but since it is automatically generated from the OpenCL kernel, most of it is not 'user'-readable.

Pipeline latency can be an important parameter for some designs. If one work-item is dispatched for execution each clock cycle and the pipeline latency is of the same order of magnitude as the number of scheduled work-items, the latency has a non-negligible effect on the overall FPGA kernel execution throughput; latencies orders of magnitude below the number of work-items scheduled for execution can be neglected.
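As a rough illustration of this argument, here is a minimal Python sketch of an idealized pipeline. It assumes one work-item enters per cycle, no stalls, and a fixed latency; the 200-cycle depth is a made-up value, since AOCL does not report it. The last of N work-items leaves after (N - 1) + L cycles, so the latency only matters when L is comparable to N.

```python
def pipeline_cycles(num_work_items: int, latency: int, issue_interval: int = 1) -> int:
    """Cycles for an idealized pipeline to finish num_work_items.

    Assumes (hypothetically) one new work-item enters every `issue_interval`
    cycles with no stalls, and each work-item needs `latency` cycles to
    traverse the pipeline.
    """
    if num_work_items <= 0:
        return 0
    # The last work-item enters at (N - 1) * II and exits `latency` cycles later.
    return (num_work_items - 1) * issue_interval + latency


def efficiency(num_work_items: int, latency: int) -> float:
    """Fraction of the ideal one-work-item-per-cycle rate actually achieved."""
    return num_work_items / pipeline_cycles(num_work_items, latency)


if __name__ == "__main__":
    latency = 200  # hypothetical pipeline depth in cycles
    for n in (100, 1_000, 1_000_000):
        print(f"N={n:>9}: cycles={pipeline_cycles(n, latency):>9}, "
              f"efficiency={efficiency(n, latency):.4f}")
```

With a 200-cycle latency, 100 work-items reach only about a third of the ideal rate, while a million work-items come within about 0.02% of it.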

On a given design I obtain the following report:

Kernel throughput analysis for : decode
.. simd work items : 2
.. compute units : 1
.. throughput due to control flow analysis : 0.48 K work items/second
.. kernel global memory bandwidth analysis : 18390.81 MB/second
.. kernel number of local memory banks : none

+--------------------------------------------------------------------+
; Estimated Resource Usage Summary                                   ;
+----------------------------------------+---------------------------+
; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ;   73%                     ;
; Dedicated logic registers              ;   37%                     ;
; Memory blocks                          ;   64%                     ;
; DSP blocks                             ;   15%                     ;
+----------------------------------------+---------------------------;

This means that for a large number of work-items I can only get a throughput of 480 (0.48 K) work-items per second, right? That seems far too slow given the kernel clocks I found in the Quartus project: kernel_clk at 214.13 MHz and kernel_clk2x at 428.27 MHz (the latter is just twice the former).
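A quick back-of-the-envelope check of these numbers, as a Python sketch. The constants are copied from the report and the Quartus project; the "idealized peak" is an assumption (each SIMD lane of the single compute unit retires one work-item per kernel_clk cycle, with no stalls or memory bottlenecks) and is not something AOCL reports.

```python
KERNEL_CLK_HZ = 214.13e6      # kernel_clk from the Quartus project
SIMD_WORK_ITEMS = 2           # ".. simd work items : 2" in the report
REPORTED_WI_PER_S = 0.48e3    # "0.48 K work items/second" in the report

# Idealized peak (assumption): every SIMD lane retires one work-item per
# kernel_clk cycle, one compute unit, no stalls, no memory bottleneck.
peak_wi_per_s = KERNEL_CLK_HZ * SIMD_WORK_ITEMS

# kernel_clk cycles each work-item would effectively "cost" if the
# reported figure were taken at face value.
cycles_per_wi = KERNEL_CLK_HZ / REPORTED_WI_PER_S

print(f"idealized peak : {peak_wi_per_s / 1e6:.2f} M work-items/s")
print(f"reported       : {REPORTED_WI_PER_S:.0f} work-items/s")
print(f"ratio          : {peak_wi_per_s / REPORTED_WI_PER_S:,.0f}x")
print(f"implied cycles per work-item: {cycles_per_wi:,.0f}")
```

Taken at face value, the reported figure is almost six orders of magnitude below this idealized peak, which is why it looks suspicious. Under the stated assumptions the peak is only an upper bound: it ignores memory bandwidth and any loops executed inside each work-item.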

It would make sense to provide more 'hardware' metrics; after all, AOCL generates circuits.

Altera_Forum · 06-19-2013

I use the report to get all the information I need when I'm tuning a kernel. The use model of the OpenCL compiler is that you don't need to worry about the pipeline depth, because it will already be tuned to ensure that enough work-items are in flight to keep the pipeline full.
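The idea of keeping "enough work-items in flight" can be read through Little's law (occupancy = rate × latency). The Python sketch below assumes a fixed-latency pipeline with no stalls; the 200-cycle latency is hypothetical, since AOCL does not report it, and the functions are illustrative only, not a description of how the compiler actually sizes its pipelines.

```python
import math


def required_in_flight(items_per_cycle: float, latency_cycles: int) -> int:
    # Little's law: L = lambda * W. To sustain `items_per_cycle` through a
    # pipeline `latency_cycles` deep, this many work-items must be in flight.
    return math.ceil(items_per_cycle * latency_cycles)


def achieved_rate(in_flight: int, latency_cycles: int, max_items_per_cycle: float) -> float:
    # With fewer work-items in flight than required, the pipeline has bubbles
    # and throughput drops to in_flight / latency (capped at the issue rate).
    return min(max_items_per_cycle, in_flight / latency_cycles)


if __name__ == "__main__":
    latency = 200  # hypothetical pipeline depth in cycles
    simd = 2       # work-items issued per cycle (from the report above)
    need = required_in_flight(simd, latency)
    print("work-items needed in flight:", need)
    for k in (50, 200, need):
        print(k, "in flight ->", achieved_rate(k, latency, simd), "work-items/cycle")
```

Once the in-flight count reaches rate × latency, extra latency no longer reduces steady-state throughput, which is the sense in which the pipeline depth can be left to the compiler.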

The clock that runs at 2x the kernel clock is used to double-pump the RAMs, essentially making the dual-port RAMs function as quad-port RAMs.
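To illustrate the double-pumping idea, here is a purely behavioral Python model. It is not how the hardware is built or how AOCL generates it. A dual-port RAM clocked by kernel_clk2x gets two RAM cycles per kernel_clk cycle, so it can serve up to 2 × 2 = 4 accesses per kernel cycle. The function and constant names are hypothetical.

```python
from typing import List, Optional, Tuple

# A request is ("r", addr, None) for a read or ("w", addr, value) for a write.
Request = Tuple[str, int, Optional[int]]

PORTS_PER_RAM_CYCLE = 2          # physical dual-port RAM
RAM_CYCLES_PER_KERNEL_CYCLE = 2  # RAM clocked by kernel_clk2x


def double_pumped_cycle(mem: List[int], requests: List[Request]) -> List[Optional[int]]:
    """Serve up to 4 requests in one kernel_clk cycle on a dual-port RAM at 2x clock.

    Behavioral sketch only: requests are served in list order, two per
    kernel_clk2x phase. Same-phase read/write collisions are resolved in list
    order here; real RAMs define their own read-during-write behavior.
    """
    capacity = PORTS_PER_RAM_CYCLE * RAM_CYCLES_PER_KERNEL_CYCLE
    if len(requests) > capacity:
        raise ValueError(f"at most {capacity} accesses per kernel cycle")
    results: List[Optional[int]] = []
    for phase in range(RAM_CYCLES_PER_KERNEL_CYCLE):
        start = phase * PORTS_PER_RAM_CYCLE
        # In each 2x phase, the two physical ports each handle one request.
        for kind, addr, value in requests[start:start + PORTS_PER_RAM_CYCLE]:
            if kind == "r":
                results.append(mem[addr])
            else:
                mem[addr] = value
                results.append(None)
    return results


if __name__ == "__main__":
    mem = [0] * 8
    # Two writes in the first 2x phase, two reads in the second: 4 accesses
    # completed within a single kernel_clk cycle.
    print(double_pumped_cycle(mem, [("w", 0, 5), ("w", 1, 7), ("r", 0, None), ("r", 1, None)]))
```

From the kernel's point of view, the RAM therefore behaves like a four-port memory at kernel_clk, at the cost of running the RAM itself at twice the frequency.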