Suppose I would like to know how many cycles deep my kernel pipeline is. Is this information readily available from any .log file generated by AOCL?
Thank you,
3 Replies
Unfortunately no, there is no reporting to show pipeline depths.
So I should just stick to the report printed to stdout?
I can always browse through the Quartus project, but since it is generated automatically from the OpenCL kernel, most of it is not 'user'-readable.
Pipeline latency can be an important design parameter for some designs. If a work-item is dispatched for execution each clock cycle and the pipeline latency is of the same order of magnitude as the number of scheduled work-items, the latency has a non-negligible effect on the overall FPGA kernel execution throughput, whereas latencies orders of magnitude below the number of scheduled work-items can be neglected.
For a given design I obtain the following report:

Kernel throughput analysis for : decode
.. simd work items : 2
.. compute units : 1
.. throughput due to control flow analysis : 0.48 K work items/second
.. kernel global memory bandwidth analysis : 18390.81 MB/second
.. kernel number of local memory banks : none
+--------------------------------------------------------------------+
; Estimated Resource Usage Summary                                   ;
+----------------------------------------+---------------------------+
; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ; 73%                       ;
; Dedicated logic registers              ; 37%                       ;
; Memory blocks                          ; 64%                       ;
; DSP blocks                             ; 15%                       ;
+----------------------------------------+---------------------------;

This means that for a large number of work-items I can get a throughput of 480 work-items per second, right? That seems too slow for the kernel clocks found in the Quartus project, kernel_clk and kernel_clk2x, which run at 214.13 and 428.27 MHz (the latter is just twice the former).
It would make sense to provide more 'hardware' metrics; after all, AOCL generates circuits.
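To put the latency argument in numbers, here is a minimal sketch of an ideal fill/drain model: a stall-free pipeline of depth D that accepts `simd` work-items per cycle needs roughly D + ceil(N / simd) - 1 cycles for N work-items. This is a textbook approximation, not something AOCL reports. The clock and SIMD width are taken from the post above, while `ASSUMED_DEPTH` and the function names are hypothetical, since the real depth is exactly what is unknown here.

```python
"""Ideal fill/drain model for a work-item pipeline (illustrative only)."""

KERNEL_CLK_HZ = 214.13e6   # kernel_clk quoted in the post
SIMD_WIDTH = 2             # "simd work items : 2" from the report
ASSUMED_DEPTH = 500        # hypothetical depth in cycles; AOCL does not report it


def issue_cycles(num_work_items: int, simd: int = 1) -> int:
    """Cycles needed to dispatch every work-item, `simd` items per cycle."""
    return -(-num_work_items // simd)  # ceiling division


def total_cycles(num_work_items: int, depth: int, simd: int = 1) -> int:
    """Cycles until the last work-item leaves a stall-free pipeline."""
    return depth + issue_cycles(num_work_items, simd) - 1


def efficiency(num_work_items: int, depth: int, simd: int = 1) -> float:
    """Fraction of the peak issue rate achieved (1.0 means no fill overhead)."""
    return issue_cycles(num_work_items, simd) / total_cycles(num_work_items, depth, simd)


if __name__ == "__main__":
    peak = KERNEL_CLK_HZ * SIMD_WIDTH  # upper bound in work-items per second
    for n in (1_000, 100_000, 10_000_000):
        eff = efficiency(n, ASSUMED_DEPTH, SIMD_WIDTH)
        print(f"{n:>10} work-items: efficiency {eff:.3f}, "
              f"about {peak * eff:.3e} work-items/s")
```

Under these assumptions the fill overhead only matters when N is comparable to the depth, and even then the modeled rate stays many orders of magnitude above 480 work-items per second. That suggests the 0.48 K figure, labeled as coming from control flow analysis, reflects something other than pipeline fill latency.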
I use the report to get all the information I need when I'm tuning a kernel. The use model of the OpenCL compiler is that you don't need to worry about the pipeline depth, because the pipeline will already be tuned to ensure that enough work-items are in flight to keep it full.
The clock that runs at 2x the kernel clock is used to double-pump the RAMs, which essentially makes the dual-port RAMs function as quad-port RAMs.