Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

I want to calculate my kernel ideal execution time with fmax.

Altera_Forum
Honored Contributor II

This is my acl_quartus_report.txt:

/////////////////////////////////////////////////////////
ALUTs: 7794
Registers: 9,641
Logic utilization: 5,368 / 32,070 ( 17 % ) ( 16 % )
I/O pins: 103 / 457 ( 23 % )
DSP blocks: 0 / 87 ( 0 % )
Memory bits: 348,224 / 4,065,280 ( 9 % )
M10K blocks: 63 / 397 ( 16 % )
Actual clock freq: 135.639999807
Kernel fmax: 135.64
1x clock fmax: 135.64
2x clock fmax: 10000
Highest non-global fanout: 2723
//////////////////////////////////////////////////////////

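For readers who want to pull the clock figures out of a report like the one above, here is a minimal Python sketch. It assumes the report uses plain `label: value` lines as shown in this post and that the fmax values are in MHz; the function name `parse_report` and the embedded sample text are illustrative only, not part of any Intel tool.

```python
# Sample lines copied from the report in this post.
REPORT = """\
Actual clock freq: 135.639999807
Kernel fmax: 135.64
1x clock fmax: 135.64
Highest non-global fanout: 2723
"""

def parse_report(text):
    """Map each 'label: value' line to a float, skipping non-numeric values."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'label: value' line (e.g. the slash separators)
        try:
            fields[key.strip()] = float(value.strip().replace(",", ""))
        except ValueError:
            pass  # values such as '5,368 / 32,070 ( 17 % )' are ignored here
    return fields

fields = parse_report(REPORT)
kernel_fmax_mhz = fields["Kernel fmax"]   # assumption: the unit is MHz
clock_period_ns = 1e3 / kernel_fmax_mhz   # period of one clock cycle in ns
print(f"Kernel fmax: {kernel_fmax_mhz} MHz, period: {clock_period_ns:.3f} ns")
```

Under the MHz assumption, one clock cycle lasts roughly 7.37 ns.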
 
1)

I can see that Kernel fmax is 135.64. Does this mean that fmax is 135.64 MHz?

2)

My kernel processes 350036 elements. If I calculate the ideal kernel execution time, ignoring memory load/store delays, is it 1/135.64 MHz * 350056 = 2.58 ms?

Thanks,
Altera_Forum
Honored Contributor II

Fmax is the highest clock frequency at which the design can run. The frequency actually used depends on your hardware: the oscillator you are using and your PLL settings, if you use PLLs.

I don't understand "my kernel make 350036 elements and work." If your design takes 350036 clock cycles to compute a result, then yes, the execution time will be 1/f * 350036.
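As a quick check of the 1/f * cycles arithmetic, here is a small Python sketch. It assumes the design really does take one clock cycle per element (which the thread does not confirm) and uses the "Actual clock freq" value from the report, taken to be in MHz; the function name is made up for illustration.

```python
def execution_time_s(cycles, freq_mhz):
    """Time in seconds to run `cycles` clock cycles at `freq_mhz` MHz."""
    return cycles / (freq_mhz * 1e6)

ACTUAL_CLOCK_MHZ = 135.639999807  # "Actual clock freq" from the report
CYCLES = 350036                   # assumption: one cycle per element

t = execution_time_s(CYCLES, ACTUAL_CLOCK_MHZ)
print(f"{t * 1e3:.2f} ms")
```

This gives about 2.58 ms, matching the figure in the question, but only under the one-cycle-per-element assumption.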
Altera_Forum
Honored Contributor II

2)

If you mean that you run the kernel on 350056 work-items, then to find the ideal execution time you need to know how many work-items come out of the pipeline per clock cycle. It is not necessarily one; depending on the kernel structure and instructions, it will most probably be lower. I don't have a report on hand right now, but I believe the profiler can give you that information.

You also need to know the depth of the generated pipeline so that you can calculate how long the first work-item takes to get through it.

All in all, something like this might do:

1/135.64 MHz * 350056 / work_items_per_clock + pipeline_fill_time
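The estimate above can be sketched in Python as follows. This is a rough model only: the throughput (work-items leaving the pipeline per clock cycle) and the pipeline depth are placeholder values chosen for illustration, not numbers from the report, and the function and parameter names are hypothetical.

```python
def ideal_time_s(work_items, fmax_mhz, items_per_cycle, pipeline_depth_cycles):
    """Pipeline fill time plus steady-state time, in seconds.

    items_per_cycle: work-items leaving the pipeline per clock cycle (<= 1
        for a single pipeline); a placeholder, ideally taken from the profiler.
    pipeline_depth_cycles: cycles for the first work-item to pass through;
        also a placeholder.
    """
    if items_per_cycle <= 0:
        raise ValueError("items_per_cycle must be positive")
    clock_hz = fmax_mhz * 1e6
    steady_cycles = work_items / items_per_cycle
    return (pipeline_depth_cycles + steady_cycles) / clock_hz

WORK_ITEMS = 350056
FMAX_MHZ = 135.64

# Hypothetical inputs: depth of 200 cycles, full and half throughput.
t_full = ideal_time_s(WORK_ITEMS, FMAX_MHZ, 1.0, 200)
t_half = ideal_time_s(WORK_ITEMS, FMAX_MHZ, 0.5, 200)
print(f"1 item/cycle:   {t_full * 1e3:.3f} ms")
print(f"0.5 item/cycle: {t_half * 1e3:.3f} ms")
```

With these placeholder inputs the fill time is negligible next to the steady-state term, so the throughput per cycle dominates the estimate.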

Altera_Forum
Honored Contributor II

Thank you for your replies.

Yes, the number I gave is the work-item count.

I don't know the pipeline depth of my kernel. Could you tell me how to find it?

Thanks.