
Clock frequency vs. Latency issues for an IIR filter

Altera_Forum

The LPM arithmetic and logic functions in Quartus have an option to specify the required output latency as a number of clock cycles. An obvious choice is to set this value to the minimum, i.e. 1, in order to minimize the propagation delay. 

 

Does this mean that, for each module, the output will be updated 1 clock cycle after the input, no matter what clock frequency we use?

 

Suppose, I have to implement the following control algorithm in my FPGA: 

Vout[k] = C1*( Vin[k] + Vin[k-1] ) + C2*Vstate[k] + Vout[k-1]; 

(it resembles an IIR filter, and involves adders, multipliers, delays ,etc.) 

 

My aim is to reduce the overall input-to-output latency, while the throughput remains greater than or equal to the sampling frequency. 

 

What clock frequency should I supply to each of these modules ? 

My input-output (ADC-DAC) sampling frequency is 20 MHz and the FPGA oscillator clock is 50 MHz.  

 

I understand that to generate Vx[k-1] from Vx[k], we need to delay Vx[k] by exactly one sampling period (1 / 20 MHz = 50 ns). So the delays would necessarily be clocked with this 20 MHz sampling clock.  

 

But what about the adders, multipliers, etc.? Can I not use a pll_clock = 100 MHz (Tpll = 10 ns) for these, and finish the entire equation within 10 ns * 5 steps = 50 ns?
6 Replies
Altera_Forum

Hi,  

 

" Does this mean that - for each module: the output will be updated after 1 clock cycle from the input, no matter what clock frequency we use ?" 

 

Quite so, that's what it means.  

 

"What clock frequency should I supply to each of these modules ?" 

Everything should be run at the DAC/ADC clock. 

 

"But what about the adders, multipliers, etc.? Can I not use a pll_clock = 100 MHz (Tpll = 10 ns) for these, and finish the entire equation within 10 ns * 5 steps = 50 ns?"  

This is an unnecessary complication. Assuming you're working with 16-bit wide signals, you should be able to do all that in a single 50 ns cycle. 

 

What you need is to disable the registers in the several LPM modules and register only the input signals from the ADC and the output signals to the DAC. 

Hell, with a 20 MHz clock you might even be able to skip one of those registers.
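
As a rough illustration of the structure described above (registers only on the ADC inputs and the DAC output, everything in between combinational), here is a minimal Verilog sketch. The module name, port names, the coefficient format (FRAC fractional bits), the synchronous reset and the wrap-on-overflow behaviour are all assumptions made for the example; none of them come from the original question.

```verilog
// Sketch only: Vout[k] = C1*(Vin[k] + Vin[k-1]) + C2*Vstate[k] + Vout[k-1]
// Assumed: coefficients are signed fixed-point with FRAC fractional bits,
// data is signed W-bit, overflow simply wraps (no saturation).
module ctrl_filter #(
    parameter W    = 16,
    parameter FRAC = 14
)(
    input  wire                clk,     // sampling clock (20 MHz in the question)
    input  wire                rst,     // synchronous reset (assumed)
    input  wire signed [W-1:0] vin,     // from ADC
    input  wire signed [W-1:0] vstate,
    input  wire signed [W-1:0] c1,
    input  wire signed [W-1:0] c2,
    output reg  signed [W-1:0] vout     // to DAC
);
    // Input registers, plus the one-sample delay that produces Vin[k-1]
    reg signed [W-1:0] vin_r, vin_d, vstate_r;

    // Purely combinational datapath between the input and output registers
    wire signed [W:0]     sum_in    = vin_r + vin_d;
    wire signed [2*W:0]   p1        = c1 * sum_in;
    wire signed [2*W-1:0] p2        = c2 * vstate_r;
    wire signed [2*W+1:0] acc       = p1 + p2;
    // Scale the products back to data format, then add Vout[k-1]
    wire signed [2*W+1:0] vout_next = (acc >>> FRAC) + vout;

    always @(posedge clk) begin
        if (rst) begin
            vin_r    <= 0;
            vin_d    <= 0;
            vstate_r <= 0;
            vout     <= 0;
        end else begin
            vin_r    <= vin;                  // input register
            vstate_r <= vstate;               // input register
            vin_d    <= vin_r;                // Vin[k-1]
            vout     <= vout_next[W-1:0];     // output register; also holds Vout[k-1]
        end
    end
endmodule
```

With both register stages, a sample presented on vin reaches vout two clock edges later. Whether the logic between the registers fits into one 50 ns period has to be confirmed by the timing analyzer for the actual device.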
Altera_Forum

@rbugalho: Wow ! Thanks for such a quick reply. But some doubts do lurk in my mind. 

 

 

--- Quote Start ---  

Assuming you're working with 16-bit wide signals, you should be able to do all that in a single 50 ns cycle. 

--- Quote End ---  

Really !!! (Yes I am working with 16-bit data) You mean to say that my entire algorithm can get executed within a single 50 ns time-unit, and my input-to-output latency will be just 1 sampling period. This is really alluring, I need to try this out. 

 

 

--- Quote Start ---  

What you need is to disable the registers in the several LPM modules and register only the input signals from the ADC and the output signals to the DAC. 

--- Quote End ---  

I understand what you mean here, in terms of pure Verilog coding. But how do I do this in Quartus (since I am using LPM symbols in a Quartus BDF file)? 

 

 

When I use the "Open Design File" option on each LPM module, it takes me to the corresponding Verilog code. But I see no registers there: all inputs, outputs and intermediate sub-wires are (by default) of the 'wire' data type. 

 

Do I check the "no" answer to "do you want to pipeline the function?" option inside every LPM module ?
Altera_Forum

Hi,  

if you register both the inputs and outputs, you have a 2 cycle latency. This will work for sure at 20MHz. Actually, I did a quick implementation of that equation on a Cyclone III, with all the factors being registers, and the fMax is 93MHz. 

 

To achieve a single cycle latency, one has to drop either the input or output registers. And then one has to do a more careful analysis, including ADC tco or DAC tsu/th and board delays. But my guess is that yeah, it's probably doable at 20 MHz. 

 

Quite honestly, I never use BDF, so I'm totally unfamiliar with it. But yes, it should be related to pipelining options or registered input/output options.
Altera_Forum

That's great ... thanks again rbugalho. 

 

 

--- Quote Start ---  

if you register both the inputs and outputs, you have a 2 cycle latency. This will work for sure at 20MHz. 

--- Quote End ---  

Earlier I was pipelining each module to 1 clock-cycle latency, and was getting an overall input-to-output latency of about 8 clock cycles (I have a few more things apart from that equation). So this new latency of 2 cycles is a significant achievement! 

 

 

--- Quote Start ---  

Actually, I did a quick implementation of that equation on a Cyclone III, with all the factors being registers, and the fMax is 93MHz. 

--- Quote End ---  

I appreciate you taking that effort for me. And making me aware that even such an analysis can be done. Will surely learn it in the future. Right now I need to get working with my controller. 

 

 

--- Quote Start ---  

Quite honestly, I never use BDF, so I'm totally unfamiliar with it. But yes, it should be related to pipelining options or registered input/output options. 

--- Quote End ---  

The reason I was using a .BDF file is that I wanted to use Megafunctions instead of simple Verilog constructs. I hear that they are well optimized (ref: "using megafunction vs just verilog expression", http://www.alteraforum.com/forum/showthread.php?t=22128). 

 

And yes, when I did remove the "pipelining" option from those Megafunctions - the clock input simply vanished. So I guess, this is what I require ! 

 

Moreover, since I do not require pipelining now, I can use simple constructs like A*B, and do away with the need for Megafunctions. Which translates to the ability to code everything in Verilog, and not use a BDF file. 

 

Wow! You solved so many of my complications in so little time. 

Cheers :-)
Altera_Forum

 

--- Quote Start ---  

Moreover, since I do not require pipelining now, I can use simple constructs like A*B, and do away with the need for Megafunctions. Which translates to the ability to code everything in Verilog, and not use a BDF file. 

--- Quote End ---  

 

Yes, that's the way most people write FPGA code. Of course pipelining may be required for complex arithmetic expressions or with a fast system clock. But except for dividers, you'll rarely need it inside an atomic arithmetic operation. So you still can control it in your HDL code. For the time being, I won't think about pipelining, as long as the timing analysis succeeds.
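
To make "control it in your HDL code" concrete, here is a small sketch of a multiplier written with the plain `*` operator, where pipelining is a choice made in the surrounding code and not inside the operator. The module name, the PIPE parameter and the widths are hypothetical, chosen only for illustration.

```verilog
// Sketch only: multiplier inferred from '*', with an optional output register.
// PIPE = 0 gives a purely combinational multiplier;
// PIPE = 1 adds one register stage after it.
module mult_infer #(
    parameter W    = 16,
    parameter PIPE = 0
)(
    input  wire                  clk,   // unused when PIPE = 0
    input  wire signed [W-1:0]   a,
    input  wire signed [W-1:0]   b,
    output wire signed [2*W-1:0] p
);
    // Synthesis normally recognizes this as a multiplier
    wire signed [2*W-1:0] prod = a * b;

    generate
        if (PIPE == 0) begin : g_comb
            assign p = prod;
        end else begin : g_reg
            reg signed [2*W-1:0] prod_r;
            always @(posedge clk) prod_r <= prod;
            assign p = prod_r;
        end
    endgenerate
endmodule
```

Starting with PIPE = 0 and adding a stage only if timing analysis fails matches the approach suggested above.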
Altera_Forum

Dear makon,  

you're quite welcome. 

 

A few final words of advice. 

 

First, learning how to perform static timing analysis is quite important, so that you know your design can actually meet your requirements, i.e., that it can actually run at the frequency you require. 

http://www.altera.com/literature/manual/mnl_timequest_cookbook.pdf 

 

Second, you can also instantiate megafunctions from Verilog/VHDL code. You don't have to resort to BDF for that -- that's how people like me get by without BDF. 

 

Finally, the synthesis tools can recognize many specific functions from Verilog/VHDL code and infer the usage of optimized functions. 

http://www.altera.com/literature/hb/qts/qts_qii51007.pdf 

 

That includes integer adders, multipliers -- including pipelining -- and a lot of other things. 

Of course, there'll also be cases where that's not true and you need to instantiate specific megafunctions.
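
For reference, instantiating a megafunction directly from Verilog can look roughly like the sketch below, which wraps an unpipelined signed 16x16 lpm_mult. The wrapper name and widths are made up for the example, and the parameter and port names are given as I understand them from the LPM documentation; comparing against a wrapper file generated by the MegaWizard for your Quartus version is the safest check.

```verilog
// Sketch only: direct instantiation of the lpm_mult megafunction,
// no BDF involved. lpm_pipeline = 0 means no internal registers,
// so the clock-related inputs are tied to constants.
module mult_lpm (
    input  wire signed [15:0] a,
    input  wire signed [15:0] b,
    output wire signed [31:0] p
);
    lpm_mult #(
        .lpm_widtha         (16),
        .lpm_widthb         (16),
        .lpm_widthp         (32),
        .lpm_representation ("SIGNED"),
        .lpm_pipeline       (0)
    ) u_mult (
        .dataa  (a),
        .datab  (b),
        .result (p),
        .clock  (1'b0),   // unused without pipelining
        .clken  (1'b1),
        .aclr   (1'b0),
        .sum    (1'b0)    // optional addend, not used here
    );
endmodule
```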