The LPM arithmetic and logic functions in Quartus have an option to specify the required output latency as a number of clock cycles. An obvious choice is to set this to the minimum, i.e. '1', in order to minimize the propagation delay.

Does this mean that, for each module, the output is updated one clock cycle after the input, no matter what clock frequency we use?

Suppose I have to implement the following control algorithm in my FPGA:

Vout[k] = C1*( Vin[k] + Vin[k-1] ) + C2*Vstate[k] + Vout[k-1];

(It resembles an IIR filter and involves adders, multipliers, delays, etc.) My aim is to reduce the overall input-to-output latency while keeping the throughput at or above the sampling frequency.

What clock frequency should I supply to each of these modules? My input-output (ADC-DAC) sampling frequency is 20 MHz and the FPGA oscillator clock is 50 MHz. I understand that to generate Vx[k-1] from Vx[k], we need to delay Vx[k] by exactly one sampling period (1 / 20 MHz = 50 ns), so the delays would necessarily be clocked by this 20 MHz sampling clock. But what about the adders, multipliers, etc.? Can I not use a pll_clock = 100 MHz (Tpll = 10 ns) for these, and finish the entire equation within 10 ns * 5 steps = 50 ns?
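For reference, the difference equation above can be checked in software before it is committed to hardware. The following is only a minimal Python sketch, assuming zero initial conditions and plain integer arithmetic; word widths, coefficient scaling, and overflow are not modelled, and the function name `run_controller` is hypothetical.

```python
def run_controller(vin, vstate, c1, c2):
    """Evaluate Vout[k] = C1*(Vin[k] + Vin[k-1]) + C2*Vstate[k] + Vout[k-1]."""
    vin_prev = 0   # Vin[k-1]; assumed zero before the first sample
    vout_prev = 0  # Vout[k-1]; assumed zero before the first sample
    vout = []
    for vin_k, vstate_k in zip(vin, vstate):
        vout_k = c1 * (vin_k + vin_prev) + c2 * vstate_k + vout_prev
        vout.append(vout_k)
        # The two one-sample delays: both update once per sampling period.
        vin_prev, vout_prev = vin_k, vout_k
    return vout


print(run_controller([1, 2, 3], [0, 0, 0], c1=1, c2=0))  # [1, 4, 9]
```

Only `vin_prev` and `vout_prev` hold state from one sample to the next; everything else is arithmetic on the current sample, which is why only the delays are tied to the sampling clock.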
6 Replies
Hi,

"Does this mean that - for each module: the output will be updated after 1 clock cycle from the input, no matter what clock frequency we use?"
Quite so, that's what it means.

"What clock frequency should I supply to each of these modules?"
Everything should be run at the DAC/ADC clock.

"But what about adders, multipliers, etc. Can I not use a pll_clock = 100 MHz (Tpll = 10 ns) for these, and finish the entire equation within 10 ns * 5 steps = 50 ns?"
This is an unnecessary complication. Assuming you're working with 16-bit wide signals, you should be able to do all that in a single 50 ns cycle. What you need is to disable the registers in the several LPM modules and register only the input signals from the ADC and the output signals to the DAC. Hell, with a 20 MHz clock you might even be able to skip one of those registers.
@rbugalho: Wow! Thanks for such a quick reply. But some doubts do lurk in my mind.

--- Quote Start ---
Assuming you're working with 16-bit wide signals, you should be able to do all that in a single 50 ns cycle.
--- Quote End ---
Really! (Yes, I am working with 16-bit data.) You mean to say that my entire algorithm can be executed within a single 50 ns period, and my input-to-output latency will be just one sampling period? This is really alluring; I need to try this out.

--- Quote Start ---
What you need is to disable the registers in the several LPM modules and register only the input signals from the ADC and the output signals to the DAC.
--- Quote End ---
I understand what you mean here in terms of pure Verilog coding. But how do I do this in Quartus, since I am using LPM symbols in a Quartus BDF file? When I use the "Open Design File" option on an LPM module, it takes me to the corresponding Verilog code, but I already see no registers there: all inputs, outputs, and intermediate sub-wires are (by default) of the 'wire' data type. Do I answer "no" to the "Do you want to pipeline the function?" option inside every LPM module?
Hi,

If you register both the inputs and outputs, you have a 2-cycle latency. This will work for sure at 20 MHz. Actually, I did a quick implementation of that equation on a Cyclone III, with all the factors being registers, and the fMax is 93 MHz.

To achieve a single-cycle latency, one has to drop either the input or the output registers. One then has to do a more careful analysis, including ADC tco or DAC tsu/th and board delays. But my guess is that yeah, it's probably doable at 20 MHz.

Quite honestly, I never use BDF, so I'm totally unfamiliar with it. But yes, it should be related to pipelining options or registered input/output options.
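The cycle counts in this reply can be reproduced with a small clock-edge model. This is just an illustrative Python sketch, not the poster's implementation: it assumes an ideal clock, treats the arithmetic between the registers as a pass-through so that only register latency is visible, and uses hypothetical names (`dac_trace`, `latency_cycles`).

```python
def dac_trace(samples, input_reg=True, output_reg=True):
    """Return the value seen at the DAC pins after each clock edge."""
    if not (input_reg or output_reg):
        raise ValueError("at least one register is needed in this model")
    in_q = 0   # register on the ADC side
    out_q = 0  # register on the DAC side
    trace = []
    for adc in samples:
        # The (pass-through) combinational logic sees the pre-edge value.
        comb = in_q if input_reg else adc
        # Clock edge: both registers capture their pre-edge inputs at once.
        in_q, out_q = adc, comb
        # Without an output register, the DAC sees the logic output directly.
        trace.append(out_q if output_reg else in_q)
    return trace


def latency_cycles(trace):
    """Edges between presenting an impulse at the ADC and seeing it at the DAC."""
    return trace.index(1) + 1


impulse = [1, 0, 0, 0]
print(latency_cycles(dac_trace(impulse)))                    # 2
print(latency_cycles(dac_trace(impulse, input_reg=False)))   # 1
print(latency_cycles(dac_trace(impulse, output_reg=False)))  # 1
```

The model says nothing about whether the unregistered path meets timing; that is the ADC tco, DAC tsu/th, and board-delay analysis mentioned above.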
That's great ... thanks again, rbugalho.

--- Quote Start ---
if you register both the inputs and outputs, you have a 2 cycle latency. This will work for sure at 20MHz.
--- Quote End ---
Earlier I was pipelining each module to a 1-clock-cycle latency and was getting an overall input-to-output latency of about 8 clock cycles. (I have a few more things apart from that equation.) So this new latency of 2 cycles is a significant achievement!

--- Quote Start ---
Actually, I did a quick implementation of that equation on a Cyclone III, with all the factors being registers, and the fMax is 93MHz.
--- Quote End ---
I appreciate you taking that effort for me, and making me aware that such an analysis can be done at all. I will surely learn it in the future. Right now I need to get working on my controller.

--- Quote Start ---
Quite honestly, I never use BDF, so I'm totally unfamiliar with it. But yes, it should be related to pipelining options or registered input/output options.
--- Quote End ---
The reason I was using a .BDF file is that I wanted to use Megafunctions instead of simple Verilog constructs. I hear that they are well optimized. (Ref: "using megafunction vs just verilog expression", http://www.alteraforum.com/forum/showthread.php?t=22128)

And yes, when I removed the "pipelining" option from those Megafunctions, the clock input simply vanished. So I guess this is what I require! Moreover, since I do not require pipelining now, I can use simple constructs like A*B and do away with the need for Megafunctions. That means I can code everything in Verilog and not use a BDF file.

Wow! You solved so many of my complications in so little time. Cheers :-)
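To put the figures quoted in this thread side by side, here is a rough Python sketch of the arithmetic. It assumes that both latency figures (8 and 2) are counted in cycles of the 20 MHz sampling clock, which the thread does not state explicitly, and it treats the 93 MHz fMax as a register-to-register figure only; I/O timing is ignored. All names are hypothetical.

```python
F_SAMPLE_HZ = 20e6  # ADC/DAC sampling clock quoted in the thread
F_MAX_HZ = 93e6     # fMax reported for the quick Cyclone III build


def period_ns(freq_hz):
    """Clock period in nanoseconds."""
    return 1e9 / freq_hz


def latency_ns(cycles, freq_hz=F_SAMPLE_HZ):
    """Input-to-output latency for a given number of clock cycles."""
    return cycles * period_ns(freq_hz)


def slack_ns(clock_hz=F_SAMPLE_HZ, fmax_hz=F_MAX_HZ):
    """Margin between the clock period used and the minimum period at fMax."""
    return period_ns(clock_hz) - period_ns(fmax_hz)


print(latency_ns(8))         # 400.0 (each module pipelined)
print(latency_ns(2))         # 100.0 (input and output registers only)
print(round(slack_ns(), 1))  # 39.2
```

Under these assumptions the register-to-register path needs roughly 10.8 ns of the 50 ns sampling period, which is consistent with the remark that the design works comfortably at 20 MHz.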
--- Quote Start ---
Moreover, since I do not require pipelining now, I can use simple constructs like A*B, and do away with the need for Megafunctions. Which translates to the ability to code everything in Verilog, and not use a BDF file.
--- Quote End ---
Yes, that's the way most people write FPGA code. Of course, pipelining may be required for complex arithmetic expressions or with a fast system clock. But except for dividers, you'll rarely need it inside an atomic arithmetic operation, so you can still control it in your HDL code. For the time being, I wouldn't think about pipelining as long as the timing analysis succeeds.
Dear makon,

you're quite welcome. A few final words of advice.

First, learning how to perform static timing analysis is quite important, so that you know your design can actually meet your requirements, i.e., that it can actually run at the frequency you require.
http://www.altera.com/literature/manual/mnl_timequest_cookbook.pdf

Second, you can also instantiate megafunctions from Verilog/VHDL code. You don't have to resort to BDF for that; that's how people like me get by without BDF.

Finally, the synthesis tools can recognize many specific functions in Verilog/VHDL code and infer the use of optimized functions.
http://www.altera.com/literature/hb/qts/qts_qii51007.pdf
That includes integer adders and multipliers (including pipelining) and a lot of other things. Of course, there will also be cases where that's not true and you need to instantiate specific megafunctions.