Adder-Multiplier vs. FIR IP Trade-offs

Altera_Forum
Honored Contributor II

Hi, I am a few months into building a data acquisition system on a DE1 board (for prototyping) and have written a FIR filter in VHDL. I have noticed that Altera provides megafunctions for FIR filters at about $4000 per license. I have also noticed that my filter gobbles up quite a lot of adder-multipliers. (I need to filter signals for a neuroscience application, so the signals of interest range from 1 Hz to 6 kHz.) Ideally I would like to have 32 parallel, reloadable filters in this project. Obviously I would like to maximize the sharpness of the filter, which drives the number of taps up at the cost of space on the Cyclone II.

Does anyone know the most cost-effective way to cut down on adder-multipliers? Would the Altera IP help, or are there slicker ways to manage resources without breaking the bank? I would really appreciate help with this :D

Altera_Forum

There are plenty of tricks:

1) consider an IIR filter

2) consider an averaging (moving-average) filter

3) reuse resources (time-share a multiplier) if your clock is much faster than your sample rate

4) exploit symmetry, power-of-2 coefficients, multiple additions, and so on.

It depends; you need to describe your filtering requirements in more detail.
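Tricks 2 and 4 can be combined: a moving average needs no multipliers at all if it keeps a running sum, and a power-of-2 length turns the divide-by-N into a shift. Below is a minimal Python model, assuming integer samples; the class name MovingAverage is hypothetical, and this is a behavioural sketch rather than HDL.

```python
from collections import deque


class MovingAverage:
    """Running-sum moving average of the last N = 2**log2_n samples.

    Each new sample costs one add, one subtract and one shift, however
    large N is, so no multipliers are needed. Choosing N as a power of 2
    turns the divide-by-N into a right shift (an arithmetic shift, which
    floors for negative sums just like Python's >>).
    """

    def __init__(self, log2_n: int) -> None:
        self.n = 1 << log2_n
        self.shift = log2_n
        # Models the N-sample delay line; starts filled with zeros.
        self.window = deque([0] * self.n, maxlen=self.n)
        self.acc = 0  # running sum of the samples currently in the window

    def step(self, x: int) -> int:
        oldest = self.window[0]        # sample about to leave the window
        self.window.append(x)          # maxlen makes the deque drop it
        self.acc += x - oldest         # one add, one subtract
        return self.acc >> self.shift  # divide by N with a shift


if __name__ == "__main__":
    ma = MovingAverage(log2_n=2)  # 4-sample average
    print([ma.step(x) for x in [4, 8, 12, 16, 20]])  # [1, 3, 6, 10, 14]
```

The delay line still has to be stored somewhere (registers or a RAM), but the arithmetic cost is constant in N.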

Altera_Forum

Hi Kaz, nice to meet you, and thank you for your insight (I'll look into those avenues further). The sampling rate should be around 32 kHz and my clock is 50 MHz. I had previously written a simple moving average (attached) based on 06_MVD_ FIR_Design (12).pdf, with the idea that I could simply load in coefficients for the impulse response. Originally I wanted to use a microprocessor to calculate the filter coefficients automatically using Parks-McClellan, but that appears to be a long way off from where I'm at currently (still pretty green).

https://www.alteraforum.com/forum/attachment.php?attachmentid=13113
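If Parks-McClellan is out of reach for now, one simpler option (a different technique: the windowed-sinc method with a Hamming window) is to compute the coefficients offline on a PC and load them into the FPGA as integers. A minimal sketch, assuming a 6 kHz lowpass at a 32 kHz sample rate, an arbitrary 129 taps, and signed 16-bit Q1.15 coefficients; the function names are hypothetical, and this covers only the 6 kHz upper edge, not the 1 Hz lower edge.

```python
import math


def windowed_sinc_lowpass(num_taps: int, cutoff_hz: float, fs_hz: float) -> list[float]:
    """Lowpass FIR taps via the windowed-sinc method (Hamming window)."""
    fc = cutoff_hz / fs_hz        # normalized cutoff in cycles/sample
    centre = (num_taps - 1) / 2   # taps are symmetric about this index
    taps = []
    for n in range(num_taps):
        t = n - centre
        # Ideal lowpass impulse response; the t == 0 case is the sinc limit.
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    dc_gain = sum(taps)
    return [h / dc_gain for h in taps]  # normalize to unity gain at DC


def quantize_q15(taps: list[float]) -> list[int]:
    """Round to signed 16-bit Q1.15 integers, saturating at the limits."""
    return [max(-32768, min(32767, round(h * 32768))) for h in taps]


if __name__ == "__main__":
    h = windowed_sinc_lowpass(num_taps=129, cutoff_hz=6000.0, fs_hz=32000.0)
    coeffs = quantize_q15(h)
    print(len(coeffs), coeffs[60:69])  # centre taps of the quantized response
```

The resulting integer list is what would be written into a coefficient RAM/ROM; reloading a filter then means rewriting that memory.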

Altera_Forum

Hi, 

 

A 50 MHz clock and a 32 kHz sample rate give a ratio of about 1562 (50,000,000 / 32,000 = 1562.5). So you can run 1562 taps on a single multiplier as a multiply-accumulate with reset.

Isn't that enough for your filtering?

Altera_Forum

Ahhh yeah, gotcha... cool. So if I understand you right, you're saying that I just fetch a coefficient on every clock cycle, multiply it, accumulate the result, and cycle through the whole set of taps for the convolution, all before the next data sample arrives? 1562 taps would give me a great cutoff as well!
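That schedule can be modelled in a few lines of Python: for each new sample the accumulator is reset, then one tap is processed per clock cycle through a single shared multiply. This is a behavioural sketch only, assuming integer samples and coefficients; mac_fir is a hypothetical name, and the delay line is a plain list here, whereas in hardware it could be registers or a RAM.

```python
def mac_fir(samples: list[int], coeffs: list[int]) -> tuple[list[int], int]:
    """Time-multiplexed FIR: one multiplier reused for every tap.

    Returns the filter outputs and the number of MAC cycles each output took.
    """
    num_taps = len(coeffs)
    delay_line = [0] * num_taps      # delay_line[k] holds x[n - k]
    outputs = []
    cycles_per_output = 0
    for x in samples:                # one iteration per input sample period
        delay_line = [x] + delay_line[:-1]    # shift in the new sample
        acc = 0                               # accumulator reset
        cycles = 0
        for k in range(num_taps):             # one tap per clock cycle
            acc += coeffs[k] * delay_line[k]  # the single shared multiply
            cycles += 1
        outputs.append(acc)
        cycles_per_output = cycles
    return outputs, cycles_per_output


if __name__ == "__main__":
    ys, cycles = mac_fir([1, 0, 0, 0], [3, 2, 1])
    print(ys, cycles)  # [3, 2, 1, 0] 3
```

For a single channel at 50 MHz and 32 kHz, cycles_per_output has to stay at or below about 1562, minus whatever cycles the control logic needs.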

Altera_Forum

Indeed.

But you are also targeting 32 parallel filters. If 32 channels share one multiplier, the above factor becomes 48 taps per channel (1562 / 32 ≈ 48.8). You can keep it at 1562 taps per channel if you use 32 parallel filters, one multiplier per channel. Still, you are lucky.
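The arithmetic behind those numbers, as a small sketch. It assumes one multiply-accumulate per clock cycle and ignores control-overhead cycles; the function names are hypothetical.

```python
import math


def clocks_per_sample(clock_hz: int, sample_rate_hz: int) -> int:
    """Whole clock cycles available between consecutive input samples."""
    return clock_hz // sample_rate_hz


def taps_per_channel(clock_hz: int, sample_rate_hz: int, channels_per_mac: int) -> int:
    """Taps each channel gets when channels_per_mac channels share one MAC."""
    return clocks_per_sample(clock_hz, sample_rate_hz) // channels_per_mac


def macs_needed(clock_hz: int, sample_rate_hz: int, channels: int, taps: int) -> int:
    """Multipliers needed so every channel gets the requested number of taps."""
    channels_per_mac = clocks_per_sample(clock_hz, sample_rate_hz) // taps
    if channels_per_mac == 0:
        raise ValueError("one MAC cannot compute that many taps per sample")
    return math.ceil(channels / channels_per_mac)


if __name__ == "__main__":
    clk, fs = 50_000_000, 32_000
    print(clocks_per_sample(clk, fs))      # 1562
    print(taps_per_channel(clk, fs, 32))   # 48: all 32 channels on one MAC
    print(macs_needed(clk, fs, 32, 1562))  # 32: one MAC per channel
    print(macs_needed(clk, fs, 32, 130))   # 3: 12 channels per MAC
```

The last line shows the middle ground between the two extremes: a shorter filter lets several channels share each multiplier.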

Altera_Forum

Yeah, I was thinking parallel filters would be best for me; at least it would be less complicated. Thanks again!

Altera_Forum

Hi Kaz, please have a look at the two files that I have uploaded. I made a finite state machine that, when driven, clocks through the convolution process. As you mentioned, this seems to cut down the adder-multipliers (in this case to 3 with ~130 taps). However, the number of registers appears to go very high. Is there something I am not getting conceptually, or is this roughly how you would have coded it? I think Altera uses block RAM in their megafunction, possibly to offset some of these troubles. I have inferred block RAM before but have not yet used it in this project... any recommendations?

Altera_Forum

 

--- Quote Start ---  

 

any recommendations? 

 

--- Quote End ---  

 

Yes: read about MACC (multiply-accumulate) filters; that is what you are designing.

 

The Xilinx Virtex-4 XtremeDSP guide has a pretty good description. The Microsemi DSP guide (for the SmartFusion2/IGLOO2) also covers this architecture. I don't recall ever finding a good Altera example; the Stratix Cookbook might have one. The trick is to use RAMs for the coefficient storage and the input samples. You can also exploit symmetry, e.g., even symmetry of the taps can be exploited using a pre-adder.
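A behavioural sketch of that architecture, assuming an even-symmetric impulse response with an even number of taps (h[k] == h[N-1-k]). Samples go into a RAM used as a circular buffer, so nothing is shifted; only a write pointer moves. Only the first N/2 coefficients are stored, and the pre-adder sums the two samples that share a coefficient, so each output takes N/2 multiplies instead of N. The class name is hypothetical, and because the pre-adder reads two samples per cycle, a hardware version would typically need a dual-port RAM.

```python
class SymmetricMaccFir:
    """MACC FIR with a circular sample RAM, a half-length coefficient ROM
    and a pre-adder (even symmetry, even number of taps assumed)."""

    def __init__(self, half_coeffs: list[int]) -> None:
        self.half = list(half_coeffs)        # coefficient "ROM", N/2 entries
        self.n_taps = 2 * len(self.half)
        self.ram = [0] * self.n_taps         # sample "RAM", written circularly
        self.wr = 0                          # write pointer (next slot to fill)

    def _read(self, delay: int) -> int:
        # RAM address of x[n - delay]; the newest sample sits at wr - 1.
        return self.ram[(self.wr - 1 - delay) % self.n_taps]

    def step(self, x: int) -> tuple[int, int]:
        """Consume one sample; return (output, multiply cycles used)."""
        self.ram[self.wr] = x                # one RAM write per sample
        self.wr = (self.wr + 1) % self.n_taps
        acc = 0                              # accumulator reset
        cycles = 0
        for k in range(len(self.half)):      # N/2 multiplies instead of N
            pre = self._read(k) + self._read(self.n_taps - 1 - k)  # pre-adder
            acc += self.half[k] * pre
            cycles += 1
        return acc, cycles


if __name__ == "__main__":
    f = SymmetricMaccFir([1, 2])              # full response h = [1, 2, 2, 1]
    print([f.step(x) for x in [1, 0, 0, 0]])  # [(1, 2), (2, 2), (2, 2), (1, 2)]
```

Halving the multiplies per output either halves the cycles per channel or doubles the taps that fit in the same clock budget.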

 

That should give you enough hints. If you get stuck, ask for more help. 

 

Cheers, 

Dave

Altera_Forum

With 3 multipliers (say 16-bit) inside the DSP blocks, plus a 32-bit accumulator, control logic, and one RAM, I would expect only a few hundred registers to be used. You had better use the RAM/ROM block IP.
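One thing to watch with the accumulator: a signed 16 x 16 product fits in 32 bits, but summing many products grows the word. A conservative width estimate, as a sketch; the function name is hypothetical, and it assumes signed data and coefficients with no scaling or saturation in between.

```python
import math


def accumulator_bits(data_bits: int, coeff_bits: int, num_taps: int) -> int:
    """Conservative signed accumulator width for a MAC FIR with no overflow.

    A signed data_bits x coeff_bits product fits in data_bits + coeff_bits
    bits, and adding num_taps such products can need up to
    ceil(log2(num_taps)) extra bits.
    """
    growth = math.ceil(math.log2(num_taps)) if num_taps > 1 else 0
    return data_bits + coeff_bits + growth


if __name__ == "__main__":
    print(accumulator_bits(16, 16, 130))   # 40
    print(accumulator_bits(16, 16, 1562))  # 43
```

This is a worst-case bound; real coefficient sets rarely reach it, so designers often size against the sum of the absolute coefficient values instead.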

Altera_Forum

Thank you for your guidance, I will give it a try tonight :D

Altera_Forum

Thanks Dave! I'll have a read through; those look like some great links and suggestions.

Cheers 

H