Re: For those who care about hardware resources with the Altera FFT IP core

Altera_Forum · 02-13-2015

Dear all,

Apparently, there are quite a few users of the Altera FFT IP core on this forum. The hardware resources used by the FFT can be very high (especially for large FFT lengths), so for those who really care about resource usage, it is possible to reduce memory usage at the expense of a more complicated design (and possibly a small increase in logic and DSP blocks, depending on the FFT options chosen).

The details can be found here: http://www.eetimes.com/author.asp?section_id=36&doc_id=1325667, where a PDF explaining all this in detail and a zip file with example projects are available.

In summary, by applying the principle of the radix-2 FFT (splitting a signal into its even and odd samples) and using an Altera FFT of size N/2 instead of N, it is possible to reduce memory usage by about 30 to 40% while keeping the same processing time. This is a bit counterintuitive, but it works; a real design has been built to verify it.
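The even/odd split mentioned above is the standard radix-2 decimation-in-time step. Below is a minimal sketch of the math only, not the article's hardware architecture (see the linked PDF for that). It assumes an N/2-point transform is available for the even and odd halves; here a naive DFT (the hypothetical helper `dft`) stands in for the Altera N/2 FFT core. The N-point result is then recovered with one twiddle multiplication and one add/subtract pair per output bin.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT; a placeholder for the N/2-point FFT core (hypothetical stand-in)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def fft_from_half_size(x, half_fft=dft):
    """N-point DFT built from two N/2-point transforms (radix-2 decimation in time).

    X[k]       = E[k] + W_N^k * O[k]
    X[k + N/2] = E[k] - W_N^k * O[k],   k = 0 .. N/2 - 1
    where E and O are the N/2-point transforms of the even and odd samples.
    """
    n = len(x)
    if n % 2:
        raise ValueError("length must be even")
    half = n // 2
    even = half_fft(x[0::2])  # transform of x[0], x[2], x[4], ...
    odd = half_fft(x[1::2])   # transform of x[1], x[3], x[5], ...
    out = [0j] * n
    for k in range(half):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        t = w * odd[k]                         # the single complex multiply
        out[k] = even[k] + t                   # the adders
        out[k + half] = even[k] - t
    return out
```

Only N/2 twiddle factors are needed, because the second half of the output reuses the same products with the sign flipped.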

For the designer who wants to use this, it complicates things, because the proposed design requires additional adders, a multiplier, a complex exponential generator (this can be done with an NCO or a pre-initialized memory), a "small" memory, and blocks that perform scaling. But if in some cases it can save dozens or hundreds of memory blocks, it can be worth it.
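For the "pre-initialized memory" option, here is a minimal sketch of how the table contents could be generated. It assumes 16-bit signed twiddle factors with 15 fractional bits. The helper name `twiddle_rom` and the round-to-nearest choice are assumptions, not taken from the article.

```python
import math


def twiddle_rom(n, bits=16):
    """Contents of a ROM holding W_N^k = exp(-2*pi*j*k/N) for k = 0 .. N/2 - 1.

    Each entry is a (real, imag) pair of signed integers with bits-1 fractional
    bits. Assumed format: full scale is 2**(bits-1) - 1 (32767 for 16 bits), so
    that W^0 = 1 remains representable without overflowing.
    """
    scale = (1 << (bits - 1)) - 1
    rom = []
    for k in range(n // 2):  # the butterflies only need the first N/2 factors
        angle = -2 * math.pi * k / n
        re = round(math.cos(angle) * scale)
        im = round(math.sin(angle) * scale)
        rom.append((re, im))
    return rom
```

An NCO would produce the same sequence on the fly instead of storing it, trading memory for logic.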

I hope this can help some of you.

Jérôme

Altera_Forum · 02-13-2015

Thanks for the information and the link.

I wonder why Altera wouldn't incorporate this modification into their IP to save design time. Is there any other penalty, such as bit-true performance compared to the Altera FFT?

Altera_Forum · 02-13-2015

We contacted Altera support a while ago to point out this potential improvement, but the person who answered did not show any interest.

When you say bit-true performance, do you mean the error compared to a "true" FFT?

In the design implemented, the error on the FFT output was slightly higher when the same number of output bits was used (16 bits here, obtained by truncating the output). But without truncating the output (which has 34 bits, because the 16-bit FFT output is multiplied by a complex exponential that is also 16 bits, and there are adders), the error on the FFT output was slightly lower. You can see this in Figure 9 of the PDF article (http://infoscience.epfl.ch/record/204540/files/implementing%20super-efficient%20ffts%20in%20altera%20fpgas.pdf).
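Here is one plausible accounting of the 34 bits. It assumes signed two's-complement operands and full-precision arithmetic with no rounding inside the combining stage. The article may break the growth down differently, and `combine_stage_width` is a hypothetical helper.

```python
def combine_stage_width(data_bits: int, twiddle_bits: int) -> int:
    """Bit width of X[k] = E[k] +/- W^k * O[k] when nothing is rounded or truncated."""
    product_bits = data_bits + twiddle_bits  # e.g. a*c: 16 x 16 -> 32 bits
    cmul_bits = product_bits + 1             # a*c - b*d sums two products: +1 bit
    butterfly_bits = cmul_bits + 1           # E[k] (aligned) +/- W*O[k]: +1 bit
    return butterfly_bits


print(combine_stage_width(16, 16))  # 34
```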

Having 34 bits instead of 16 bits may or may not be a problem, depending on the processing after the FFT (for example, if a detection stage follows directly, it will not be a problem). But this is the design we implemented; there are probably smarter ways to do it that could improve the accuracy.

Otherwise, I don't see any additional penalty with the proposed implementation. I did not try to evaluate the maximum clock frequency, but I don't think it would decrease, since only simple elements are added, as you can see in the actual implementation shown in Figure 8 of the PDF article.

Jérôme

Altera_Forum · 02-13-2015

By bit-true I mean comparing the Altera FFT with the modified FFT, each with the same input and output resolution.

For example, an input width of 16 bits and an output width of 16 bits.

Would both outputs be identical? If so, I would be surprised if it were not adopted by Altera or Xilinx.

Altera_Forum · 02-13-2015

No, with the same input and output resolution, the output signals are not identical (this is shown in Figure 9 (b) and (d) of the PDF article).

Altera_Forum · 02-13-2015

I see.

Looks like there is an accuracy issue; maybe the resolution is not good enough for precise frequency-domain analysis, but it should help some applications.

Altera_Forum · 02-13-2015

I think there are certainly ways to optimize further, and more analysis could show how to improve the accuracy.

For example, in my implementation I do a simple truncation by taking the 16 MSBs of the 34 bits. And looking back at the example in the zip file, I see that a few MSBs of the output of the proposed implementation are never used (the maximum output value measured is 11% of the maximum possible). So there are smarter approaches; I did the simplest one.
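Here is a minimal sketch of the two options, assuming signed integer samples and that the 11% peak measured in the example holds for the signal of interest. The function names are hypothetical. The first function does the plain truncation to the 16 MSBs. The second skips the MSBs that never toggle and saturates as a safety net in case the assumed headroom is exceeded.

```python
import math


def truncate_to_msbs(value, in_bits=34, out_bits=16):
    """Keep the out_bits MSBs of a signed in_bits value (arithmetic shift, rounds toward -inf)."""
    return value >> (in_bits - out_bits)


def unused_msbs(peak_fraction):
    """MSBs that never toggle if |output| stays below peak_fraction of full scale."""
    return math.floor(math.log2(1 / peak_fraction))


def truncate_with_headroom(value, in_bits=34, out_bits=16, skip_msbs=0):
    """Drop skip_msbs unused MSBs first, then keep the next out_bits bits, saturating."""
    shift = in_bits - out_bits - skip_msbs
    result = value >> shift
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, result))  # clamp if the headroom assumption is violated
```

With a peak of 11% of full scale, three MSBs are unused (0.11 < 1/8), so skipping them gives eight times finer resolution in the same 16 output bits. For example, `truncate_to_msbs(1 << 29)` gives 2048, while `truncate_with_headroom(1 << 29, skip_msbs=3)` gives 16384.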

Altera_Forum · 02-13-2015

You have to remember, Altera's goal is to sell chips.

Without a clear benefit in the form of selling massive quantities of extra chips, Altera are unlikely to change anything.

If you can prove that they would sell many more V-series devices IF they changed their FFT implementation, they'd probably change it.

But for just some minor feedback, you're unlikely to get anything done.