HiI want to perform 2d fft on images, right now I'm trying to modify the 2dFFT from Altera design example though it's very restricted, can only perform fft for input size of power of 2, and only for square images (I guess with some modification it can process rectangle images), I have to do zero-padding to make it work. Other options I've considered are clfft library or the FFT IP Core provided by Altera, both I have no idea how to integrate with OpenCL for Altera. clfft targets GPU, I guess without source code there's no way it would work for Altera FPGA. FFT IP Core is HDL, I'm not sure if it's possible to incorporate it with OpenCL generated code. Any suggestion on building or using FFT on FPGA/OpenCL? Thanks!
Most common algorithms for computing the fourier transform:1) Cooley–Tukey FFT 2) Rader's FFT 3) Bluestein's FFT 4) Computing DFT directly (slow since removing 'Fast' from FFT) The Cooley–Tukey algorithm I believe is widely used for computing the FFT for powers of 2 due to it's efficiency and complexity ( for the order of O(N log N) ). Using the size N as a power of two enables the use of the twiddle factors. The design examples uses this algorithm to my knowledge. Many implementations I believe have gone into optimizing the FFT by decomposing it into two parts (radix-2 form) to allow for several tricks such as the butterfly and bit-reversal. Other optimized implementations would be Rader's FFT for prime sizes or similarly Bluestein's FFT algorithm for arbitrary sizes ( O(N log N) for prime sizes otherwise it's slower in terms of complexity ). Zero-padding to the next power of two may be your best bet for optimal performance for the FFT. The design example supports quite a large size FFT if I remember correctly. The design example could be further optimized so that it is more tuned to your target size. The clfft library I'm assuming uses some flavor of the FFTs above (but targeted for GPU, so probably not optimal on the FPGA). The second dimension is added in by computing two FFTs, one along the rows and another along the columns. The design example reuses the 1D FFT by computing the FFT along one dimension, performing a non-Hermitian transpose, FFT along the same dimension, and then performing another non-Hermitian transpose thus giving it the same effect. The FFT IP Core could possibly be used as an Altera OpenCL Library which allows the use of integrating HDL code into OpenCL but I'm not too familiar with that process or its complexity for implementing.
Thank you fand.I'm using their design example now, but the performance is not comparable to GPU. My 1060 can process 1024*1024 complex to complex in 0.6 ms while Arria 10 takes about 3ms, exact same data. Altera mentioned here (radar processing: fpgas or gpus? (https://www.altera.com/en_us/pdfs/literature/wp/wp-01197-radar-fpga-or-gpu.pdf)) it's possible to do a 7-core FFT using more resources, but I can't find document on how they did that. Anyone have any idea? thanks