
FFT in SOPC

Altera_Forum
Honored Contributor II
2,899 Views

Guys, please help if you have an idea.

So far I have built a 64-point FFT with the FFT MegaCore and added it to SOPC Builder as a new component, but it exposes a lot of signals, so I get an error when I generate the system.

I believe I need wrapper code for my FFT. Can someone help me out with it?

Thanks.

 

webster_dev at the rate yahoo.com
39 Replies
Altera_Forum
Honored Contributor II
578 Views

Hi Marlon,

I think the byte swapping comes from the SGDMA (when it reads and writes data in memory), so you should swap the bytes at both the input and the output of the FFT.

I don't think swapping the bytes in software (with pointers) is a good way to do it, since it costs Nios II processing time.

It is better to swap the bytes in hardware, in the fft_avalon_wraper.v file, by rewiring the signals to match the swap you need.

For example, try this in fft_avalon_wraper.v:

 

wire [15 : 0] temp_input; 

 

assign temp_input[15: 8] = sink_real[7 : 0]; 

assign temp_input[7 : 0] = sink_real[15 : 8]; 

 

temp_input is then the input to the FFT module.

You should do the same for the output.
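
For comparison, here is a minimal sketch of what the software swap discussed above could look like in C. The function names are made up for this illustration, and it assumes each 32-bit word carries two 16-bit samples whose bytes both need swapping. Every word costs the Nios II a few extra instructions, which is why rewiring the signals in the wrapper is the cheaper option.

```c
#include <stdint.h>

/* Swap the two bytes of one 16-bit sample: 0xAABB -> 0xBBAA. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Swap the bytes inside each 16-bit half of a 32-bit word:
 * 0xAABBCCDD -> 0xBBAADDCC. The two halves stay where they are. */
uint32_t swap_bytes_in_halfwords(uint32_t w)
{
    return ((w & 0x00FF00FFu) << 8) | ((w & 0xFF00FF00u) >> 8);
}
```

Applying swap_bytes_in_halfwords twice gives back the original word, so the same helper would serve on both the input and the output side.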
Altera_Forum
Honored Contributor II
578 Views

Excellent. Thanks, Majd.

One more question. I want to use a 16384-point FFT with 16-bit precision. This results in a 34-bit width for both the real and the imaginary part of the FFT output. Put together, that is 68 bits, but the SGDMA is limited to 64 bits of data. Is there a way to increase this directly, or is it better to use 2 separate Avalon interfaces?

Fast forwarding a bit, I tried using 2 Avalon interfaces, but the output data never comes out of either interface:

 

// I start the real transfer first
alt_avalon_sgdma_do_async_transfer(receive_real_DMA, desc1);
...
// wait until the status register reads 14 (transfer complete)
while (IORD(SGDMA__ST_TO_MM_FFT_REAL_BASE, 0) != 14)
{
};

// ...then I start the imaginary transfer
alt_avalon_sgdma_do_async_transfer(receive_imag_DMA, desc2);
...
while (IORD(SGDMA__ST_TO_MM_FFT_IMAG_BASE, 0) != 14)
{
};
// end sample code...

 

 

The code hangs at the first while statement. I know that both SGDMAs are initialized properly because the initialization results were not null and no error was printed.

I believe I am missing something here, but I cannot quite put my finger on it.

Let me also explain how I connected the two Avalon interfaces to the single FFT component:

source_SOP and source_EOP from the FFT connect to the signals source_SOP_s and source_EOP_s respectively. I then connect source_SOP_s and source_EOP_s to the SOP and EOP outputs of each Avalon interface (remember, there is one interface for the real data and one for the imaginary data).

I follow the same procedure for source_valid and source_error from the FFT: each is connected to a signal, and that signal is connected to both Avalon interfaces.

With these parallel connections, I hoped that the FFT source could drive both Avalon interfaces simultaneously. However, neither is receiving data properly.

 

Any ideas?
Altera_Forum
Honored Contributor II
578 Views

Another side question.

Can you explain, or point me toward the documentation that discusses, the following lines?

while(IORD(SGDMA_MM_TO_ST_FFT_BASE,0) != 12); // For sending the FFT input data

while(IORD(SGDMA_ST_TO_MM_FFT_IMAG_BASE,0) != 14); // For receiving the FFT output data

Where do the 12 and 14 come from? I understand that they indicate when the transfer is complete, but where is this documented?
Altera_Forum
Honored Contributor II
578 Views

How can I use this FFT system in my main system for speech recognition?

How do I merge it into the main system? I have to take speech from memory as the input to the FFT and then put the result back into memory. How can I do this? Please give me ideas.
Altera_Forum
Honored Contributor II
578 Views

I studied the tutorials above and have the following questions:

> Do I have to create an FFT component from the fft.vhd or fft.v file generated by the FFT MegaCore? If yes, which files are needed?

> In the zip file posted by Majd, file 1 shows only the FFT generation process, not how the component is added, so I got confused.

> Can I use the Verilog wrapper code to generate the wrapper component for the FFT? My whole system is in VHDL.
Altera_Forum
Honored Contributor II
578 Views

Hi everybody,

To Marlon:

Sorry for the late reply.

About your question on the values 12 and 14:

 

while(IORD(SGDMA_MM_TO_ST_FFT_BASE,0) != 12); 

while(IORD(SGDMA_ST_TO_MM_FFT_IMAG_BASE,0) != 14); 

 

Meaning:

When I read these values from the status register, it means the SGDMA finished transferring the data without errors.

To understand exactly what they mean, you can read this document:

-------------------------------------------

"Quartus II Handbook Version 9.1, Volume 5: Embedded Peripherals", available at http://www.altera.com/literature/hb/nios2/n2cpu_nii5v3_05.pdf

For the SGDMA: page 209

For the status register bit map: page 221

------------------------------------------- 
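
To make the two magic numbers concrete, here is a small sketch in C that decodes them. The bit positions are how I read the status register bit map in the SGDMA chapter referenced above (please check them against your handbook version), and the macro and function names are made up for this example rather than taken from the HAL headers.

```c
#include <stdint.h>

/* Assumed SGDMA status register bits (verify against the handbook). */
#define SGDMA_STATUS_ERROR           0x01u
#define SGDMA_STATUS_EOP_ENCOUNTERED 0x02u
#define SGDMA_STATUS_DESC_COMPLETED  0x04u
#define SGDMA_STATUS_CHAIN_COMPLETED 0x08u
#define SGDMA_STATUS_BUSY            0x10u

/* Returns 1 when the status value reports a finished, error-free transfer.
 * expect_eop = 0: descriptor and chain completed (the value 12).
 * expect_eop = 1: additionally require end-of-packet (the value 14). */
int sgdma_transfer_done(uint32_t status, int expect_eop)
{
    uint32_t want = SGDMA_STATUS_DESC_COMPLETED | SGDMA_STATUS_CHAIN_COMPLETED;

    if (expect_eop)
        want |= SGDMA_STATUS_EOP_ENCOUNTERED;
    if (status & SGDMA_STATUS_ERROR)
        return 0;
    return (status & want) == want;
}
```

With these masks, 12 (0x0C) is "descriptor completed" plus "chain completed", and 14 (0x0E) additionally has "EOP encountered" set, which fits the stream-to-memory side where the packet ends with an EOP. Testing the individual bits is a little more robust than comparing against the exact value.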

 

To necsagar:

When you start a project, you should use the same language throughout the project.

I think you started your project in VHDL, so when you add a new component (like the FFT core) you should add it in VHDL.

Rather than using my wrapper_FFT file for your FFT component, you should write a new wrapper in VHDL.

If you have problems using the FFT core, you can read this document:

"FFT MegaCore Function User Guide"

Start from page 21.
Altera_Forum
Honored Contributor II
578 Views

Thanks Majd, 

 

I found a temporary workaround for the 64-bit data path constraint on the SGDMA. I decided to use a smaller 4096 point FFT with a 16-bit precision and that resulted in a 31 bit output for both real and imaginary. 

 

With your help I have almost completed my "wicked" FFT algorithm for calculating very large FFTs :) Thanks for pointing me in the direction of the documentation. 

 

Necsagar, I actually converted the wrapper to VHDL. I will try to post it tomorrow for you.
Altera_Forum
Honored Contributor II
578 Views

Hello Marlon, please post the wrapper code. I am stuck on this part.

Altera_Forum
Honored Contributor II
578 Views

Hello Necsagar, 

 

Sorry for the delay. I am typing this code directly into the post, so please excuse any typos. Name the file "fft_avalon_wrapper.vhd". Code below:

 

library IEEE;
use IEEE.std_logic_1164.all;

ENTITY fft_avalon_wrapper IS
  PORT (
    clk          : in  std_logic;
    reset_n      : in  std_logic;
    mm_writedata : in  std_logic_vector(7 downto 0);
    mm_write     : in  std_logic;
    sink_valid   : in  std_logic;
    sink_sop     : in  std_logic;
    sink_eop     : in  std_logic;
    sink_empty   : in  std_logic_vector(1 downto 0);
    sink_real    : in  std_logic_vector(31 downto 0);
    sink_error   : in  std_logic_vector(1 downto 0);
    source_ready : in  std_logic;
    sink_ready   : out std_logic;
    source_error : out std_logic_vector(1 downto 0);
    source_sop   : out std_logic;
    source_eop   : out std_logic;
    source_valid : out std_logic;
    source_data  : out std_logic_vector(63 downto 0);
    source_empty : out std_logic_vector(2 downto 0)
  );
END fft_avalon_wrapper;

ARCHITECTURE SYN OF fft_avalon_wrapper IS

  COMPONENT FFT_1024 IS
    PORT (
      clk          : in  std_logic;
      reset_n      : in  std_logic;
      fftpts_in    : in  std_logic_vector(12 downto 0);
      inverse      : in  std_logic;
      sink_valid   : in  std_logic;
      sink_sop     : in  std_logic;
      sink_eop     : in  std_logic;
      sink_real    : in  std_logic_vector(15 downto 0);
      sink_imag    : in  std_logic_vector(15 downto 0);
      sink_error   : in  std_logic_vector(1 downto 0);
      source_ready : in  std_logic;
      fftpts_out   : out std_logic_vector(12 downto 0);
      sink_ready   : out std_logic;
      source_error : out std_logic_vector(1 downto 0);
      source_sop   : out std_logic;
      source_eop   : out std_logic;
      source_valid : out std_logic;
      source_real  : out std_logic_vector(30 downto 0);
      source_imag  : out std_logic_vector(30 downto 0)
    );
  END COMPONENT;

  signal fftpts_in_s, fftpts_out_s : std_logic_vector(12 downto 0);
  signal sink_real_s, sink_imag_s  : std_logic_vector(15 downto 0);
  signal real_data_s, imag_data_s  : std_logic_vector(30 downto 0);
  signal fft_inverse_s             : std_logic;

BEGIN

  -- The SGDMA flips the bytes. These assignments flip them back the right way.
  -- Intermediate signals are used because a VHDL-93 port map cannot take a
  -- concatenation of signals as an actual.
  sink_real_s <= sink_real(7 downto 0)   & sink_real(15 downto 8);
  sink_imag_s <= sink_real(23 downto 16) & sink_real(31 downto 24);

  FFT_1024_inst : FFT_1024
    PORT MAP (
      clk          => clk,
      reset_n      => reset_n,
      fftpts_in    => fftpts_in_s,
      fftpts_out   => fftpts_out_s,
      inverse      => fft_inverse_s,
      sink_valid   => sink_valid,
      sink_sop     => sink_sop,
      sink_eop     => sink_eop,
      sink_real    => sink_real_s,
      sink_imag    => sink_imag_s,
      sink_ready   => sink_ready,
      sink_error   => sink_error,
      source_error => source_error,
      source_ready => source_ready,
      source_sop   => source_sop,
      source_eop   => source_eop,
      source_valid => source_valid,
      source_real  => real_data_s,
      source_imag  => imag_data_s
    );

  -- Flip the bytes and sign extend...
  source_data(63 downto 56) <= imag_data_s(7 downto 0);
  source_data(55 downto 48) <= imag_data_s(15 downto 8);
  source_data(47 downto 40) <= imag_data_s(23 downto 16);
  source_data(39)           <= imag_data_s(30);  -- Sign extension for imag data...
  source_data(38 downto 32) <= imag_data_s(30 downto 24);
  source_data(31 downto 24) <= real_data_s(7 downto 0);
  source_data(23 downto 16) <= real_data_s(15 downto 8);
  source_data(15 downto 8)  <= real_data_s(23 downto 16);
  source_data(7)            <= real_data_s(30);  -- Sign extension for real data...
  source_data(6 downto 0)   <= real_data_s(30 downto 24);

  -- All bytes of every output word are valid.
  source_empty <= (others => '0');

  process(clk)
  begin
    if (rising_edge(clk)) then
      fftpts_in_s   <= "1000000000000";  -- 4096 points
      fft_inverse_s <= '0';              -- forward FFT only
    end if;
  end process;

END SYN;

 

 

A few notes about my FFT implementation 

 

  • 16 bit data precision 

  • 4096 FFT points (as specified by fftpts_in) 

  • Each FFT output is 31 bits. This enables me to pack each component (real and imaginary) into a single 32-bit location. Naturally, you need to make sure that you extract each of the components as 32-bit values in your C code. This can be accomplished by casting the (alt_u64 *) pointer to an (alt_32 *) pointer, which lets you access each component.

  • If your FFT application has FFT outputs that are wider than 64 bits (or 32 bits for each component), this will not work properly since the SGDMA is limited to a 64-bit word size. If you find out how to get around this limitation, let me know, because I would like to use a bigger FFT size but cannot since the data width would be larger than 64 bits.
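
As a rough illustration of the extraction step, the sketch below splits one 64-bit output word into two signed 32-bit values using shifts instead of a pointer cast. The type and function names are invented for this example, and it assumes the real part ends up in the low 32 bits and the imaginary part in the high 32 bits once the word is in memory; swap the two if your system shows the opposite order.

```c
#include <stdint.h>

/* One FFT output bin as two signed 32-bit components. */
typedef struct {
    int32_t real;
    int32_t imag;
} fft_bin;

/* Split one 64-bit SGDMA word into its two components.
 * Assumed layout: real in bits 31..0, imaginary in bits 63..32. */
fft_bin unpack_fft_word(uint64_t word)
{
    fft_bin bin;

    /* Converting uint32_t to int32_t keeps the two's complement bit
     * pattern on the usual compilers, so sign-extended negative values
     * come out negative. */
    bin.real = (int32_t)(uint32_t)(word & 0xFFFFFFFFu);
    bin.imag = (int32_t)(uint32_t)(word >> 32);
    return bin;
}
```

Because the wrapper already sign extends each 31-bit result to 32 bits, no further sign handling is needed on the C side.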

Hope this helps. Let me know if you have any questions.
Altera_Forum
Honored Contributor II
578 Views

 


 

 

 

Any luck with the VHDL wrapper?
Altera_Forum
Honored Contributor II
578 Views

I used your wrapper and created my system in SOPC Builder, but when I look at the nios_sytem.vhd file, the FFT is not included in it, even though my project compiled successfully.

These are the steps I followed:

1. I went to the MegaCore wizard and generated my FFT with the streaming architecture. (Which architecture is better for speech recognition?)

2. I went to SOPC Builder and created my FFT component by adding the fft.vhd file. (I think my problem is here: in the files Majd included, I did not find a picture showing how to create the FFT component, which files need to be included, and what its interfaces are.)

3. I added fft_wrapper.vhd and created a new component, fft_wrapper, as in Majd's tutorial.

4. I added all the other necessary components (SGDMA, RAMs).

Please help me create the FFT component, and include an image of your system if you can.

My project is on speech recognition and my presentation is scheduled for Dec 5, 2010, so I am getting desperate.

Please help me.
Altera_Forum
Honored Contributor II
578 Views

 


 

 

 

I believe that step 2 may be where the problem is. The beautiful thing about the FFT wrapper file is that you do not need to add the actual FFT files directly in SOPC Builder. Try these steps:

1. Generate your FFT component, just as you did before. Make a note of the name of the component (for example, fft_64.vhd). It should be the top level of the FFT that the wizard generated.

2. Open the fft_wrapper.vhd file and replace "fft_1024" with the name of the FFT created for you by the wizard (for example, the entity name of the fft_64.vhd component you just created).

 

 

3. You will probably need to edit fft_wrapper further, depending on your FFT implementation:

  • If you are using floating point, add signals for the exponent, since my implementation was fixed point. (Hint - use fixed point :))

  • If the bit width of the real and imaginary outputs from the FFT is greater than 32 bits each (that is, greater than 64 bits when combined into a single vector). (Hint - keep the bit width of each component at or below 32 bits and you should be OK.)

  • If the input to your FFT is wider than 32 bits (16 bits for imaginary and 16 for real).

  • If you want to control the direction of the FFT (the inverse input), you will need some additional logic, or perhaps there is another way. I just hardcoded the wrapper to keep the FFT in a single direction.

4. Now, just add the wrapper as a new component in SOPC Builder. Remember, do not add the fft_instance.vhd file directly, as this may cause issues. Just add the wrapper, which indirectly adds the actual FFT files.

 

5. Add all of the other necessary component as you did before. 

 

 

This should get you up and running. It is going to be impossible for me to send you a screenshot of my system, but if you send me some shots of what you have, I may be able to help further.

Let me know if this helps.
Altera_Forum
Honored Contributor II
578 Views

I do not understand how to feed a signal to the FFT. My ADC output is in SDRAM and I have to give this signal to the FFT as input. What about the other signals: are they defined in the FFT wrapper? Please help me add the signal and tell me how to check whether my FFT is working or not. I have to get the FFT output and compare it for speech recognition. Please reply soon.

Also, in the code given by Majd, I do not understand why sig[i] is used.
Altera_Forum
Honored Contributor II
578 Views

Hi all,

Does anyone know how to compute a large FFT by running a smaller FFT over chunks of the data and combining the results afterwards, rather than using one big FFT in the FPGA? For example, computing a 32k FFT with an 8192-point FFT by reusing the same 8k FFT 4 times? The reason for doing this is to reuse the FFT logic inside the FPGA and so save logic resources.
Altera_Forum
Honored Contributor II
578 Views

xfpga, 

 

There are several ways to accomplish this with algorithms. The easiest and perhaps the most inefficient way of doing it is to: 

 

1.) Decimate the data 

2.) Perform the block FFT on each set of decimated data 

3.) Recombine each of the FFT outputs together. In order to do this you need to multiply each contribution by a coefficient that effectively "weights" each contribution. 

4.) Store the result. 

 

That is the basic gist of it. I will provide you with a more formal description to follow...
Altera_Forum
Honored Contributor II
578 Views

The example that you gave was to use a 8k FFT to compute a 32k FFT. Let N be your desired FFT size, in this case 32768. Also let M be your block FFT size, which in this case will be 8192. 

 

Lets suppose that your 32768 data points are stored in a contiguous piece of memory. Lets refer to this set of data as x(n) where n ranges from 0 to 32767. 

 

Next, we perform the M-point FFTs on the decimated representations of x(n). Since M = 8192 and N = 32768, we know that we will need to perform 4 block FFT operations (note that N does not need to be an integer multiple of M...). Let's call this number L, where L = ceil[N/M].

 

Follow the following steps: 

 

 

  1. Perform an M-point FFT on the x(n) samples where n = 0, 4, 8, ..., ((N-1) - 3). We'll call those FFT results x0(k).

  2. Store four copies of x0(k) in an array/memory. 

  3. Next we compute an M-point FFT on the x(n) samples where n = 1, 5, 9, ..., ((N-1) -2). We call those FFT results x1(k). 

  4. Store four copies of x1(k) in an array/memory. 

  5. Next we compute an M-point FFT on the x(n) samples where n = 2, 6, 10, ..., ((N-1) -1). We call those FFT results x2(k). 

  6. Store four copies of x2(k) in an array/memory. 

  7. Next we compute an M-point FFT on the x(n) samples where n = 3, 7, 11, ..., ((N-1) - 0). We call those FFT results x3(k). 

  8. Store four copies of x3(k) in an array/memory. 

 

 

At this point you are almost finished....almost :-P. You should have 4 arrays, each of size N. Now the question is: how do I put all of this data back together to get the 32768-point FFT that we desire? The answer is that each contribution must be "scaled" by a coefficient matrix. Since N = 32768, we know that our coefficient matrix should have 32768 values in it. The coefficient matrix, L(k) (not to be confused with the block count L above), is defined below:

L(k) = e^(-j2πk/N), for k = 0, 1, ..., N - 1

This is nothing more than one complete cycle of the unit circle sampled at 32768 points. The contribution x0(k) is used as is, x1(k) is multiplied by L(k), x2(k) by L(k)^2 and x3(k) by L(k)^3, and the four weighted arrays are then added together to give the N-point result.

 

You can find a much better description of this at the following website : 

 

http://www.dsprelated.com/showarticle/63.php 

 

Let me know if you have any questions.
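
The recombination can be sketched in C on a toy size. This is only an illustration under stated assumptions: N = 16 and M = 4 stand in for 32768 and 8192, a plain software DFT stands in for the hardware FFT core, and all names (cplx, dft, make_twiddles, big_fft) are invented for the example. Instead of storing four copies of each block result, the index k % M is used, which reads the same values.

```c
#include <stddef.h>

#define N_FULL   16                   /* desired FFT size (32768 in the thread) */
#define L_BLOCKS 4                    /* number of decimated sets               */
#define M_BLOCK  (N_FULL / L_BLOCKS)  /* block FFT size (8192 in the thread)    */

typedef struct { double re, im; } cplx;

static cplx cadd(cplx a, cplx b)
{
    cplx r;
    r.re = a.re + b.re;
    r.im = a.im + b.im;
    return r;
}

static cplx cmul(cplx a, cplx b)
{
    cplx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* tw[k] = e^(-j*2*pi*k/N_FULL): one full turn of the unit circle.
 * Built by repeated multiplication so the sketch needs no math library.
 * The constant is cos(pi/8) - j*sin(pi/8), valid only for N_FULL == 16. */
static void make_twiddles(cplx tw[N_FULL])
{
    const cplx w1 = { 0.92387953251128674, -0.38268343236508977 };
    size_t k;

    tw[0].re = 1.0;
    tw[0].im = 0.0;
    for (k = 1; k < N_FULL; k++)
        tw[k] = cmul(tw[k - 1], w1);
}

/* Plain O(len^2) DFT of in[0], in[stride], in[2*stride], ...
 * Stand-in for the hardware FFT core. len must divide N_FULL. */
static void dft(const cplx *in, size_t stride, size_t len,
                const cplx *tw, cplx *out)
{
    size_t step = N_FULL / len;  /* tw[step] = e^(-j*2*pi/len) */
    size_t k, n;

    for (k = 0; k < len; k++) {
        cplx acc = { 0.0, 0.0 };
        for (n = 0; n < len; n++)
            acc = cadd(acc, cmul(in[n * stride], tw[(n * k * step) % N_FULL]));
        out[k] = acc;
    }
}

/* N_FULL-point DFT assembled from L_BLOCKS block DFTs of size M_BLOCK. */
void big_fft(const cplx x[N_FULL], cplx X[N_FULL])
{
    cplx tw[N_FULL];
    cplx sub[L_BLOCKS][M_BLOCK];
    size_t r, k;

    make_twiddles(tw);

    /* Block FFT of every decimated set x(r), x(r+4), x(r+8), ... */
    for (r = 0; r < L_BLOCKS; r++)
        dft(x + r, L_BLOCKS, M_BLOCK, tw, sub[r]);

    /* Weight set r by e^(-j*2*pi*r*k/N) and add the sets together.
     * k % M_BLOCK plays the role of the "four copies" of each result. */
    for (k = 0; k < N_FULL; k++) {
        cplx acc = { 0.0, 0.0 };
        for (r = 0; r < L_BLOCKS; r++)
            acc = cadd(acc, cmul(tw[(r * k) % N_FULL], sub[r][k % M_BLOCK]));
        X[k] = acc;
    }
}
```

On real hardware the calls to dft would be replaced by SGDMA transfers through the FFT core, and the twiddle table would be precomputed for the real N, for example with cos and sin from math.h.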
Altera_Forum
Honored Contributor II
578 Views

Hi,

Thanks a lot for your advice. When implementing the FFT, which approach will be more efficient: Nios or hardware (logic gates)? I ask because the method you described involves rearranging the data and multiplying by the coefficient matrix L. I understand that the FFT IP core from Altera consumes quite a lot of logic. Do you have any references for implementing this from scratch? Thanks.
Altera_Forum
Honored Contributor II
578 Views

 


 

 

I suppose that you could implement a full-custom FFT core, but chances are that it will not be as efficient as the cores that Altera already provides in the MegaCore. The method above can be implemented in software (using the FFT MegaCore together with Nios) without too much headache.

While I do not have a reference design for you, I can give you a rough step-by-step for implementing it in software:

  1. Generate the largest FFT that the resources of your FPGA/development board allow.

  2. Next, you will need to create an SOPC system that contains a Nios core, the FFT wrapper (you can refer to the wrapper in a previous post in this thread), and also some memory. I recommend you use DDR if you have it available...but I suppose internal RAM could work. The wrapper is basically a shell that encapsulates the FFT component.

  3. Once your system is up and running, you can then write a program in C for Nios that does all of the matrix operations for you. This is where this method loses efficiency - since Nios is doing all of the math, it will be sllllooooooowww! The tradeoff, however, is that you can accomplish larger FFTs than what is provided by Altera with relative ease.

There may be some other stuff in there that you need :) . It may be a good idea to place internal RAM between Nios and the FFT core (on the input side) as well as on the output side of the FFT. This will enable you to use the SGDMA for transferring data in and out of your FFT and also reduces the cycle count of the Nios when running the program.

 

Implementing the FFT from scratch using logic is not trivial by any means. This method will be a lot easier. 

 

Hope this helps!
Altera_Forum
Honored Contributor II
578 Views

Why does the program get stuck at this line?

while (IORD(SGDMA_ST_TO_MM_FFT_BASE,0)!=14) {};

When I check the return value of IORD(SGDMA_ST_TO_MM_FFT_BASE,0), it is always 8. Could somebody give me a solution? Should we add a process in the wrapper?

Thanks...
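
While debugging a hang like this, it can help to replace the endless loop with a bounded poll that reports the last status it saw. The sketch below is only one way to do that: the names (status_reader, read_from_memory, wait_for_status) are invented for this example, and the reader shown is a host-side stand-in. On the board, the reader would instead wrap IORD(SGDMA_ST_TO_MM_FFT_BASE, 0).

```c
#include <stdint.h>

/* Callback that returns the current status register value. */
typedef uint32_t (*status_reader)(void *ctx);

/* Host-side stand-in reader: ctx points at a variable holding the status. */
uint32_t read_from_memory(void *ctx)
{
    return *(volatile uint32_t *)ctx;
}

/* Poll until every bit in done_mask is set, an error bit shows up,
 * or max_polls reads have been made.
 * Returns 0 on success, -1 on error, -2 on timeout.
 * The last status value read is stored in *last_status (if not NULL). */
int wait_for_status(status_reader reader, void *ctx,
                    uint32_t done_mask, uint32_t error_mask,
                    uint32_t max_polls, uint32_t *last_status)
{
    uint32_t status = 0;
    uint32_t i;
    int result = -2;  /* timeout unless the loop decides otherwise */

    for (i = 0; i < max_polls; i++) {
        status = reader(ctx);
        if (status & error_mask) {
            result = -1;
            break;
        }
        if ((status & done_mask) == done_mask) {
            result = 0;
            break;
        }
    }
    if (last_status)
        *last_status = status;
    return result;
}
```

Called with done_mask = 14, a status that stays at 8 makes the function return -2 with 8 in *last_status, so the program can print the value and carry on instead of hanging.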

 

Here is the FFT wrapper code mentioned earlier in the thread:

module fft_avalon_wraper (

clk, 

reset_n, 

mm_writedate, 

mm_write, 

sink_valid, 

sink_sop, 

sink_eop, 

sink_empty, 

sink_real, 

// sink_imag, 

sink_error, 

source_ready, 

sink_ready, 

source_error, 

source_sop, 

source_eop, 

source_valid, 

source_data, 

source_empty 

// ,source_exp 

); 

 

input clk; 

input reset_n; 

input [7 : 0] mm_writedate; 

input mm_write; 

input sink_valid; 

input sink_sop; 

input sink_eop; 

input [2 : 0] sink_empty; 

input [31 : 0] sink_real; 

//input [31 : 0] sink_imag; 

input [1 : 0] sink_error; 

input source_ready; 

output sink_ready; 

output [1 : 0] source_error; 

output source_sop; 

output source_eop; 

output source_valid; 

output [31 : 0] source_data; 

output [2 : 0] source_empty; 

//output [5 : 0] source_exp; 

 

reg inverse=0; 

reg [15 : 0]sink_imag=0; 

wire [5 : 0] source_exp; 

reg [10 :0] fftpts_in; 

 

FFT fft (

.clk(clk), 

.reset_n(reset_n), 

.inverse(inverse), 

.sink_valid(sink_valid), 

.sink_sop(sink_sop), 

.sink_eop(sink_eop), 

.sink_real(sink_real[15 : 0]), 

.sink_imag(sink_imag), 

.sink_error(sink_error), 

.source_ready(source_ready), 

.sink_ready(sink_ready), 

.source_error(source_error), 

.source_sop(source_sop), 

.source_eop(source_eop), 

.source_valid(source_valid), 

.source_exp(source_exp), 

.source_real(source_data[15 : 0]),  

.source_imag(source_data[31 : 16]));  

endmodule