Intel Floating Point FFT IP Gives Different Result to Python Example

AlexBeasley · ‎09-01-2022

Hi there,

I am having some issues with a simulation of the Intel FFT IP core.

I have set up the IP core with the following parameters:

I am trying a simulation with a very simple input stream of floating-point numbers. Each packet is 16 data points long (there are actually 16 packets, but to simplify we will discuss only the first packet here).

The fft_points_in and fft_points_out ports are both set with 12'd16
The inverse port is set to 1'd1 (we are actually doing an ifft)

The contents of the first packet are as follows:

[2.476367e+00-2.004069e+00j, -6.033607e+00-2.375978e+00j,
 4.926060e+00+1.137245e+00j, -2.703433e+00+4.676073e-01j,
 -1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j,
 -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j,
 -6.082956e+00-3.423565e+00j, -1.739878e+00+3.414765e+00j,
 6.424708e+00-7.262015e+00j, -1.318541e+00+4.476929e+00j,
 -1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j,
 -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j]

And the output of the FFT module is:

[-6.850152e+00-5.850621e-01j ,
-8.229693e+00-6.425217e+00j ,
3.095238e+01-1.321568e+01j  ,
-5.182938e+00-4.233325e+00j ,
6.954326e+00+7.075030e+00j  ,
2.097159e+01+1.175418e+01j  ,
2.702558e+00-1.543010e+01j  ,
-6.319021e+00+1.134168e+01j ,
-1.252404e+00-1.055310e+01j ,
-1.200143e+01-1.787161e+01j ,
3.848109e+01+2.209911e+01j  ,
7.762767e+00-2.060383e+01j  ,
-4.190218e+00-7.037646e+00j ,
8.351004e-01-1.334330e+01j  ,
-7.578176e+00+1.071285e+01j ,
-1.743392e+01+1.425092e+01j ]

For comparison I also do the same operation in python:

import numpy as np 

data = [2.476367e+00-2.004069e+00j, -6.033607e+00-2.375978e+00j,
        4.926060e+00+1.137245e+00j, -2.703433e+00+4.676073e-01j,
        -1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j,
        -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j,
        -6.082956e+00-3.423565e+00j, -1.739878e+00+3.414765e+00j,
        6.424708e+00-7.262015e+00j, -1.318541e+00+4.476929e+00j,
        -1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j,
        -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j]

print(np.fft.ifft(data))

Which gives:

array([-4.28134688e-01-3.65664875e-02j,  6.85571121e-04-1.85955469e-03j,
        7.83755566e-04+3.85427278e-04j,  1.18608141e-03+1.93582093e-03j,
        2.46045625e-03+7.73105625e-03j,  1.69786405e+00+4.96621572e-02j,
       -2.78107982e-01-2.68442361e+00j,  1.77786332e+00+8.59785352e-01j,
       -1.36809688e-01-1.15679477e+00j,  6.72262721e-01+1.00248660e+00j,
        8.77349057e-01+1.70746302e+00j,  3.15544506e-01-9.05137254e-01j,
       -1.21734433e+00+8.15608063e-02j,  3.97639803e-01-6.11021704e-01j,
       -6.23491081e-01-6.33172444e-01j, -5.83384553e-01+3.13896581e-01j])

The two arrays are wildly different and I don't understand where the difference is coming from. If anyone can help that would be amazing!
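One quick sanity check on the numbers above, as a sketch only: numpy's ifft divides by N, so if the core does not apply that 1/N factor (an assumption, not something confirmed in this thread), bin 0 of the core output should be 16 times numpy's bin 0. Bin 0 of an unnormalized transform is also just the plain sum of the inputs, whatever their order. The variable names below are illustrative.

```python
import numpy as np

# First packet, copied from the post above.
data = [2.476367e+00-2.004069e+00j, -6.033607e+00-2.375978e+00j,
        4.926060e+00+1.137245e+00j, -2.703433e+00+4.676073e-01j,
        -1.587575e+00-7.515646e-01j, 1.801851e+00-1.081441e+00j,
        -4.354355e+00-4.494305e-01j, 6.268810e+00+1.988309e+00j,
        -6.082956e+00-3.423565e+00j, -1.739878e+00+3.414765e+00j,
        6.424708e+00-7.262015e+00j, -1.318541e+00+4.476929e+00j,
        -1.925149e+00+1.762921e+00j, 4.511015e+00-3.556520e-01j,
        -4.396655e+00+1.443588e+00j, -3.116817e+00+2.427287e+00j]
N = len(data)

# First value of the core output listed above.
core_bin0 = -6.850152e+00 - 5.850621e-01j

# numpy's bin 0 includes the 1/N factor; multiply it back out.
numpy_bin0 = np.fft.ifft(data)[0]

print(numpy_bin0 * N)                            # numpy bin 0 without 1/N
print(np.sum(data))                              # plain sum of the inputs
print(abs(numpy_bin0 * N - core_bin0) < 1e-4)    # agrees with the core's bin 0
```

This only shows that bin 0 agrees up to a factor of 16. It says nothing about the remaining bins, which may still differ for other reasons (ordering of inputs or outputs, for example).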

I know the Intel FFT IP core gives the output as "digit reversed". As far as I can tell this means the ordering of the outputs will be different, but I am not sure how exactly this works. Either way, as can be seen from the example above the two outputs contain different data, not just the same data in a different order.
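For reference, here is a minimal sketch of what "digit reversed" ordering generally means: the output index is written in base `radix` and its digits are read backwards. Which radix the Intel core actually uses for a given size is an assumption here and should be checked against the IP user guide; the helper names are hypothetical, and the sketch only handles sizes that are an exact power of the radix.

```python
def digit_reverse(index, n_points, radix=2):
    """Reverse the base-`radix` digits of `index` for an n_points transform.

    Assumes n_points is an exact power of radix.
    """
    result = 0
    span = 1
    while span < n_points:
        result = result * radix + index % radix  # move lowest digit to the top
        index //= radix
        span *= radix
    return result


def to_natural_order(samples, radix=2):
    """Reorder digit-reversed samples into natural order (the permutation is its own inverse)."""
    n = len(samples)
    return [samples[digit_reverse(k, n, radix)] for k in range(n)]


# Position i in the stream holds bin digit_reverse(i) under this convention.
print([digit_reverse(i, 16, 2) for i in range(16)])
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print([digit_reverse(i, 16, 4) for i in range(16)])
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

Either way this is only a permutation of the same 16 values, so it cannot by itself explain outputs whose values differ.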

Many thanks

AlexBeasley · ‎09-02-2022

To expand upon this issue, I have made a simpler test case which I will now describe. I am still trying to calculate an IFFT.

The input to the FFT IP core is the following array:

[(1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j), (1+1j)]

A really simple sequence of 16 "ones".

The expected IFFT is:

[1.+1.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]

Very simply, we have data in bin 0 (ones) and nothing in any other bin (the rest is zeros).

I have then taken two copies of the Intel FFT core, configured as in the post above, and set the "inverse" port of one instance to 0 and of the other to 1. So I am expecting one instance to calculate the FFT and the other to calculate the IFFT.

For reference the FFT of this input array is:

[16.+16.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,
 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j,
 0. +0.j,  0. +0.j,  0. +0.j,  0. +0.j]

When I then run a simulation of the two cores I get the following outputs

Instance 1 - "inverse" driven to 0

(16.000000 16.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)

Instance 2 - "inverse" driven to 1

(16.000000 16.000000) 
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)
(0.000000 0.000000)

(Please note that if we print these values in exponent form, "%e", some of the "zero" values are actually very small values just greater than 0, but I can accept some variation from the expected values.)

However the Big Problem here is that no matter whether I drive the "inverse" port to 1 or 0, I am only getting the forward FFT and it never calculates an IFFT for me.
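One caveat about this test, sketched below under an assumption: if the core's inverse transform is unnormalized (no 1/N factor, which bin 0 of the first packet hints at, since -6.850152 from the core is about 16 times numpy's -0.428134688), then a constant input cannot tell forward from inverse, because both give 16+16j in bin 0. A single complex tone can, since the two directions put its energy in different bins. `unscaled_ifft` is an illustrative helper, not part of the IP or of numpy.

```python
import numpy as np

N = 16


def unscaled_ifft(x):
    """Inverse DFT without the 1/N factor (assumed behaviour of the core)."""
    return np.fft.ifft(x) * len(x)


# Constant input: forward and unscaled inverse transforms coincide.
ones = np.full(N, 1 + 1j)
same_for_constant = np.allclose(np.fft.fft(ones), unscaled_ifft(ones))

# Complex tone at bin 3: the two directions now disagree.
n = np.arange(N)
tone = np.exp(2j * np.pi * 3 * n / N)
fwd = np.fft.fft(tone)       # peak lands in bin 3
inv = unscaled_ifft(tone)    # peak lands in bin N - 3 = 13

print(same_for_constant, np.argmax(np.abs(fwd)), np.argmax(np.abs(inv)))
# True 3 13
```

Feeding a tone like this (or an impulse at index 1) to both instances would show whether the "inverse" port has any effect.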

For clarity, the "inverse" port value is hard coded and a reset event happens at the beginning of the simulation for 5 clock cycles before data is sent to the core.

Does anyone have any ideas on what I might be doing wrong to configure the core to give me IFFTs?

Thanks!

AlexBeasley · ‎09-05-2022

I have experimented more with this issue and found the Intel design example located here:

https://www.intel.com/content/www/us/en/design-example/714680/cyclone-10-gx-fft-to-ifft-with-natural-input-and-output-order-using-cosine-data-example-design-17-1.html?wapkw=fft%20ifft

I extracted the project archive, compiled it, and generated the simulation IP setup scripts (Tools > Generate Simulator Setup Script for IP) and the top-level Verilog representation of the block diagram design file (EDA Netlist Writer).

I then ran this through the simulation and achieved the following result:

The description of the example project claims that "When both the FFT and iFFT are operating as expected, Cosine data will be recovered and observed at the iFFT output."
The output seen in the simulator is not the cosine wave as expected.

I have attempted this in both:
Quartus 17.1 Prime Pro* + ModelSim - Intel FPGA Starter Edition 10.5c

Quartus 21.4 Prime Pro+ QuestaSim - Intel FPGA Edition 2021.3

Both attempts yield an unexpected output - source_real and source_imag are not cosine waves.

(below is the output from the QuestaSim simulation)

*Quartus 17.1 Prime Pro fails a full compile; the log can be seen below. I have checked, and the file it claims to be unable to load does exist in the directory listed.

Problem Details
Error:
Internal Error: Sub-system: DCALC, File: /quartus/ddb/dcalc/dcalc_bcm_modules_cache.cpp, Line: 116
Could not load pdb file - c:/intelfpga_pro/17.1/quartus/common/devinfo/cyclone10gx/ddb_cyclone10gx_io_48_3v_tile-ff-3-0-hs_model_debug
Stack Trace:
    0x5d410: DCALC_TIMING_MODULES_CACHE::get_model + 0x21dbc (ddb_dcalc)
    0x2f2cb: DCALC_TIMING_NETLIST_MANAGER_IMPL::load_model + 0x43 (ddb_dcalc)
    0x41631: <lambda_27927aa62a013f38f2f5db62a47234ba>::operator() + 0x71 (ddb_dcalc)
    0x41446: tbb::interface6::internal::partition_type_base<tbb::interface6::internal::auto_partition_type>::execute<tbb::interface6::internal::start_for<tbb::blocked_range<int>,tbb::internal::parallel_for_body<<lambda_27927aa62a013f38f2f5db62a47234ba>,int>,tbb::auto_partitioner const >,tbb::blocked_range<int> > + 0x6e (ddb_dcalc)
    0x413d0: tbb::interface6::internal::start_for<tbb::blocked_range<int>,tbb::internal::parallel_for_body<<lambda_27927aa62a013f38f2f5db62a47234ba>,int>,tbb::auto_partitioner const >::execute + 0x20 (ddb_dcalc)
    0x1c1f3: tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all + 0x193 (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\custom_scheduler.h:472
    0x19afe: tbb::internal::arena::process + 0x18e (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\arena.cpp:105
    0x16867: tbb::internal::market::process + 0xf7 (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\market.cpp:479
    0x10eac: tbb::internal::rml::private_worker::run + 0x6c (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\private_server.cpp:283
    0x1111a: tbb::internal::rml::private_worker::thread_routine + 0x5a (tbb) at d:\sj\nightly\17.1\240\w64\acds\quartus\extlibs64\tbb\tbb42_20131118oss_altera\src\tbb\private_server.cpp:240
    0x24f7e: _beginthreadex + 0x106 (MSVCR120)
    0x25125: _endthreadex + 0x191 (MSVCR120)
    0x154df: BaseThreadInitThunk + 0xf (KERNEL32)
     0x485a: RtlUserThreadStart + 0x2a (ntdll)

End-trace


Executable: quartus_fit
Comment:
None

System Information
Platform: windows64
OS name: Windows 10
OS version: 10.0

Quartus Prime Information
Address bits: 64
Version: 17.1.0
Build: 240
Edition: Pro Edition

Kshitij_Intel · ‎09-12-2022

Hi,

What is your FFT_SIZE?

Thank you

Kshitij Goel

AlexBeasley · ‎09-13-2022

Hi there,

I have tried many sizes. In my own designs I have tried anywhere between 16 and 2048 points.

For the Intel design example above, I have not edited the code; I have only checked that the FFT points are set to 128.

Thanks

Kshitij_Intel · ‎10-04-2022

Hi,

I have checked: your Python output is FFT(fft_input)/16, with the output in reverse order.

To debug your Intel FPGA IP, please share your simple project.
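The relationship mentioned above is the standard identity between numpy's inverse and forward transforms: ifft(x)[k] equals fft(x)[(-k) mod N] / N. A small sketch on random data (not the packet from this thread) to confirm it:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Build the inverse transform from the forward one:
# reverse the bin index (modulo N) and divide by N.
k = np.arange(N)
ifft_from_fft = np.fft.fft(x)[(-k) % N] / N

identity_holds = np.allclose(np.fft.ifft(x), ifft_from_fft)
print(identity_holds)
```

Note that "reverse order" here means index k maps to (N - k) mod N, so bin 0 stays in place; it is a different reordering from the core's digit-reversed output.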

Thank you

Kshitij Goel

Kshitij_Intel · ‎10-12-2022

Hi,

Any update on this?

Thank you

Kshitij Goel

Kshitij_Intel · ‎10-18-2022

Hi,

As we have not received any response from you to our previous reply, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.

Thank you

Kshitij Goel