We are having trouble migrating from the fixed-point FFT core to the floating-point FFT core on Arria 10. One example of the strange behavior can be seen in the SignalTap capture below: the SOP and EOP markers are not aligned with the (de)assertion of valid. We have not seen this before.
We have already clocked the core down from 300 MHz to a comfortable 200 MHz, but nothing changed. We have not been able to reproduce this behavior in simulation.
We have now reproduced it in simulation. We stream 6 x 4096 samples into the FFT core and, as can be seen, the final EOP is not streamed out by the core. (We also checked that some final samples are missing: the last output block is still incomplete at 220000 ns.) When we start a new run of 6 x 4096 samples, the core outputs the missing data. So the core lags further and further behind and drifts out of sync with the input data.
The cause of this behavior, which is new compared with the fixed-point FFT, is an extra invalid word just after the SOP at the output.
As a result, the core needs one clock cycle more to output a block than it takes to input it.
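The arithmetic behind this drift can be sketched in a few lines of Python. This is a simplified model, not the core's actual timing: it assumes the sink accepts each N-sample block in N cycles while the source needs N + 1 cycles because of the single bubble after SOP, and it ignores the pipeline latency because that is a constant offset. The function name `output_lag_cycles` is hypothetical.

```python
def output_lag_cycles(num_blocks: int, block_len: int, bubbles_per_block: int = 1) -> int:
    """Cycles by which the output falls behind the input after num_blocks blocks.

    Hypothetical model: the input takes block_len cycles per block (back to back),
    the output takes block_len + bubbles_per_block cycles per block.
    Constant pipeline latency is ignored.
    """
    input_cycles = num_blocks * block_len
    output_cycles = num_blocks * (block_len + bubbles_per_block)
    return output_cycles - input_cycles


# 6 blocks of 4096 samples with one bubble per block.
print(output_lag_cycles(6, 4096))  # 6
```

Under this model the deficit grows by one cycle per block, so it never goes away on its own if the input keeps streaming; how many words actually stay stuck in the real core depends on its internals.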
That would not be a problem as long as the core eventually outputs its data, but it seems to stall its output when sink_valid is deasserted:
As far as I understand, this is not what the UG describes. Doesn't the text suggest that the core will output pending data after the EOP? (see 2)
Is this a bug or are we missing something here?
How can we get the data still in the core out without clocking new valid data in?
As I understand it, you have some questions about the A10 FFT IP core. Regarding your initial observation that SOP and EOP are not aligned with the valid signal: sorry for any confusion, but would you mind pinpointing the exact location of the misalignment, e.g., by marking it on the screenshot? Thank you very much.
By the way, if you are referring to SOP/EOP not asserting/deasserting together with the valid signal, that should not be an issue. SOP/EOP are only valid in cycles where the valid signal is also asserted.
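This qualification rule can be expressed as a small Python check over a captured trace. It is a sketch that assumes `source_ready` is held high, so every cycle with valid asserted is a transfer (backpressure is not modeled). The `Beat` type and the `qualified_markers` function are illustrative names, not part of the IP.

```python
from typing import List, NamedTuple, Tuple


class Beat(NamedTuple):
    valid: bool
    sop: bool
    eop: bool


def qualified_markers(trace: List[Beat]) -> Tuple[List[int], List[int]]:
    """Return the cycle indices where SOP and EOP actually take effect.

    sop/eop only count in cycles where valid is high; markers that are
    high while valid is low are ignored.
    """
    sops = [i for i, b in enumerate(trace) if b.valid and b.sop]
    eops = [i for i, b in enumerate(trace) if b.valid and b.eop]
    return sops, eops


trace = [
    Beat(valid=False, sop=True, eop=False),   # sop high but valid low: ignored
    Beat(valid=True, sop=True, eop=False),    # real start of packet
    Beat(valid=True, sop=False, eop=False),
    Beat(valid=True, sop=False, eop=True),    # real end of packet
    Beat(valid=False, sop=False, eop=True),   # eop lingering after valid drops: ignored
]
print(qualified_markers(trace))  # ([1], [3])
```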
I believe we need to look further into your second observation, where the last EOP is not streamed out by the core. I would just like to check whether you have had a chance to try the following:
Please let me know if you have any concerns. Thank you.
Well, basically there is only a single issue. Our preliminary hypothesis is that the FFT does not output the pending data on its source once a valid EOP has been registered on its sink.
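The hypothesis can be illustrated with a toy Python model. Purely for illustration, it assumes the words the core still holds behave like a shift register of `depth` stages; `depth` stands in for the number of pending words and is not the FFT's real latency. With `drain_without_valid=True` the register advances every cycle (the behavior we expected from the UG); with `False` it advances only when sink_valid is high. All names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


def simulate(sink_valid: List[bool], depth: int, drain_without_valid: bool) -> List[int]:
    """Return the input sample indices that emerge at the source, in order."""
    pipe: Deque[Optional[int]] = deque([None] * depth)
    out: List[int] = []
    next_sample = 0
    for valid in sink_valid:
        if not (valid or drain_without_valid):
            continue  # hypothesized stall: nothing moves while sink_valid is low
        entering: Optional[int] = None
        if valid:
            entering = next_sample
            next_sample += 1
        pipe.append(entering)
        leaving = pipe.popleft()
        if leaving is not None:
            out.append(leaving)
    return out


run = [True] * 8 + [False] * 10
print(simulate(run, depth=3, drain_without_valid=True))   # all 8 samples drain
print(simulate(run, depth=3, drain_without_valid=False))  # last 3 samples stay inside
```

In the stalling variant the last `depth` samples stay inside until the next run pushes new valid data in; then they come out first, which is the same pattern as the missing data appearing at the start of a new 6 x 4096 run.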
I tested it with 2 x 4096 consecutive streams, and it shows the same behavior, except that in this case only a single output word is omitted. The EOP is asserted for the second stream, but valid has already been deasserted.
And an overview:
I have not looked at the generated tb yet. Can you confirm that we have understood correctly that the FFT core should present all data pending in the pipeline after the EOP on the sink, independent of the sink's valid state afterwards?
I ran some tests with the "example design", modified to run 6 x 4096 streams. The output is correct. However, this small test program only covers the ideal case where all data is valid and the source's ready input is always asserted. My hunch is that the behavior is caused by the invalid cycles between the streams (say 8 cycles).
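To move the testbench away from the ideal case, the sink stimulus can include idle cycles between blocks. Below is a minimal Python sketch that builds such a per-cycle schedule. The `SinkCycle` record, the `make_stimulus` function, and the 8-cycle gap are assumptions for illustration; the result would still have to be converted into whatever format the testbench consumes.

```python
from typing import List, NamedTuple


class SinkCycle(NamedTuple):
    valid: bool
    sop: bool
    eop: bool
    sample: int  # index within the block; -1 on idle cycles


def make_stimulus(num_blocks: int, block_len: int, gap: int) -> List[SinkCycle]:
    """Per-cycle sink stimulus: num_blocks blocks separated by `gap` idle cycles."""
    cycles: List[SinkCycle] = []
    for b in range(num_blocks):
        for i in range(block_len):
            # SOP on the first sample, EOP on the last sample of each block.
            cycles.append(SinkCycle(True, i == 0, i == block_len - 1, i))
        if b != num_blocks - 1:
            cycles.extend([SinkCycle(False, False, False, -1)] * gap)
    return cycles


stim = make_stimulus(6, 4096, 8)
print(len(stim))  # 24616
```

For 6 x 4096 samples with an 8-cycle gap the schedule has 6 * 4096 + 5 * 8 = 24616 cycles. Deasserting the source's ready input could be modeled the same way with a second per-cycle list.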
One more small observation: at the source, just after the SOP, there is a single-cycle valid deassertion. I have never observed this in other core variations and cannot imagine any reason for it.
We have the same issue with the FFT IP from Quartus 17.1.
This issue does not occur in Quartus 16.1 (FFT IP version 16.1).
The issue occurs when the FFT IP is configured with 'Single Floating Point', 'Enable Hard Floating Point Blocks', and Output Order set to 'Natural'. The issue does not occur with Output Order = 'Digit Reverse'.
In this case, the timing and behavior of the source output stream are exactly as described above.
Thanks RGroo3 for sharing.
I checked our internal database, and this appears to be similar to a known issue that has already been reported to the factory. The factory is currently looking into it as a future enhancement.
As a workaround, I recommend trying the 16.1 core, which appears to be OK based on RGroo3's observation, so that this does not block your progress any further. Sorry for the inconvenience.
Enhancement?? Defect better describes it. IMHO it should be fixed in 17.1.
The 16.1 core seems to be less optimized and reaches a lower maximum clock frequency after fitting.
We would like to run at 300 MHz (the same as the memory clock), which is at the edge of what the 16.1 core achieves in our design (probably due to some optimizations in the later Arria 10 DSP IP).
We will reimplement the affected IP.