Re: Stuck on Fast QSPI

Altera_Forum · ‎11-08-2017

I have a 100 MHz clock generated by an internal PLL in a Cyclone IV design, and I would like to read from a QSPI peripheral at 100 MHz (i.e. 400 Mbit/s across the four data lines).

That seems like it should be possible, and I have had some success on prototype hardware. I wrote logic that pulls data from flash at 100 MHz on real hardware without data errors. However, it fails timing in Quartus, so I do not expect it to work reliably over process/voltage/temperature variations.

I would like to correct the design so that it passes timing, but I am stuck because I have not worked with a serial interface this fast before. Usually I am dividing down from a master clock (e.g. 100 MHz -> 25 MHz). I have PLL outputs available for phase shifting, but I do not know how to apply or constrain them. I have not been able to find any examples of this situation.

Can anyone help or point me to any resources on this scenario? I have not been able to find anyone locally who knows enough about timing constraints.

Altera_Forum · ‎11-08-2017

There is a fairly good reference guide as to which timing constraints you need for Altera's Parallel Flash Loader (PFL) core available here:

https://www.altera.com/documentation/sss1411439280066.html#sss1411979414512

I acknowledge this isn't the quad memory device you're discussing. However, the clock and synchronous constraints it discusses cover everything you need to consider for your application.

Also see Altera's AN 433: Constraining and Analyzing Source-Synchronous Interfaces (https://www.altera.com/en_us/pdfs/literature/an/an433.pdf). It will confirm the exact syntax you need, and it discusses synchronous constraints more generically.

Cheers,

Alex

Altera_Forum · ‎11-09-2017

I wrote my output constraints based on AN433. I used Figure 14 (Circuit with Common Data Clock and Output Clocks) without the DDR flops. Then I used the System-Centric output examples to set output delays. TimeQuest seems to be happy with that. There aren’t any failures reported on data output paths.

However, the data inputs are different. AN433 says it only constrains cases where clock and data are provided by the same device. That is not the case for data coming back from the QSPI part, since the QSPI part is not providing the clock.

I wrote some input_delay constraints based on what made sense to me, but they are reporting failures of about 400ps in one of the corners (slow, 85C). TimeQuest shows that in the 10ns window, I am losing:

1) 6.21ns to QSPI device Tco and board delays

2) 3ns to FPGA clock output delay

3) 1.135ns to FPGA data input delay

I do not know how to proceed from here. I already have fast input registers enabled, so I don’t see the data input delay getting any shorter. Is there a way to make the clock delay shorter?
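The three figures listed above can be added up to see how far the path runs over the 10 ns period. The Python sketch below does only that arithmetic with the numbers quoted in this post. The variable names are illustrative, and the plain sum ignores any terms TimeQuest applies that are not listed here (clock uncertainty is one assumed example), so it should not be expected to match the reported slack exactly.

```python
# Rough timing budget for the flash read path, using the figures quoted above.
PERIOD_NS = 10.0  # one 100 MHz clock period

# Delay contributions as reported in the post (slow corner, 85C).
delays_ns = {
    "QSPI Tco + board delays": 6.21,
    "FPGA clock output delay": 3.0,
    "FPGA data input delay": 1.135,
}

total_ns = sum(delays_ns.values())
slack_ns = PERIOD_NS - total_ns  # negative means the budget is exceeded

for name, value in delays_ns.items():
    print(f"{name:26s} {value:7.3f} ns")
print(f"{'total':26s} {total_ns:7.3f} ns")
print(f"{'slack vs. period':26s} {slack_ns:+7.3f} ns")
```

The sum comes out a few hundred picoseconds over the period, which is in the same range as the failure reported above.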

Altera_Forum · ‎11-09-2017

Hi,

Can you provide your design archive, or at least the .sdc file? That would make it easier to understand your design, and I will try to help you.

Altera_Forum · ‎11-11-2017

Sure, I can do even better. This zip file contains a complete project with everything stripped out except the flash support. It also includes a simple self-checking ModelSim Altera testbench, which you can run with “run_sim.do” in the sim folder.

One thing that may be confusing is that the flash logic clocks data in on falling edges. Since the flash clocks data out on falling edges, that allows a full period to deal with Tco delays instead of only half a period.

Altera_Forum · ‎11-14-2017

In your case I would invert the output clock FLASH_SCLK using Altera's altddio_out. That way you can avoid clocking the flash logic inside the FPGA from the negative edge. You then have to change the generated clock constraints for CLK_FLASH and define a false path on the output clock port FLASH_SCLK. Let me know if you need some help with the timing constraints.

Altera_Forum · ‎11-15-2017

The Tco of the QSPI part (6ns) is longer than the time from a falling edge to a rising edge (5ns). Why would moving to the 5ns window instead of the 10ns window help? Additionally, won’t that break the data output paths to the QSPI part since those share the same clock as the data input paths?

TimeQuest is also telling me that adding a DDIO block to the clock path increases the FPGA’s clock output delay.

Do you have any examples of what you are describing?

Altera_Forum · ‎11-15-2017

--- Quote Start ---

The Tco of the QSPI part (6ns) is longer than the time from a falling edge to a rising edge (5ns). Why would moving to the 5ns window instead of the 10ns window help?

--- Quote End ---

Inverting the output clock gives the same window for the input paths, but you have to change your internal FPGA registers to be clocked on the rising edge of clk98meg. It is the same window you were getting by clocking your internal FPGA registers from the negative edge (see attached image). By inverting the clock at the output, however, you avoid the confusion of negative-edge-clocked registers inside the FPGA.

https://www.alteraforum.com/forum/attachment.php?attachmentid=14392
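The window argument above can be checked with a little edge arithmetic. The Python sketch below is a simplified model, not taken from the attached design. It assumes an ideal 10 ns clock with a 50% duty cycle, zero delay everywhere, a flash that launches data on the falling edge of FLASH_SCLK, and an FPGA that captures on the first matching internal clock edge strictly after the launch. The function name and its parameters are hypothetical.

```python
def capture_window_ns(period_ns: float, sclk_inverted: bool,
                      capture_on_rising: bool) -> float:
    """Time from the flash launch edge to the next FPGA capture edge.

    Idealised model: the internal clock rises at t = 0 and falls at
    t = period/2. The flash launches data on the falling edge of SCLK.
    """
    half = period_ns / 2.0
    # SCLK falls with the internal clock (t = half), or at the internal
    # rising edge (t = 0) when SCLK is an inverted copy.
    launch = 0.0 if sclk_inverted else half
    # First candidate capture edge of the internal clock.
    capture = 0.0 if capture_on_rising else half
    # Advance to the first capture edge strictly after the launch edge.
    while capture <= launch:
        capture += period_ns
    return capture - launch


PERIOD_NS = 10.0  # 100 MHz, as stated in the thread

original = capture_window_ns(PERIOD_NS, sclk_inverted=False, capture_on_rising=False)
proposed = capture_window_ns(PERIOD_NS, sclk_inverted=True, capture_on_rising=True)
half_window = capture_window_ns(PERIOD_NS, sclk_inverted=False, capture_on_rising=True)

print(f"non-inverted SCLK, falling-edge capture: {original} ns")
print(f"inverted SCLK, rising-edge capture:      {proposed} ns")
print(f"non-inverted SCLK, rising-edge capture:  {half_window} ns")
```

Under these assumptions the first two cases give the same full-period window, and only the third case (non-inverted clock with rising-edge capture) shrinks it to half a period.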

--- Quote Start ---

TimeQuest is also telling me that adding a DDIO block to the clock path increases the FPGA’s clock output delay.

--- Quote End ---

Yes, this is true. So I have modified your design and added a second PLL to shift the FLASH_SCLK clock. The phase shift is slightly less than 180 degrees to meet the timing requirements.

Altera_Forum · ‎11-21-2017

Thank you, this looks great. I had thought that the input delay constraint clock needed to be the actual clock to the input registers. It really helps to have an example like this to see how to work around that.

Similarly, I thought the output delay constraint clock needed to be the output register clock (or physically connected to it, at least). With the way that you have it written:

1) Do the tools automatically account for the phase difference between CLK_FLASH and CLK_CORE?

2) What is “set_false_path -to [get_ports FLASH_SCLK]” doing? I took that out and did not notice anything change.

Also, how did you come up with the specific phase shift amount? This is probably all pretty basic, but it is my first time working with phase-shifted copies of clocks.

Altera_Forum · ‎11-27-2017

--- Quote Start ---

1) Do the tools automatically account for the phase difference between CLK_FLASH and CLK_CORE?

--- Quote End ---

In my constraints I have included the -phase option, so the phase difference should be accounted for.
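To make concrete what a phase offset means in time terms, here is a small Python sketch of the degrees-to-nanoseconds conversion for a 10 ns clock. The helper name is hypothetical, and the 170 degree figure is only an illustrative value: the thread says the shift is "slightly less than 180" without giving the number.

```python
def phase_to_ns(phase_deg: float, period_ns: float) -> float:
    """Convert a clock phase shift in degrees to a time offset in ns."""
    return phase_deg / 360.0 * period_ns


PERIOD_NS = 10.0           # 100 MHz, as stated in the thread
EXAMPLE_PHASE_DEG = 170.0  # hypothetical; the actual value is not given

half_period_shift = phase_to_ns(180.0, PERIOD_NS)
example_shift = phase_to_ns(EXAMPLE_PHASE_DEG, PERIOD_NS)

print(f"180 deg -> {half_period_shift:.3f} ns")
print(f"{EXAMPLE_PHASE_DEG:.0f} deg -> {example_shift:.3f} ns")
```

A shift a little under 180 degrees therefore places the edges of the shifted clock a little under 5 ns after the corresponding edges of the unshifted one.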

--- Quote Start ---

2) What is “set_false_path -to [get_ports FLASH_SCLK]” doing? I took that out and did not notice anything change.

--- Quote End ---

It prevents FLASH_SCLK from being analysed as data. The command cuts only the path in which FLASH_SCLK is treated as data; its role as part of a clock path is not affected. In some cases TimeQuest will treat your clock output as a data port, and you will see it in the TimeQuest unconstrained paths report. The set_false_path command removes it from there.