Re: TimeQuest - Constraining Centre-Aligned SDR

Altera_Forum · ‎11-19-2015

No matter how many times I read through the various documents on timing constraints, I can never quite fathom how to apply them to my design.

For example, I have the following which I want to constrain:

(1) I have a PLL which generates an 80MHz clock. This clock in turn drives two DDIO_OUT blocks, one which sends data and one which sends the clock signal. The DDIO block which sends out the clock is set up so that it clocks out a 0 on the rising edge and a 1 on the falling edge, which phase-shifts the output clock by 180 degrees so that it is centre-aligned with the data. I use the following constraint for the clock:


create_generated_clock -name adc_refclk_out -phase 180 -source |muxsel}]

I will also need to add set_output_delay commands for -min and -max if I am not mistaken, but I'm struggling to work out what to set them to. This is what I have at the moment:


set_output_delay -max ?? -clock  
set_output_delay -min ?? -clock

(2) The outputs (both clock and data) feed out from LVDS buffers on the FPGA (Stratix V 5SGSMD5K2F40C2L) then through approximately 30cm of cable (actually they are SATA cables simply due to the widespread availability of cheap connectors and cables). After this both signals enter a distribution IC (Texas Instruments CDCLVD1216). The drivers have the following specifications:

Tdelay: Min=0, Typ=1.5ns, Max=2.5ns

Trise: Min=0.05ns, Max= 0.3ns

Max Skew (part to part): Max=0.6ns

From the distribution ICs, the signals are fed through another 30cm SATA cable to several FPGAs (including back into the one where the data/clock originated). The signal comes back in to the FPGAs in via LVDS RX pins.

~5ns/m is a fairly decent approximation for propagation delay, so based on that there will be approximately 3ns delay in the cabling, but as all of the cables are the same, I don't think the skew between the clock lines and data lines from the cabling will be too high - maybe +/-200ps (just a guess).
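
As a rough check of the cabling estimate above, the arithmetic can be written out in plain Tcl (the same language as the .sdc file). The variable names are made up for illustration, and the 5ns/m figure is the approximation stated above, not a measured value.

```tcl
# Rough cable-delay estimate; all figures are the approximations quoted above.
set ns_per_m     5.0    ;# assumed propagation delay per metre
set cable_len_m  0.30   ;# one SATA cable
set num_cables   2      ;# FPGA -> distribution IC -> FPGA

set cable_delay_ns [expr {$ns_per_m * $cable_len_m * $num_cables}]
puts [format "cable delay: %.1f ns" $cable_delay_ns]
```

Since clock and data travel over the same kind of cable, most of this delay should be common to both; it is the skew between them (guessed at +/-200ps above) that would matter for the constraints.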

The clock comes in and feeds a PLL set to Source-Synchronous mode which adds 0 degrees of delay (it generates other clocks as well, but those aren't important). The data comes in through a DDIO_IN buffer which is clocked by the output of the PLL.

For the input, I have got as far as making a virtual clock


create_clock -name {sync_data} -period 12.50

And I have the PLL clock already generated by the 'derive_pll_clocks' command. But for simplicity, let's call it "rx_clock" and pretend the following commands were used to make it:

# The input clock
create_clock -name {adc_refclk} -period 12.500 -waveform {0.000 6.250} { adc_refclk_in }
# The VCO clock was autogenerated with (where refclkin comes from adc_refclk):
create_generated_clock -name {....|vcoph} -source {....|refclkin} -divide_by 2 -multiply_by 8 -duty_cycle 50.00 {.....|vcoph }
# And then the output clock of interest is:
create_generated_clock -name {rx_clock} -source {....|vco0ph} -divide_by 4 -multiply_by 1 -duty_cycle 50.00 {...|divclk }

So given that information, could someone point me in the right direction? As I say, I've read many different documents, but half the time they seem to contradict themselves, or the examples don't match my design (it's usually all DDR type stuff, mine is SDR). Most of the stuff in Quartus I've gotten my head around, but constraining interfaces is something I've never quite grasped, and as I am the only one of my colleagues who works with FPGAs, they can't offer much assistance.

If anything needs clarifying let me know.

Altera_Forum · ‎11-20-2015

Looking at the output side first, why is it SDR and not DDR? You say your clock drives two DDIO outputs, one for clock and one for data, which would make your data DDR, right?

Assuming the data output is SDR and changes on the rising edge of the clock (either by not having DDIO on the data output, or by sending the same data value twice in a row), then you are sending it SDR with the clock center-aligned to the data. For FPGA -> FPGA interfaces, I find it best to constrain the output side first and get it as tight as possible. So for an 80Mbps data rate, the bit period is 12.5ns. Since it is center-aligned, TimeQuest setup analysis should show the setup relationship at 6.25ns and the hold relationship at -6.25ns. So do a pass where your external delays chew up all that margin:

set_output_delay -max 6.25 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

set_output_delay -min -6.25 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

With the external delays chewing up all of the margin, this will fail timing, because even 1ps of skew between clock and data will show up as negative slack. So let's say that after compiling, your setup slack is -1.6ns and your hold slack is -1.7ns. That means your data could come as much as 1.6ns after the clock to 1.7ns before the clock, i.e. that is the skew (ignoring the 180 degree phase-shift). Be sure to check the slacks across all timing models to really get the worst case. With that, modify your constraints and recompile:

set_output_delay -max 4.65 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

set_output_delay -min -4.55 -clock [get_clocks adc_refclk_out] [get_ports sync_out]
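
The adjustment from the first pass to the second can be worked through as a small Tcl calculation. This is only a sketch using the example slacks above (-1.6ns setup, -1.7ns hold); the variable names are illustrative and not part of any Quartus API.

```tcl
# Tighten the output delays by the amount of negative slack from the first pass.
set max_delay    6.25   ;# first-pass -max (the full setup relationship)
set min_delay   -6.25   ;# first-pass -min (the full hold relationship)
set setup_slack -1.6    ;# example worst-case setup slack
set hold_slack  -1.7    ;# example worst-case hold slack

# Negative setup slack means -max was too large: reduce it.
set new_max [expr {$max_delay + $setup_slack}]
# Negative hold slack means -min was too negative: raise it.
set new_min [expr {$min_delay - $hold_slack}]

puts [format "-max %.2f  -min %.2f" $new_max $new_min]
# prints: -max 4.65  -min -4.55
```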

You may want to round them to something that isn't down to the exact ps to meet timing, i.e.:

set_output_delay -max 4.4 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

set_output_delay -min -4.4 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

So now confirm you can recompile and it meets timing. What this constraint says is the data can be skewed by +/-1.85ns compared to the clock and it will still meet timing. Now that the output is constrained about as tight as it can go, you can take that number and plug it into the input side.

Note that this approach is good when timing is tight. Another approach that might work is to just give it a decent amount of room. For example, just plug in:

set_output_delay -max 4.0 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

set_output_delay -min -4.0 -clock [get_clocks adc_refclk_out] [get_ports sync_out]

This is without compiling anything, just saying the data can be skewed by +/-2.25ns and still meet timing. If you compile and meet timing, great. That means the board + receiver can then add in the extra +/-4ns of skew and still meet timing. Those are pretty large windows and it might just all work and have plenty of margin.
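
The relationship between a symmetric output delay and the skew it leaves for the transmitter is just the half period minus the delay. A minimal Tcl sketch, assuming the 12.5ns period and centre alignment described above (tx_skew_window is a made-up helper name):

```tcl
# Skew the transmitter is allowed to add for a given symmetric output delay.
proc tx_skew_window {period out_delay} {
    # Centre-aligned: the setup/hold relationship is half the bit period.
    return [expr {$period / 2.0 - abs($out_delay)}]
}

foreach d {4.4 4.0} {
    puts [format "output delay +/-%.1f ns -> tx skew window +/-%.2f ns" \
        $d [tx_skew_window 12.5 $d]]
}
```

For the two cases discussed here this gives +/-1.85ns and +/-2.25ns respectively.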

(When both transmit and receive are FPGAs and have variable timing, it gets more complicated because you constrain each one in relation to the other. That's why you either need to lock one side down first, or pick something in the middle that allows for a decent amount of skew on both sides.)

Altera_Forum · ‎11-20-2015

Okay, quick look at the receive side. First, I'm confused by what you're doing. If you have a PLL in ssync mode, just use "derive_pll_clocks" and don't bother doing the generated clocks. (I dislike that whole vcoph clock in the 28nm PLLs and am glad we got rid of it in Arria 10. It just confuses things that don't need to be confusing.)

The big thing is you want to say the clock is coming in center-aligned. That can be done by either shifting the external clock 180 degrees or shifting the internal clock 180 degrees. I prefer doing it on the external one, so do:

create_clock -name {sync_data} -period 12.50 -waveform {6.25 12.5}

Then:

create_clock -name {adc_refclk} -period 12.500 -waveform {0.000 6.250} { adc_refclk_in }

Then:

set_input_delay -clock sync_data -max# [get_ports {sync_in}]

set_input_delay -clock sync_data -min# [get_ports {sync_in}]

Let's take the output case where we set them to +/-4.4ns. That means the transmitter skews it by +/-1.85ns. If the board had 0 skew, then we just plug that directly into the set_input_delay:

set_input_delay -clock sync_data -max 1.85 [get_ports {sync_in}]

set_input_delay -clock sync_data -min -1.85 [get_ports {sync_in}]

Now, I'm not accounting for board skew. I'll let you figure that out, but let's say it adds +/-0.6ns. Just increase the set_input_delays by that:

set_input_delay -clock sync_data -max 2.45 [get_ports {sync_in}]

set_input_delay -clock sync_data -min -2.45 [get_ports {sync_in}]
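
Because an .sdc file is ordinary Tcl, the same numbers can be kept as variables so the board skew is easy to change later. This is a sketch only: sync_data and sync_in are the names used in this thread, and the 0.6ns board skew is the placeholder figure from above, not a measured value.

```tcl
# Input delay = transmitter skew + board skew, applied symmetrically.
set tx_skew    1.85   ;# from the +/-4.4ns output constraint
set board_skew 0.6    ;# assumed; replace with the real board/cable skew
set in_delay   [expr {$tx_skew + $board_skew}]

set_input_delay -clock sync_data -max $in_delay           [get_ports {sync_in}]
set_input_delay -clock sync_data -min [expr {-$in_delay}] [get_ports {sync_in}]
```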

Be sure to run:

report_timing -setup -detail full_path -npaths 50 -to_clock adc_refclk_out -panel_name "setup: ssync out"

report_timing -hold -detail full_path -npaths 50 -to_clock adc_refclk_out -panel_name "hold: ssync out"

report_timing -setup -detail full_path -npaths 50 -from_clock sync_data -panel_name "setup: ssync in"

report_timing -hold -detail full_path -npaths 50 -from_clock sync_data -panel_name "hold: ssync in"

I like to look at the Data Path tab and see how it traces all the delays through the FPGA and uses the clock edges and external delays.

Altera_Forum · ‎11-20-2015

I'm using DDIO blocks to try and minimise any skew between the clock and data - feeding the clock and data out directly would result in big differences, as the clock is on the global network and the data has to travel a fair way through the chip to its IO pin. For the clock, I basically have it drive a 0 on the rising edge (so connected to the H input) and a 1 on the falling edge (L input). For the data, both the H and L inputs of the DDIO block are connected together - essentially forming an SDR interface. I used to use Xilinx FPGAs and this was apparently the optimal way of doing it on those, so I'm applying the same principle here.

I had initially tried settings of, I think, about -max of 3 and -min of -3, based on the various examples (and calculating with the figures I have for delays and whatnot). But this failed timing with hold violations for pretty much any number I put in the -min constraint, which is why I was getting very confused. There were no setup violations, just hold.

I was trying to get TimeQuest to spit out information on why the path was failing using the "Report Timing..." option, but I couldn't seem to get it to tell me anything - it kept saying no paths found for each of the searches I tried.

Altera_Forum · ‎11-20-2015

I'm just waiting for a compile to finish now (takes over an hour!), but once that is done I will try your suggestions and see how I go.

Just to clarify (saw your second post after making mine), for the input side I'm not generating the clocks myself, I'm using derive_pll_clocks, I was just including that information in case it was important (the commands were extracted from what had been automatically generated).

Altera_Forum · ‎11-20-2015

Tried the report_timing commands you posted and they did actually give me something useful this time. I see two hold paths: one with a relationship of -6.25ns and 3.84ns of slack (good), but also a second path which has a relationship of 0ns and -2.66ns of slack.

I think I am right in assuming this second path (the one which is violated) is actually a false path, because it seems to relate the falling edge of the launch clock (the clock driving the DDIO block) with the rising edge of the latch clock (adc_refclk_out). So I should probably cut timing paths between the falling edge of the internal one and the rising edge of the external one given that I actually clock the same data out on both rising and falling edges of the launch clock.
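
A cut like that might look something like the following in the .sdc file. This is a hedged sketch, not a tested constraint: tx_pll_clk is a hypothetical stand-in for whatever name derive_pll_clocks gave the 80MHz PLL output that clocks the DDIO_OUT blocks, and adc_refclk_out is the generated output clock from the first post.

```tcl
# Hypothetical clock name: replace tx_pll_clk with the real PLL output clock.
# The same data is presented on both edges, so it effectively changes only on
# the rising edge of the launch clock; transfers launched from its falling
# edge toward the output clock are not real and can be cut.
set_false_path -fall_from [get_clocks {tx_pll_clk}] -to [get_clocks {adc_refclk_out}]
```

After adding it, re-running the report_timing commands should show whether only the 6.25ns setup and -6.25ns hold relationships remain.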

Altera_Forum · ‎11-20-2015

Yes. To be honest, I don't think you need the DDIO on the datapath. It's one of those things I've seen done now and then to more carefully match clock and data, but the SDR output register is really just one of the DDR registers, and the path out is almost identical. I haven't done a thorough enough analysis, but I have never seen a case where going to DDIO registers on the data output for an SDR interface is the difference between making timing and not. (In theory, it could even make timing worse.) If it is better, it's probably less than 100ps difference.

Altera_Forum · ‎11-20-2015

I appreciate your assistance, it makes much more sense now. I think I was getting confused by the false path.

Altera_Forum · ‎11-20-2015

It is about the most difficult thing you can do. (And there are a lot of cases, each one just different enough from the other that if you don't know what to look for and what you're looking at, it gets confusing). Being able to read the report_timing Data Path tab and understand it has been the most useful thing for me. I find people just plug constraints/numbers into the .sdc and look at the slack, but don't really get how those constraints are used and what's being analyzed. If you have some time:

http://www.alterawiki.com/wiki/source_synchronous_analysis_with_timequest