Proper method to constrain internal paths in TimeQuest?

Altera_Forum · 10-01-2008

I have studied TimeQuest rather extensively over the past couple of weeks, and I'm working on fully constraining the I/O of my design. Since adding I/O constraints, I have started getting internal routing delays that cause negative slack. Here is the timing report for one of the failing paths:

Info: Path# 1: Setup slack is -0.120 (VIOLATED)
Info: ===================================================================
Info: From Node : SdramFifo:inst3|dff0:inst4|lpm_ff:lpm_ff_component|dffs[0]
Info: To Node : N_SLWR
Info: Launch Clock : pll_clk_48
Info: Latch Clock : CLK_OUT_48
Info:
Info: Data Arrival Path:
Info:
Info: Total (ns)  Incr (ns)     Type  Element
Info: ==========  =========  == ====  ===================================
Info:      0.000      0.000           launch edge time
Info:      0.084      0.084  R        clock network delay
Info:      0.361      0.277     uTco  SdramFifo:inst3|dff0:inst4|lpm_ff:lpm_ff_component|dffs[0]
Info:      0.361      0.000  RR CELL  inst3|inst4|lpm_ff_component|dffs[0]|regout
Info:      3.304      2.943  RR IC    inst3|inst2|datab
Info:      3.825      0.521  RR CELL  inst3|inst2|combout
Info:      8.998      5.173  RR IC    N_SLWR|datain
Info:     12.225      3.227  RR CELL  N_SLWR
Info:
Info: Data Required Path:
Info:
Info: Total (ns)  Incr (ns)     Type  Element
Info: ==========  =========  == ====  ===================================
Info:     20.833     20.833           latch edge time
Info:     24.205      3.372  R        clock network delay
Info:     12.105    -12.100  R  oExt  N_SLWR
Info:
Info: Data Arrival Time : 12.225
Info: Data Required Time : 12.105
Info: Slack : -0.120 (VIOLATED)
Info: ===================================================================
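The report's totals can be checked by hand. The short Python sketch below is an illustration, not a TimeQuest feature. The arrival time is the running sum of the Incr column. The required time is the latch edge, plus the CLK_OUT_48 clock network delay, minus the 12.100 ns oExt term. Setup slack is required minus arrival. The constant names are made up for the example.

```python
# Incr (ns) column of the Data Arrival Path for the failing N_SLWR path
ARRIVAL_INCREMENTS = [
    0.000,  # launch edge time
    0.084,  # clock network delay
    0.277,  # uTco of dffs[0]
    0.000,  # CELL regout
    2.943,  # IC to inst3|inst2|datab
    0.521,  # CELL inst3|inst2|combout
    5.173,  # IC to N_SLWR|datain
    3.227,  # CELL N_SLWR (output buffer)
]

LATCH_EDGE = 20.833        # one 48 MHz period
LATCH_CLOCK_DELAY = 3.372  # clock network delay to CLK_OUT_48
OUTPUT_DELAY_MAX = 12.100  # set_output_delay -max, shown as the oExt line


def arrival_time(increments):
    """Data arrival time is the running total of the Incr column."""
    return round(sum(increments), 3)


def required_time(latch_edge, clock_delay, output_delay_max):
    """Setup required time = latch edge + clock delay - external output delay."""
    return round(latch_edge + clock_delay - output_delay_max, 3)


def slack_ns(required, arrival):
    """Setup slack: positive means the data arrives before it is required."""
    return round(required - arrival, 3)


arrival = arrival_time(ARRIVAL_INCREMENTS)
required = required_time(LATCH_EDGE, LATCH_CLOCK_DELAY, OUTPUT_DELAY_MAX)
print(f"arrival={arrival:.3f} required={required:.3f} slack={slack_ns(required, arrival):.3f}")
# prints arrival=12.225 required=12.105 slack=-0.120
```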

Is it correct to simply add a set_max_delay SDC command between the 'SdramFifo:inst3|dff0:inst4|lpm_ff:lpm_ff_component|dffs[0]' and 'N_SLWR' nodes in the design? If so, why doesn't Quartus II (QII) meet timing on its own, since it already knows the data required time? If not, how do I fix the path delay error above?

Thanks,

/John.

Altera_Forum · 10-01-2008

- What device and speed grade? That helps judge whether 3 ns hops are unusually long.

- I don't see a Location column. Is this a post-fit timing analysis? Without locations it's tough to tell whether the placement is bad.

- The set_max_delay shouldn't make any difference, as Quartus already knows it needs to get the data out before the latch clock hits the external register. Adding a set_max_delay doesn't provide any more information. The two things I can think of are:

1) This node may fan out to other I/O or other logic. Run report_timing -setup -detail full_path -npaths 200 -from SdramFifo...lpm_ff_component..|dffs[0] and see how many paths it drives. Then highlight all the paths in the Summary view and right-click Locate Path to Chip Planner. There you can click the Expand button a few times to break out the hops and interconnect (IC) if it helps. Basically, if this node also feeds logic on the other side of the chip, you have conflicting placement requirements (it needs to be close to two separate locations).

2) You may have conflicting timing requirements. Right-click the path and run report_timing -hold -detail full_path. Then switch to the min (fast) model with set_operating_conditions, rerun the hold check, and see how much margin it passes by. The Fitter adds routing delay to meet hold requirements, which works directly against your setup requirement.

Altera_Forum · 10-01-2008

It was brought to my attention that "Report Worst Case Paths" gives you a report like this one, without Locations. My general recommendation is not to run that command, and instead always run "Report All Summaries". This covers the different types of analysis, separates them into their logical types (setup, hold, recovery and removal), and breaks each one down by clock domain. From there, you can right-click Report Timing on any particular domain and get exactly the information you are looking for.

Altera_Forum · 10-02-2008

Hi Rysc,

This is a Cyclone II, EP2C5Q208C7.

I generated the above report by the following TCL commands (post-fit netlist):

1) create_timing_summary -setup -panel_name "Summary (Setup)"

2) Right-clicked the CLK_OUT_48 clock in the "Summary (Setup)" pane and selected "Report Timing". This generates the following Tcl: report_timing -to_clock {CLK_OUT_48} -setup -npaths 10 -detail path_only -panel_name {Setup: CLK_OUT_48}

The Location and Fanout columns are available when I view the "Data Arrival" section of the slack report, but for some reason they are not shown in the console window.

I'm compiling the QII design in multicorner mode, and only the slow model fails. Note that this signal drives an external FIFO with a 12.1 ns tsu and a 3.6 ns th requirement. I therefore use the following SDC constraints:

set_output_delay -max -clock [get_clocks CLK_OUT_48] 12.1 [get_ports N_SLWR]

set_output_delay -min -clock [get_clocks CLK_OUT_48] -3.6 [get_ports N_SLWR]
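These two values follow the usual way of turning an external register's tSU/tH into output delays. Below is a minimal sketch in Python. It assumes the common forwarded-clock convention. The board trace-delay parameters are hypothetical and default to zero, which gives exactly the 12.1 and -3.6 used above.

```python
def output_delays(tsu_ext, th_ext,
                  data_trace_max=0.0, data_trace_min=0.0,
                  clk_trace_max=0.0, clk_trace_min=0.0):
    """Return (max, min) values for set_output_delay, in ns.

    Assumed convention for a clock forwarded alongside the data:
      max = tsu + longest data trace - shortest clock trace
      min = -th + shortest data trace - longest clock trace
    The trace-delay arguments are hypothetical board numbers.
    """
    out_max = tsu_ext + data_trace_max - clk_trace_min
    out_min = -th_ext + data_trace_min - clk_trace_max
    return round(out_max, 3), round(out_min, 3)


print(output_delays(12.1, 3.6))            # (12.1, -3.6): no board delay
print(output_delays(12.1, 3.6, 1.0, 1.0))  # (13.1, -2.6): hypothetical 1 ns data trace
```

In the second call, a longer data trace relaxes the min (hold) side. It also tightens the max (setup) side by the same amount.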

When using the fast model, I see a data arrival time of 5.388 ns and a data required time of 5.317 ns (slack 0.071 ns). Does this mean the fitter cannot meet the tsu requirement in the slow model because it has only 0.071 ns of hold margin to spare in the fast model?
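Setup and hold slack subtract in opposite directions, which is why extra routing delay helps one check and hurts the other. Here is a small Python sketch with the numbers quoted for this path. The fast-model figures are read as the hold check, which matches the stated 0.071 ns slack.

```python
def setup_slack(arrival, required):
    """Setup: data must arrive no later than the required time."""
    return round(required - arrival, 3)


def hold_slack(arrival, required):
    """Hold: data must arrive no earlier than the required time."""
    return round(arrival - required, 3)


# Same N_SLWR path; numbers quoted in this thread for the two timing models
slow_setup = setup_slack(arrival=12.225, required=12.105)
fast_hold = hold_slack(arrival=5.388, required=5.317)
print(f"slow-model setup slack: {slow_setup:+.3f} ns")
print(f"fast-model hold slack:  {fast_hold:+.3f} ns")
```

Adding a delay d to the path lowers setup slack by d and raises hold slack by d in each model. The catch is that the same physical route has a different d in the slow and fast models.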

The funny thing is that timing was met until I started constraining other I/O in the design. For some reason, the fitter can no longer find a fit that meets timing once more I/O is involved.

I have attached the external FIFO's timing requirements (Cypress FX2 slave FIFO link). As can be seen, the SLWR signal's tsu requirement is 12.1 ns on a 20.83 ns clock period.

I'm not sure how to fix this so any tips are very much appreciated.

Thanks,

/John.

Altera_Forum · 10-02-2008

Yes, you probably can't meet that timing. Your clock period is ~20 ns, and you're using 15.7 ns of that externally (12.1 ns tsu + 3.6 ns th, not counting any board delay). That leaves only a ~4 ns window to get data out of the FPGA.
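The budget can be written out as a valid-output window for the FPGA's tco. This minimal Python sketch ignores board delay and the skew between N_SLWR and CLK_OUT_48, so it is a rough bound and not what TimeQuest reports. With the rounded 20 ns period it reproduces the ~4 ns estimate. With the exact 48 MHz period the window is about 5.1 ns.

```python
def tco_window(period_ns, tsu_ns, th_ns):
    """Return (min_tco, max_tco, width) for an FPGA output, in ns.

    Simplification: board delay and clock skew are ignored.
    """
    max_tco = period_ns - tsu_ns  # setup: valid tsu before the next clock edge
    min_tco = th_ns               # hold: new data must not appear until th after the edge
    return round(min_tco, 3), round(max_tco, 3), round(max_tco - min_tco, 3)


print(tco_window(20.0, 12.1, 3.6))        # rounded ~20 ns period
print(tco_window(1e3 / 48.0, 12.1, 3.6))  # exact 48 MHz period (20.833 ns)
```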

The second constraint is that the data must get out later than 3.6 ns (i.e. min Tco) in the fast model. As a general guesstimate, any delay in the slow model will be about twice that in the fast model. So if it just makes 3.61 ns to get off chip in the fast model, it takes about 7.2 ns in the slow model, which just makes your setup. But it's easy for the ratio to exceed 2x, especially if you're not in the fastest speed grade (slower speed grades don't affect the fast model, but they make the slow model slower and the "spread" larger).
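The 2x rule of thumb can be turned into a quick feasibility check. In this Python sketch the ratio is the thread's guesstimate, not a device figure. The 8.733 ns setup limit assumes the exact 48 MHz period with no board delay or clock skew.

```python
def slow_estimate(fast_ns, ratio=2.0):
    """Rule of thumb (assumption): slow-model delay ~ ratio x fast-model delay."""
    return fast_ns * ratio


def fits_window(fast_tco, ratio, min_tco=3.6, max_tco=8.733):
    """True if the fast-model tco meets hold and the scaled slow-model tco meets setup."""
    return fast_tco >= min_tco and slow_estimate(fast_tco, ratio) <= max_tco


for ratio in (2.0, 2.4, 2.5):
    print(ratio, round(slow_estimate(3.61, ratio), 3), fits_window(3.61, ratio))

# Largest slow/fast ratio that can still fit, with the fast tco right at 3.6 ns
print(round(8.733 / 3.6, 2))
```

Under these assumptions, any device whose slow/fast spread on this path exceeds about 2.43x cannot meet both checks, however the fitter places it.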

Note that you do have board delays, so if they're 1 ns, you can decrease your minimum by that, which should help. The only other major option I can think of is to modify the clock delays (skew them, make them source synchronous, etc.).

Altera_Forum · 10-02-2008

The interface should be source synchronous, since both the clock and the signal are generated inside the FPGA. The clock comes from the PLL and the timing IS met in fast/slow models unless other (unrelated) I/O is constrained (like I'm trying to do now).

Can I lock a certain routing (that works) before adding constraints for other I/O? The fact that timing *can* be met tells me that the fitter does a worse job when it has to deal with other I/O. Does the *order* in which constraints are listed in the SDC file matter?

Thanks,

/John.

Altera_Forum · 10-02-2008

--- Quote Start ---

... The clock comes from the PLL and the timing IS met in fast/slow models unless other (unrelated) I/O is constrained (like I'm trying to do now).

... The fact that timing *can* be met tells me that the fitter does a worse job when it has to deal with other I/O.

--- Quote End ---

I realize that variation in the routing is probably the entire cause of your negative slack, but have you checked the Fitter report to see whether the output delay chain is set differently for the data and/or clock when you get negative slack versus when you don't? I think the Fitter uses coarse timing estimates when it picks delay chain values. Setting the output delay chain manually for the data and clock outputs might help.

If the delay chains are done early in fitting as I think, then that is long before the routing stage. Manually setting the delay chains might make no difference if the entire problem is bad choices by the router. One or two routing hops being different in the routing stage can have a big effect on slack.

--- Quote Start ---

Can I lock a certain routing (that works) before adding constraints for other I/O?

--- Quote End ---

Individual routing lines can be controlled in an .rcf file (routing constraints file), but you probably wouldn't want to do that. Maybe you can isolate the portion of the design with the troublesome outputs into its own design partition and use incremental compilation to preserve the routing for that partition after you get good results for it.

--- Quote Start ---

Does the *order* in which constraints are listed in the SDC file matter?

--- Quote End ---

The order can matter for some things. If you had a problem related to the order of constraints, you would probably have messages (in at least some cases they are warnings) saying something about an order-dependent choice made by TimeQuest.

Maybe a simple way to test whether two different SDC constraint orderings are equivalent would be to run write_sdc with each order. If the two .out.sdc files are identical, then the order doesn't matter. Use the -expand argument for write_sdc. The -expand argument might be necessary to check for the order of derive_pll_clocks versus manual clock constraints. That's an order-dependent thing for which a change was made a few versions ago.
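The comparison of the two write_sdc -expand outputs can be scripted. Below is a minimal Python sketch built on the standard difflib module. The file names are hypothetical. Skipping '#' comment lines is an assumption, so that header or comment differences don't hide an otherwise identical result.

```python
import difflib


def significant_lines(sdc_text):
    """Drop blank lines and '#' comments so only the constraints are compared."""
    return [line.rstrip() for line in sdc_text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def sdc_diff(text_a, text_b, name_a="order_a.out.sdc", name_b="order_b.out.sdc"):
    """Unified diff of two write_sdc outputs; an empty list means the constraints match."""
    return list(difflib.unified_diff(significant_lines(text_a),
                                     significant_lines(text_b),
                                     fromfile=name_a, tofile=name_b,
                                     lineterm=""))


# Tiny in-memory example using a constraint from this thread
a = "# run A\nset_output_delay -max -clock [get_clocks CLK_OUT_48] 12.1 [get_ports N_SLWR]\n"
b = "# run B\nset_output_delay -max -clock [get_clocks CLK_OUT_48] 12.1 [get_ports N_SLWR]\n"
print(sdc_diff(a, b))  # [] because only the comments differ
```

For real files, pass the text of each .out.sdc file (for example via pathlib's Path.read_text()) and print the returned lines.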

Altera_Forum · 10-03-2008

Hi Brad,

Thanks for the info. I have pasted the 'good' path (setup/hold met) below. Compared with the path I posted on 10/1/08, the timing of the 'good' path is slightly better. Examining the path in the Chip Planner, I see that the fitter simply chose a slightly shorter route for the 'good' fit.

Info: Path# 1: Setup slack is 0.241
Info: ===================================================================
Info: From Node : SdramFifo:inst3|dff0:inst4|lpm_ff:lpm_ff_component|dffs[0]
Info: To Node : N_SLWR
Info: Launch Clock : pll_clk_48
Info: Latch Clock : CLK_OUT_48
Info:
Info: Data Arrival Path:
Info:
Info: Total (ns)  Incr (ns)     Type  Element
Info: ==========  =========  == ====  ===================================
Info:      0.000      0.000           launch edge time
Info:      0.110      0.110  R        clock network delay
Info:      0.387      0.277     uTco  SdramFifo:inst3|dff0:inst4|lpm_ff:lpm_ff_component|dffs[0]
Info:      0.387      0.000  RR CELL  inst3|inst4|lpm_ff_component|dffs[0]|regout
Info:      3.415      3.028  RR IC    inst3|inst2|dataa
Info:      3.960      0.545  RR CELL  inst3|inst2|combout
Info:      8.637      4.677  RR IC    N_SLWR|datain
Info:     11.864      3.227  RR CELL  N_SLWR
Info:
Info: Data Required Path:
Info:
Info: Total (ns)  Incr (ns)     Type  Element
Info: ==========  =========  == ====  ===================================
Info:     20.833     20.833           latch edge time
Info:     24.205      3.372  R        clock network delay
Info:     12.105    -12.100  R  oExt  N_SLWR
Info:
Info: Data Arrival Time : 11.864
Info: Data Required Time : 12.105
Info: Slack : 0.241
Info: ===================================================================
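Lining up the two reports stage by stage shows where the 0.361 ns difference in arrival time (12.225 ns vs 11.864 ns) comes from. This small Python sketch uses the Incr values copied from both reports. The good fit is equal or slightly slower on every stage except the final interconnect hop to N_SLWR|datain, which is 0.496 ns shorter.

```python
STAGES = [
    "clock network delay",
    "uTco dffs[0]",
    "CELL regout",
    "IC -> inst3|inst2 input",  # datab in the bad fit, dataa in the good fit
    "CELL inst3|inst2|combout",
    "IC -> N_SLWR|datain",
    "CELL N_SLWR",
]
BAD = [0.084, 0.277, 0.000, 2.943, 0.521, 5.173, 3.227]   # first report, slack -0.120
GOOD = [0.110, 0.277, 0.000, 3.028, 0.545, 4.677, 3.227]  # this report, slack 0.241


def stage_deltas(bad, good):
    """Per-stage change in delay (good minus bad), in ns."""
    return [round(g - b, 3) for b, g in zip(bad, good)]


deltas = stage_deltas(BAD, GOOD)
for stage, delta in zip(STAGES, deltas):
    print(f"{stage:26s} {delta:+.3f}")
print(f"{'total':26s} {sum(deltas):+.3f}")
```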

I'm not sure how to proceed at this point. I know that timing can be met if the fitter does the right thing, so I would rather not lower the clock frequency. Are there any other tricks or optimizations that can make the fitter 'try harder'? I tried enabling the 'Extra' setting under Physical Synthesis Effort in the Settings dialog, but now I don't meet hold timing in the fast model :(

Altera_Forum · 10-03-2008

I wouldn't expect physical synthesis to help on this register-to-pin path unless retiming resulted in the combinational node being moved before the register.

Your inst3|inst4|lpm_ff_component|dffs[0] register and the inst3|inst2 combinational LUT apparently are being placed quite a way from the N_SLWR device pin. Along the lines of what was said in Rysc's first post, this is likely because these nodes are fed by or also feed something else placed somewhere else, with the other connection pulling them away from the pin because the Fitter was trying to meet the timing requirement on that connection too. To see whether it would be OK to place these nodes closer to the pin without breaking another path using these nodes, place the register in the LAB adjacent to the pin or place a LogicLock region beside the pin with the register-to-pin path assigned to the region. If the register and/or combinational node feed more than one pin, place them in the center of all those pins.

Altera_Forum · 10-03-2008

--- Quote Start ---

Your inst3|inst4|lpm_ff_component|dffs[0] register and the inst3|inst2 combinational LUT apparently are being placed quite a way from the N_SLWR device pin. Along the lines of what was said in Rysc's first post, this is likely because these nodes are fed by or also feed something else placed somewhere else, with the other connection pulling them away from the pin because the Fitter was trying to meet the timing requirement on that connection too.

--- Quote End ---

I'm still learning, but isn't the long path simply caused by the 12.1 ns tsu requirement? It seems to me that QII simply delays the signal to line it up roughly 12 ns before the output clock CLK_OUT_48 (although it slightly misses the mark).

--- Quote Start ---

To see whether it would be OK to place these nodes closer to the pin without breaking another path using these nodes, place the register in the LAB adjacent to the pin or place a LogicLock region beside the pin with the register-to-pin path assigned to the region. If the register and/or combinational node feed more than one pin, place them in the center of all those pins.

--- Quote End ---

I will study this option. But surely this would break the tsu = 12.1 ns and th = 3.6 ns requirements?

Altera_Forum · 10-03-2008

--- Quote Start ---

I'm still learning, but isn't the long path simply caused by the 12.1 ns tsu requirement? It seems to me that QII simply delays the signal to line it up roughly 12 ns before the output clock CLK_OUT_48 (although it slightly misses the mark).

--- Quote End ---

For clock setup (FPGA input tsu, FPGA output tco corresponding to your external device tsu, and internal paths), Quartus does not delay signals. Quartus tries to minimize the data path delay.

Quartus delays signals only for clock hold (FPGA input th, FPGA output minimum tco corresponding to your external device th, and internal paths), and it does so only if "Optimize hold timing" has an appropriate setting.

Note that your 12.1 ns is subtracted from the data required path (the line with the oExt type). The bigger your external device required tsu (which means, the smaller the FPGA allowed tco), the smaller the data required time will be, and the more Quartus will need to try to make the data required path quicker.
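Because the oExt line subtracts the external tsu, each extra nanosecond of tsu removes a nanosecond from the required time. This Python sketch reuses the numbers from the first (failing) report and holds the placement fixed, which a real recompile would not. It prints the required time for a few tsu values, plus the break-even tsu for that particular routing.

```python
LATCH_EDGE = 20.833        # 48 MHz latch edge from the report
LATCH_CLOCK_DELAY = 3.372  # clock network delay to CLK_OUT_48
ARRIVAL = 12.225           # data arrival time of the failing path (assumed fixed)


def required_time(output_delay_max):
    """The oExt term subtracts set_output_delay -max from the required time."""
    return round(LATCH_EDGE + LATCH_CLOCK_DELAY - output_delay_max, 3)


for tsu in (11.0, 12.1, 13.0):
    req = required_time(tsu)
    print(f"tsu={tsu:4.1f} ns  required={req:.3f} ns  slack={req - ARRIVAL:+.3f} ns")

# Largest external tsu this particular routing could tolerate
break_even_tsu = round(LATCH_EDGE + LATCH_CLOCK_DELAY - ARRIVAL, 3)
print(break_even_tsu)
```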

Altera_Forum · 10-04-2008

I ended up lowering the clock frequency from 48 MHz to 40 MHz, and I can now meet timing.
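Lowering the frequency relaxes only the setup side. In the default single-cycle relationship, the hold check compares against the same clock edge, so its requirement doesn't depend on the period. Here is a rough Python estimate of the setup gain. It assumes the failing path's arrival time and clock delay from the first report stay the same, and that the FIFO's 12.1 ns tsu still applies at 40 MHz. Both are assumptions, since a new compile will route differently.

```python
def period_ns(freq_mhz):
    return 1e3 / freq_mhz


def slack_at(freq_mhz, arrival=12.225, latch_clock_delay=3.372, output_delay_max=12.1):
    """Setup slack = (period + latch clock delay - output delay max) - arrival."""
    required = period_ns(freq_mhz) + latch_clock_delay - output_delay_max
    return round(required - arrival, 3)


for f in (48.0, 40.0):
    print(f"{f:.0f} MHz: period={period_ns(f):.3f} ns, setup slack={slack_at(f):+.3f} ns")
```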