set_output_delay -min / -max does not have intended effect

Altera_Forum · ‎09-06-2008

Hi,

Per earlier recommendations i have switched to TimeQuest in order to correctly constrain output signals in my design. The FPGA is a Cyclone II and i'm using Quartus II 7.1.

I'm trying to achieve correct timing for a source synchronous output bus that drives an external FIFO. I have a 48 MHz clock, a 16-bit data bus and a 'write'-signal. The external FIFO requires the following setup and hold timings:

data tsu_needed = 4ns

data th_needed = 5ns

write tsu_needed = 13ns

write th_needed = 5ns

Given the source synchronous interface, i'm using the following SDC Settings (the complete SDC file is attached):

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 13 [get_ports {N_SLWR}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -4 [get_ports {N_SLWR}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[15]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[15]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[14]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[14]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[13]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[13]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[12]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[12]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[11]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[11]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[10]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[10]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[9]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[9]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[8]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[8]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[7]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[7]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[6]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[6]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[5]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[5]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[4]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[4]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[3]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[3]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[2]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[2]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[1]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[1]}]

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[0]}]

set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[0]}]
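The per-bit constraints above can usually be collapsed with a port wildcard. The sketch below uses the same values and follows the common convention that -max is the external device's setup time and -min is the negative of its hold time. It assumes zero board trace delay, which the post does not state.

```tcl
# Sketch only: same values as the per-bit constraints above, using a wildcard.
# -max = external tsu, -min = -(external th); board delays assumed to be zero.
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}]  4 [get_ports {DATA_OUT[*]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[*]}]
```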

Unfortunately, the timings measured by my Logic Analyzer attached to LAI pins on the FPGA are as follows:

data th_actual = 1.3ns

write th_actual = 2ns

Changing the -min time to something more negative does not affect the hold time a bit.

Any ideas where to start debugging this?

Thanks,

/John.

Altera_Forum · ‎09-06-2008

Very simple: the timing constraints can only achieve delays that are feasible within the given logic structure. The means to adjust the timing are very limited, mainly the use of different routing paths and a rather small (around 0.5 ns maximum) additional output delay available with some FPGA families. It neither causes structural changes, such as moving registers from internal logic to output cells or vice versa, nor adjusts output drive strengths.

Delays of more than several ns usually can't be achieved by timing constraints without choosing an appropriate structure in advance or introducing structural changes manually. Another (probably better) way could be to use a phase-shifted clock from a PLL.

Enabling physical synthesis may also help, but I would generally prefer a solution where the timing constraints are used to balance unavoidable routing delays rather than forcing a huge imbalance arbitrarily.

Altera_Forum · ‎09-06-2008

Just looking at the timing constraints, I don't see it as source synchronous, and think it might be set up incorrectly. Your clk_out_48 gets a create_clock assignment, which means a clock comes into it. If this is an output, create a generated_clock assignment on it, and the -source should be the PLL output name that is a generated clock (I'm assuming a PLL is driving this output). That will make a drastic change in your timing results, if that's how your circuit actually works. If everything is going into the output register, there is a fine-grained output delay chain in Cyclone II (search for IOE programmable delay in the handbook to see the values), but I think its delays are too small to meet your requirement of padding the data by 5ns. The fitter might move the registers out of the IO cell to meet timing, but that's only once the constraints are correct.
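A minimal sketch of the generated-clock setup described above. The input port name, the output port name and the PLL pin path are assumptions for illustration and must be replaced with the names from the actual design.

```tcl
# Sketch only. Port names and the PLL pin path below are hypothetical.
# 48 MHz input clock feeding the PLL (period = 1000/48 = 20.833 ns).
create_clock -name CLK_IN_48 -period 20.833 [get_ports {CLK_IN_48}]

# Generated clock on the PLL output (1:1 with the input clock).
create_generated_clock -name pll_clk_48 \
    -source [get_ports {CLK_IN_48}] \
    [get_pins {pll_inst|altpll_component|pll|clk[0]}]

# Generated clock on the output port, sourced from the PLL output pin.
# This replaces a create_clock assignment on that port.
create_generated_clock -name CLK_OUT_48 \
    -source [get_pins {pll_inst|altpll_component|pll|clk[0]}] \
    [get_ports {CLK_OUT_48}]
```

With the output clock defined this way, set_output_delay constraints that reference CLK_OUT_48 are analyzed relative to the clock as it leaves the chip.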

Altera_Forum · ‎09-07-2008

Rysc:

I have changed my SDC but the clock to out delay does not change. I think the problem might be that the source synchronous clock is generated internally in the PLL while Quartus/TimeQuest appears to use the input clock to the PLL as reference. Since there is a path delay of 3.3ns from the input 48 MHz clock to the output 48MHz clock the timings appear to be some 3ns off (confirmed by my Logic Analyzer). The TimeQuest reports all show output delay min/max in relation to the input clock, not the output clock. This appears incorrect for a source-synchronous design?

Are PLL outputs phase locked to the PLL inputs? The reason i have a 48 MHz PLL output while the input clock is 48 MHz is that i'm generating a 48 MHz, 60 MHz and 120 MHz clock in the PLL and i want them all to be phase locked to each other. Since all clocks are phase locked i would get away with a single reset synchronization circuit for all three clocks.

--- Quote Start ---

Very simple: the timing constraints can only achieve delays that are feasible within the given logic structure. The means to adjust the timing are very limited, mainly the use of different routing paths and a rather small (around 0.5 ns maximum) additional output delay available with some FPGA families. It neither causes structural changes, such as moving registers from internal logic to output cells or vice versa, nor adjusts output drive strengths.

--- Quote End ---

FvM: I have seen discussions about this in other threads (Brad, Rysc, FvM and others) and i believe the conclusion was that very large routing delays would be inserted as long as the output flops were not in the output cell. I believe an example project was created where 20 ns was correctly inserted.

My latest SDC file is attached.

Thanks,

/John

Altera_Forum · ‎09-07-2008

FYI: I have also tried using the -reference_pin option as explained in the Altera online example "TimeQuest Example: Basic Source Synchronous Output". This didn't help a bit, apparently because the approach suggested earlier (creating a clock with create_clock for the output pin and then using it with set_output_delay) should have the same effect (or no effect, as in my case...).

Note that the TimeQuest timing reports STILL use the CLK_IN_48 clock as reference. It looks like it ignores the 48 MHz PLL clock, perhaps because the PLL clock has the same frequency and phase as the input 48 clock.

I also tried to use the PLL in "source synchronous" mode but no improvement can be measured on my LA.

Any ideas how to continue?

Altera_Forum · ‎09-08-2008

Your source for the output generated clock seems to be a different name than the target of the PLL clocks. It still might work, just something I noticed. Everything will be related to the input clock, but since the data starts there and the clock going off chip starts there, they just cancel each other out. It's still correct that they start there.

(I've used reference_pin once, and though I got it to work, prefer putting a generated_clock on the output pin because it makes more sense to me and I have more control, since I've had cases where I needed to modify the clock constraint as it went off chip.)

Anyway, run the following:

report_timing -hold -detail full_path -npaths 10 -to [get_ports DATA_OUT*] -panel_name "h: data_out" -file "Data_Out_Hold.txt"

report_timing -setup -detail full_path -npaths 10 -to [get_ports DATA_OUT*] -panel_name "s: data_out" -file "Data_Out_setup.txt"

Then .zip up the two files and attach them. I'll get a much better sense of what's going on from that.

Altera_Forum · ‎09-08-2008

Hi Rysc,

I have attached the reports as well as my updated SDC file. I previously had some warnings in the Quartus build log that are now gone. I now get an interesting message in the 'warning' log which seems to indicate that a correct routing delay has been inserted:

Warning: 17 (of 9813) connections in the design require a large routing delay to achieve hold requirements. Please check the circuit's timing constraints and clocking methodology, especially multicycles and gated clocks.

Unfortunately, the DATA_OUT or N_SLWR setup and hold timings have not changed as seen on my Logic Analyzer. I'm using the LAI outputs but i would assume that the timings acquired via LAI should be the same as seen on the output pins?

Thanks,

/John.

Altera_Forum · ‎09-08-2008

LAI output timings are not related to the IO timing at all (it would be impossible to do). LAI, I believe, is purely for looking at functional issues, not timing issues. You need to scope the outputs to see what is really happening.

Looking at the first hold path, it takes 17.244ns to get the data out, and the clock gets out in 3.373ns. This is at the slow model, whereas this really needs to be checked at the fast model too (which TimeQuest should automatically do, you're just not doing it when re-running TimeQuest), but I am quite certain it will make timing at that model too. So I think everything is working as you would like.

To be honest, I don't like adding huge delays like this and relying so much on min models. Not that they shouldn't work, but you're relying on the fitter to do something right every time (and it should), and you're really not taking advantage of the source synchronous interface, whose major goal is to send the clock and data with almost identical delays, and just have the clock phase shifted so it's centered on the data eye.

For example, for the data, you could just invert the clock in your design, either by having a tap of the PLL for it that phase-shifts it 180 degrees, or by inverting it on the output port (if you do the latter, you might have to add a -invert to the generated clock on the output port, I'm not sure). Now your output clock changes directly in the middle of the data you're sending. These track nicely over PVT, and that's why source-synchronous interfaces are required for high-speed I/O. It's just a suggestion, as again, what you have now looks right.
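A sketch of how the inverted output clock might be described, assuming the inversion is done at the output port and the constraint needs -invert to match. The PLL pin path and the port name are hypothetical.

```tcl
# Sketch only; pin path and port name are assumptions for illustration.
# The output clock is inverted relative to the PLL output, so its rising
# edge lands in the middle of the data window.
create_generated_clock -name CLK_OUT_48 -invert \
    -source [get_pins {pll_inst|altpll_component|pll|clk[0]}] \
    [get_ports {CLK_OUT_48}]
```

If the 180 degree shift comes from a separate PLL tap instead, the phase would be described on the PLL output clock rather than with -invert on the port.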

Altera_Forum · ‎09-08-2008

--- Quote Start ---

LAI output timings are not related to the IO timing at all (it would be impossible to do). LAI, I believe, is purely for looking at functional issues, not timing issues. You need to scope the outputs to see what is really happening.

--- Quote End ---

The LAI interface is, in my experience, not adding any significant delay to the monitored signals. I have used it extensively in the past to monitor timing related issues and the signals that came out from the LAI pins have exactly matched the actual output pins. The LAI gives much more flexibility and ease of use when checking timing on various boards because i don't need to solder wires onto the device pins etc.

I have also previously compared the LAI/LA timing with timing information captured via the SignalTap feature and they correlate very well. To make sure, i just put the scope on the signals and the scope verifies the timing information acquired via the LAI pins and displayed on my Logic Analyzer.

--- Quote Start ---

To be honest, I don't like adding huge delays like this and relying so much on min models. Not that they shouldn't work, but you're relying on the fitter to do something right every time (and it should), and you're really not taking advantage of the source synchronous interface, whose major goal is to send the clock and data with almost identical delays, and just have the clock phase shifted so it's centered on the data eye.

--- Quote End ---

The problem is that the receiving external FIFO i'm writing to requires a SLWR signal tsu of 13ns and a tsu on DATA_OUT of 5ns. Both signals require a th of 4-5 ns. I have not been able to meet the external hold times (now around 2-3 ns). I could use the PLL to introduce a phase shift since i'm using it for the 48 MHz signal, but i would run into the same issue on both the 60 MHz and 120 MHz buses, which also will need to be properly constrained, and i don't have more PLL outputs available. I was hoping i could rely on the fitter to take care of all these details for me but it seems to be less than straightforward.

Perhaps i would need to go back and start playing with the clock signals again. The last time i tried this it ended up becoming very messy which is why i wanted an alternative approach in the first place.

I also tried to double the set_output_delay (min) from -5 to -10 but i only noticed a mere 1ns improvement in th. Right now i'm rather confused. It bothers me that i'm not able to achieve the correct timing information by using the current approach. If i can't trust set_output_delay to adjust the timing i'm back to fiddling with LCELLs etc which is a major mess, really.

Thanks,

/John.

Altera_Forum · ‎09-08-2008

Also, why are all output timings related to the input clock to the PLL and not to the actual clock the data is related to (in this case the source-synchronous interface)? It seems rather insane, especially in cases where the PLL output clock is not an integer multiple of the PLL input clock. It seems useless to me to receive timing information that says that the clock to output delay is 17ns when the clock is not the one i'm interested in looking at.

Altera_Forum · ‎09-08-2008

--- Quote Start ---

FvM: I have seen discussions about this in other threads (Brad, Rysc, FvM and others) and i believe the conclusion was that very large routing delays would be inserted as long as the output flops were not in the output cell. I believe an example project was created where 20 ns was correctly inserted.

--- Quote End ---

Yes, I've noticed it. It's indeed a remarkable achievement of the Quartus II fitter. But in my opinion, it's far from a meaningful way to use timing constraints. I already made a remark regarding balanced timing. I think the ideal balanced case is a design that achieves the intended timing when all FPGA elements show their specified typical timing. Up to moderate clock frequencies, it wouldn't need timing constraints, because the delay skew is below the available margin.

I understand that this approach may be inapplicable for some designs. But this doesn't dispel my doubts about unnatural timing constraints.

Altera_Forum · ‎09-08-2008

The clocks always relate back to the original input. If it doesn't, the different delays can cancel out. (Let's say your clock and data were created from two different PLLs. If it ignored the delays up to the PLL, then you would have incorrect timing analysis since they would be slightly different.) This is best practice, and in the case where that delay is the same for the data and clock, they both just cancel out and there is no problem. Note that it doesn't say your clock to out is 17ns, that's the information I pulled out of it (and it is adding 13ns of routing delay, so it's somewhat accurate). What TQ is telling you is that when you launch data at time 0ns, it gets out there after the clock launched at time 0ns, including your -min delay requirements, which is what you want.

I ignored your SLWR signal's requirement, or at least wasn't thinking about it. You've got a 20ns clock period, and this signal chews up 13ns for setup and 5ns for hold, allowing the FPGA to try to align the clock and data with a max difference of 2ns. I hate to say it, but that's probably not going to happen. You really have too slow of a memory. You'll probably either need a faster run, or make this particular path a multicycle (i.e. when you send data on that line, you don't expect it to reach the destination for 2 clocks). I don't know the behavior of this signal to know what its effect will be.

And probably the reason doubling your set_output_delay didn't have as great an effect is that it's already doing everything it can. Like I said, it's adding 13ns of routing delay already, which easily meets timing in the slow corner, but probably doesn't meet by much in the fast corner. This is why "adding delay to meet timing" can only go so fast, because the slow and fast corners can differ by a lot. When the clock and data paths are aligned, then the paths will still vary a lot, but they vary together and everything works out nicely.
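The window arithmetic above can be written out as a small Tcl calculation. It uses the exact 48 MHz period rather than the rounded 20ns, and the tsu/th values posted at the start of the thread. The variable names are only illustrative.

```tcl
# Sketch only: timing window left for SLWR after the FIFO's setup and hold.
set period [expr {1000.0 / 48.0}]  ;# 48 MHz clock period in ns (about 20.83)
set tsu    13.0                    ;# SLWR setup required by the FIFO, ns
set th      5.0                    ;# SLWR hold required by the FIFO, ns

# The data may only change inside this window relative to the output clock,
# and that has to hold across both the slow and the fast timing model.
set window [expr {$period - $tsu - $th}]
puts [format "SLWR window: %.2f ns" $window]
```

With these numbers the window comes out just under 3 ns, slightly wider than the 2ns obtained with a rounded 20ns period but still tight.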

Altera_Forum · ‎09-08-2008

Thanks FvM and Rysc. Your feedback here is much appreciated. I'm still getting up to speed on FPGA best practices and your comments are invaluable.

What still confuses me is that if i don't constrain my I/O at all i'm getting approximately the same I/O timing as when i specify the SDC that was posted. It is the hold timing that is not met (around 2-3 ns regardless of whether the I/O is constrained or not). Since the SLWR signal requires 13 ns setup time and 4ns hold time, this is what i put into the SDC. The same goes for the DATA_OUT bus but that has much less stringent tsu.

So, basically, i just want to delay the SLWR and DATA_OUT signals a few ns (2 to 5 would do) so that the th is met. That however doesn't guarantee tsu unless it is also constrained. This brings me back to the original problem with how to set up the SDC.

I'm hoping you see the problem and can give me some hints how to properly constrain this I/O. It seems like playing with the clock signal is not the correct method when the data timing is only a couple of ns off.

Altera_Forum · ‎09-08-2008

When you say I/O timing, do you mean through the LAI interface or what TimeQuest reports? The path I looked at added a 13ns route to your output path, which is something I have never seen done before, and I am certain is due to your timing constraints. So I still don't see the problem, as it looks like what is occurring is correct (and impressive, as adding delay used to be very difficult for any FPGA fitter just a few years ago).

You say adding 2-5ns would do, but you're saying it has to add 2-5ns across all timing models, which is difficult to do. My feeling is that you're trying to do something difficult (interface to a RAM that just isn't made to run at 48MHz), but it might be possible. What are the SLWR signal's slack for setup slow model and hold fast model? You could attach those too, but it's a very tight window you're shooting for.

Altera_Forum · ‎09-08-2008

--- Quote Start ---

When you say I/O timing, do you mean through the LAI interface or what TimeQuest reports?

--- Quote End ---

I'm always measuring my timings with the LAI interface as well as scope to see the actual output timings. This is mostly because i am not yet comfortable trusting reports from Quartus but also because i'm not sure where to look for these numbers due to the confusing clock reference used. Measuring timing via LAI has been very accurate over the 2+ years i have used this method. Altera has probably made an effort to keep the LAI pin delays as small as possible to allow this.

--- Quote Start ---

The path I looked at added a 13ns route to your output path, which is something I have never seen done before, and I am certain is due to your timing constraints. So I still don't see the problem, as it looks like what is occurring is correct (and impressive, as adding delay used to be very difficult for any FPGA fitter just a few years ago).

--- Quote End ---

I however wonder why 13ns is reported in the first place since the non-constrained timing (th) was only some 2-3 ns off. It seems to me a delay of 3ns should be sufficient.

--- Quote Start ---

You say adding 2-5ns would do, but you're saying it has to add 2-5ns across all timing models, which is difficult to do. My feeling is that you're trying to do something difficult (interface to a RAM that just isn't made to run at 48MHz), but it might be possible. What are the SLWR signal's slack for setup slow model and hold fast model? You could attach those too, but it's a very tight window you're shooting for.

--- Quote End ---

I don't know the actual slack right now (i'm at my day job). How do i generate text-formatted reports for this? The receiving FIFO is an FX2 USB controller that is rated up to 48 MHz. It requires the timings i stated earlier per the datasheet. I have the option of sourcing the clock from the FX2 CPU, in which case the large 13ns setup requirement on SLWR will go down to around 4 ns. This unfortunately requires more changes to the design.

It looks more and more like the best route is to simply delay the 48 MHz clock so that the slack is spread evenly over tsu and th. I then should lock in the timing tsu/th with the SDC to constrain the design.

Altera_Forum · ‎09-08-2008

As Rysc mentioned, the timing window for SLWR is small. (Without rounding up the FX2 specifications as you did, I get 4.4 ns, which sounds a bit better.) But I would try to use a structure that has a low delay skew by design.

I understand that the 48 MHz CLK output is a dedicated PLL output. If SLWR and data are sourced from output registers clocked by the same clock, you get precise timing, but the hold time (relative to the PLL output) is most likely 1 or 1.5 ns too short if CLK and SLWR use the same drive strength. Making the CLK output fast (maximum drive strength) and SLWR slow (lower drive strength) is hopefully sufficient to achieve the required timing. The roughly 0.5 ns output delay could be used in addition.

Altera_Forum · ‎09-09-2008

Ok, i have now found out that the LAI output DOES add a fair amount of delay compared to the actual output pins. This means Quartus, the Classic Timing Analyzer, TimeQuest as well as you have been right all along. Thanks for pointing me in this direction - i had assumed that the LAI had much less latency...

When measuring directly on the output pins i am meeting the I/O constraints with 1 ns to spare (th for the SLWR signal). The other worst-case slack ranges from 2.5 to 3.3ns. I have enabled "multi-corner analysis" in the settings dialog. What other settings (if any) must i set to ensure that my timings are correct for slow/fast devices?

Thanks,

/John.

Altera_Forum · ‎09-09-2008

If you make timing with multi-corner analysis, you should be covered (i.e. you should see 2 or 3 different timing analysis runs; you can also do it manually by running create_timing_netlist with different options, or by doing it once and changing the corner with set_operating_conditions). You may want to make your requirements a little bit worse, since your board will add a little more skew between the data and clock, but if you have 1ns to spare, you should be fine.
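A sketch of the manual route, run from the TimeQuest Tcl console with the project open. It writes text reports for SLWR setup at the slow model and hold at the fast model. The option spellings are an assumption and may differ between Quartus II versions, so check the command help in your release.

```tcl
# Sketch only; option names may vary by Quartus II version.
# Slow model: check setup on the SLWR output.
create_timing_netlist -model slow
read_sdc
update_timing_netlist
report_timing -setup -detail full_path -npaths 10 -to [get_ports {N_SLWR}] \
    -panel_name "s: n_slwr slow" -file "SLWR_setup_slow.txt"

# Fast model: switch corner on the existing netlist and check hold.
set_operating_conditions -model fast
update_timing_netlist
report_timing -hold -detail full_path -npaths 10 -to [get_ports {N_SLWR}] \
    -panel_name "h: n_slwr fast" -file "SLWR_hold_fast.txt"
```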

Altera_Forum · ‎09-09-2008

Thanks Rysc. I changed the operating conditions to 'fast' and i now have one failed path:

Slack: -0.159

From node: ...dffs[0]

To node: N_SLWR

Launch clock: pll_clk_48

Latch clock: CLK_OUT_48

I assume this is the 1ns hold time that has been eaten up by the faster model.

How is the above best fixed? I believe i have my design optimized for speed per the design assistant's directions.

Edit: I fixed the fast model th violation by selecting "Standard fit (highest effort)" in the QII Fitter settings. My design now passes both fast and slow timing models.

Thanks again,

/John.