Constraining external ripple clocks correctly on external pin (pad_io)

Altera_Forum · ‎09-01-2011

Regarding constraining of external clocks generated by registers.

I have now spent several days trying to write correct constraints for my external clock, which is generated by a DDR register driven by a PLL clock.

The external clock must have the correct phase relative to the data signals for the analysis to be valid and for me to close timing. My design is tight because my data bus is really bidirectional.

But bidir is not the issue here; to simplify the case I will just pretend ssync_tx_data is my address bus.

So my design is the same as Case 3 on page 15 of the wiki document Source_Synchronous_Timing.pdf (published 30 Aug 2011).

But the generated clock in TimeQuest does NOT have the correct phase compared with either the TimeQuest datasheet report or the timing shown in a gate-level simulation.

The difference is quite big, something like 1 ns for the fast model and about 3 ns for the slow model. To me it looks like the delay from the ddr_tx_clk input clock to the pin (pad_io) is not included.

However for the ssync_tx_data the delay all the way to the pad_io is included!

It is possible to extract the missing delay manually and add it as an offset to the generated clock statement, but the delay differs between slow and fast, so adding it manually does not reflect the real world!

The proper slow and fast delays must be included by TQ to be correct for both cases.

The constraints in the user guide are:

create_clock -period 6.25 -name fpga_clk [get_ports fpga_clk]

derive_pll_clocks

create_generated_clock -source [get_pins {inst1|altpll_component|auto_generated|pll1|clk[1]}] -name ssync_tx_clk_ext [get_ports {ssync_tx_clk}]

# External device delays

# setup requirement is 1.4 and hold is 0.4ns

set_output_delay -clock { ssync_tx_clk_ext } -min [expr {0.4 + 0.150 - 0.05}] [get_ports {ssync_tx_data[*]}]

set_output_delay -clock { ssync_tx_clk_ext } -max [expr {1.4 + 0.150 - 0.05}] [get_ports {ssync_tx_data[*]}] -add_delay

What I see is that the clock phase reported by TQ, and used together with my set_input_delay and set_output_delay constraints, differs from the actual real-world timing (gate-level simulation)!

I believe the generated clock named ssync_tx_clk_ext shows the timing on the input of the DDR register and NOT the pin=pad_io ssync_tx_clk!

For the data outputs, the timing in TQ seems to fit well with the datasheet report Tco and the gate-level timing, but not for the clock.

We need a generated clock that includes the delay from the ddr_tx_clk input clock to pad_io, which differs a lot between slow and fast silicon, to do a proper analysis.

Is there a way to generate the clock at the actual pin (pad_io)?

Or have I misunderstood something here?

Any help is appreciated.


Altera_Forum · ‎09-01-2011

Can you run the TQ_analysis.tcl file but add the following two lines and attach the output? Just want to understand what you're looking at.

-file "SSYNC_setup.txt"

-file "SSYNC_hold.txt"

I believe a timing simulation will look different from static timing analysis. For example, in a timing sim you will have your base clock and the PLL will then phase-shift it 90 degrees. In static timing analysis, it will look like the clock is phase-shifted 90 degrees from the beginning (but will only feed destinations fed by the phase-shifted clock). In the end they're equivalent, but may look different.

Also, one important thing to note: your Th of 0.4 becomes a negative external delay. A device with a Th of 0.4 is basically saying its internal data path to the register could be 0.4 ns SHORTER than the clock path to the register (and hence you must hold the data externally an extra 0.4 ns). Because it's shorter, that is a negative delay. So your -min value should have -0.4 as its requirement. The TI 5675A example might help (although that device's Th is 0, so negating or not is kind of a don't care).
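As a minimal sketch of that sign convention: the lines below reuse the generated clock name and data port name from the first post and assume, purely to keep the example short, that the board delays are zero. The device's Tsu goes in unchanged as the -max value and its Th goes in negated as the -min value.

```tcl
# Sketch only: board trace delays are assumed to be zero here.
set tsu 1.4   ;# external device setup requirement, ns (from the first post)
set th  0.4   ;# external device hold requirement, ns (from the first post)

# Tsu converts directly into the maximum external delay.
set_output_delay -clock ssync_tx_clk_ext -max $tsu [get_ports {ssync_tx_data[*]}]

# Th is negated when expressed as the minimum external delay.
set_output_delay -clock ssync_tx_clk_ext -min [expr {-$th}] [get_ports {ssync_tx_data[*]}]
```

With real board delays the two values shift, but the sign of the Th term stays negative.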

Anyway, attach the files and I'll take a look.

Altera_Forum · ‎09-01-2011

Hi Rysc

Thanks for the fast response.

TQ_analysis.tcl ?

I am afraid I do not know what TQ_analysis.tcl is. I have searched both the Quartus install directory and the design and have not found it.

I use XP and Q 11.1

My background is HW; I do both HW and some FPGA design.

I have done Altera FPGA design for decades, mostly timing stuff like PLLs and I/O drivers.

I am new to TQ; this is only my second design using TQ, so I have probably got something upside down.

About the hold time, I am not sure I understand what you are saying?

The external device has a POSITIVE hold time.

Data must be held valid for 0.4 ns after the rising clock edge at the external device, i.e. at the FPGA output pins.

So should I define it as a positive or negative number in my set_output_delay constraint?

As I am not sure how to write the timing constraints so they match real HW, I try to compare the clock phases in TQ with the real-world signals.

I compare in two ways. First, against the Tco in the TQ datasheet report, to see if the phase difference between the clock input pin and the clock output pin matches the reported Tco.

Second, against the waveforms seen in gate-level simulation. When they do not match, I think the constraints are wrong.

My design is a little more complex than your example: the PLL input clock is the system clock, and it actually does DDR sampling of the data bus when reading the SRAM, etc.

But let's stick to the basic issue for now.

The TQ Clocks report shows (columns: Clock Name, Type, Period, Frequency, Rise, Fall):

CLK_80M Base 12.500 80.0 MHz 0.000 6.250

phase_locked_loops:pll_inst|Clk_Sram:clk_sram_inst|altpll:altpll_component|_clk0 Generated 6.250 160.0 MHz -0.156 2.969

phase_locked_loops:pll_inst|Clk_Sram:clk_sram_inst|altpll:altpll_component|_clk2 Generated 6.250 160.0 MHz 0.781 3.906

SR_CLK_EXT Generated 6.250 160.0 MHz 6.094 9.219

while the TQ Clock to Output Times report shows (columns: Data Port, Clock Port, Rise, Fall, Clock Edge):

Maximum

SR_CLK CLK_80M 1.402 1.402 Rise

SR_CLK CLK_80M 1.402 1.402 Fall

Minimum

SR_CLK CLK_80M 0.777 0.777 Rise

SR_CLK CLK_80M 0.777 0.777 Fall

The gate-level sim shows Tcos close to the above.

SR_CLK_EXT can be shifted by one period in the clock report: 6.094 ns - 6.250 ns = -0.156 ns.

But neither 6.094 ns nor -0.156 ns is anywhere near what the Tco shows. How can this produce correct results for the external device timing?

Second, there is only ONE clock report in TQ, but the clock phase of SR_CLK_EXT must be different for FAST and SLOW.

Alternatively, TQ must adjust the delay or setup of all the related signals by the difference.

If the latter is the case, I do not know how to verify that my constraints are correct.

Altera_Forum · ‎09-01-2011

TQ_analysis.tcl should be in the Case 3 project. Just launch TQ, double-click Update Netlist, and then go to the Script pull-down menu. I just want to see what timing report you are looking at.

Note that looking at Tcos is not the correct method, especially in this case. Looking at the report_timing reports shows what is actually being analyzed by Quartus. (The Tco is not a bad way to get a feel for what's going on, but it's not a constraint, just an output report).

One thing I see is that the generated clock you posted has -source [get_pins {inst1|altpll_component|auto_generated|pll1|clk[1]}]. Yet when you run report clocks it shows clocks 0 and 2. If clk1 doesn't exist or isn't hooked up, that's a problem. I'm also a little surprised at the names, as they don't seem to match the syntax (i.e. mine ends in |clk[1] while yours has |_clk0). Did you add -use_tan_name to derive_pll_clocks?

Altera_Forum · ‎09-01-2011

Hi Rysc

Sorry for the switch. In my first post I referred to your example because it is the same construct as mine, and your case has the drawing that makes things easier to understand.

But I have not run your example at all. My own design has different names.

I switched to my own design in the second post.

Yes, I used: derive_pll_clocks -use_net_name

which produces the different name syntax.

My design names are:

DDR Ripple output clock: SR_CLK (TQ name = SR_CLK_EXT)

Outputs: SR_A, SR_CE, SR_WE

Bidir data SR_DQ

CLK_80M is the base clock input pin.

I have two PLL clocks, one for SR_CLK output and one for other SR_* outputs.

If this is too abstract, I can go through your example on Monday (I am off Friday), if you think that is the best way to help me.

Altera_Forum · ‎09-01-2011

Hi again

I do not understand how looking at Tco can be wrong. Until some months ago we used the old timing analyzer and the datasheet report. Tco, Tsu and Th were our reference for checking whether timing closed on interfaces to external devices.

I have never found anything wrong in these datasheet reports, and they have always matched gate-level simulations and scope measurements fairly well.

If the datasheet report did not match the real-world HW, Altera would get complaints from thousands of HW designers; it simply must be correct.

Altera_Forum · ‎09-01-2011

Just run:

report_timing -setup -npaths 20 -detail full_path -to [get_ports SR_*] -panel_name "s: sr" -file "TQ_setup_sr_out.txt"

report_timing -hold -npaths 20 -detail full_path -to [get_ports SR_*] -panel_name "h: sr" -file "TQ_hold_sr_out.txt"

Then attach the two files.

What are the phases on your two PLL outputs? They seem strange (clk_0 has a slight negative shift, clk_1 has a +45 degree shift?).

Altera_Forum · ‎09-01-2011

Not wrong, but not what TimeQuest is using. (People have used Tco for years and it's worked fine.) The one big thing is that Tco doesn't account for on-die variation or proper rise/fall comparison. With the proper constraints, TQ will compare on-die variation sub-models. So at the slow corner timing model, for example, there are two sub-models to account for the fact that not every delay will be pegged at the absolute worst under those conditions, i.e. on-die variation. So it has a fast and slow "sub-model". For setup analysis it will use the slow sub-model on the data path and the fast sub-model on the clock path. For hold analysis it will do the opposite. This will cut into your margin some. It also does rise/fall variation and looks at legal values. (For DDR, almost all combinations are legal, but if it were SDR, for example, it wouldn't use the clock fall time as it's being sent off chip, just the rise time, since that is what is used to capture data at the far end.)

It's not a huge difference from what you're doing, but I have seen old designs where people compare Tcos, see them being off by 20 ps or so, and think that's how much skew there is. That's because both cases use the slow sub-model. In reality there will be more skew on Tcos.

(One example I always use to show the issue with Tcos is to think of a 20 ns clock driving an output through a register. TAN might report the Tco as being 7 ns. Now let's say the user inverts the clock so it drives out on the falling edge. Is the Tco now -3 ns, 17 ns or 7 ns? The answer is that it doesn't change, it stays at 7 ns. So two different outputs, one clocked on the rising edge and one on the falling edge, whose outputs come out at very different times, would have the same Tco. Another example: rather than inverting the clock, the user does a 180 degree shift on the PLL. What is that Tco? What about -180 degrees? I've seen users do almost -360 degree shifts on their PLLs because they get really good Tcos and they think that will make a difference. With the TimeQuest way, by knowing what clock latches the data, all of this is accounted for. Sorry for the long explanation...)
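The inverted-clock example can be sketched as a few lines of plain Tcl arithmetic. The variable names are made up for illustration; the only inputs are the 20 ns period and the 7 ns Tco from the paragraph above.

```tcl
# Same register, same reported Tco, two different launch edges.
set period 20.0   ;# clock period, ns
set tco     7.0   ;# Tco as a datasheet-style report would show it, ns

foreach {label launch_edge} [list rising 0.0 falling [expr {$period / 2.0}]] {
    # Time at which the output changes, measured from the rising clock edge.
    set arrival [expr {$launch_edge + $tco}]
    puts "$label-edge launch: Tco = $tco ns, output changes [format %.1f $arrival] ns after the rising edge"
}
```

The Tco figure is 7 ns in both cases, while the output actually changes 7 ns or 17 ns after the rising edge. That difference is what an analysis based on launch and latch edges picks up.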

Altera_Forum · ‎09-01-2011

Sure, I can do this, and you will see my timing is not closed at the moment.

It is great that you are trying to help me, but my design is more complex.

I have not been able to close timing because I do not understand TQ properly and because my design is more complex. It is hard to explain, but it includes DDR input registers running on the system clock while the outputs run on the doubled clock, etc.

I might change the design, but first I must understand TQ's way of doing things.

I am pretty sure I can close timing; I have done stuff like this before, without TQ.

But first I must understand how TQ can do an analysis without including the pad_io in my external clock driving the external device.

My phases are odd because the total path when reading from the external device is longer than one period, etc. They may need to change; don't worry about that for now.

It would be great if you could make me understand the way TQ does it, because in my eyes it looks like it cannot do it correctly.

Altera_Forum · ‎09-01-2011

TQ does include the pad_io delays. I'm guessing the generated clock on the output does not see a path to the -source. There would be a warning about incorrect latency to this I/O.

I would suggest going back to the document I posted and use Case 3. Analyze it very carefully and make sure you understand it. Also look at pages 26-28 to see the report_timing analysis as well as the case where the clocks aren't there.

One other thing about Th. Th is a spec that is opposite in sign to Tsu. Basically they're telling different things (Tsu says to get the data there early, Th says to get it late), yet they both have the same sign.

set_output_delay says how much delay is outside the device. Its -max and -min values say the same thing, just a range of what that external delay is. In trying to convert Th in terms of delays, it gets inverted. Re-read my explanation on this before and see if it makes sense. Tsu converts directly, because a device's Tsu is basically saying its data path to the input register is Tsu ns longer than the clock path (that's why you have to have the data there that much earlier).

I'm getting busy with other stuff and out for a few days, so might not respond. Good luck.

Altera_Forum · ‎09-01-2011

Hi Rysc,

Forgive me but I have to explain my views on some of your points raised in this post.

Apologies for diverting away from original thread but I feel it is useful in any case.

--- Quote Start ---

set_output_delay says how much delay is outside the device. Its -max and -min values say the same thing, just a range of what that external delay is.

--- Quote End ---

not correct at all.

The equations say:

max = tsu + (max data delay – min clk delay)

min = - th + ( min data delay – max clk delay)

Thus the main theme here is external device requirements, not external delays. The actual external board delays are secondary terms and are given anyway in the equations and they cancel out if equal irrespective of amount of delay leaving tSU/tH as main terms.
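A worked instance of the two equations, as plain Tcl arithmetic. The tsu and th values are the ones from the first post; the four board delay values are hypothetical and chosen only for illustration.

```tcl
# Device requirements from the first post, ns.
set tsu 1.4
set th  0.4

# Hypothetical board delays, ns (not taken from the thread).
set data_dly_max 0.20
set data_dly_min 0.10
set clk_dly_max  0.15
set clk_dly_min  0.05

# max = tsu + (max data delay - min clk delay)
set out_max [expr {$tsu + ($data_dly_max - $clk_dly_min)}]
# min = -th + (min data delay - max clk delay)
set out_min [expr {-$th + ($data_dly_min - $clk_dly_max)}]

puts [format "set_output_delay -max %.3f" $out_max]
puts [format "set_output_delay -min %.3f" $out_min]
```

With these numbers the -max value comes out at about 1.55 ns and the -min value at about -0.45 ns. If the data and clock board delays were equal and fixed, the bracketed terms would vanish and the values would reduce to tsu and -th.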

--- Quote Start ---

In trying to convert Th in terms of delays, it gets inverted.

--- Quote End ---

Purely a tool-related issue. The constraints were predefined by Synopsys, and set_output_delay -max refers to how much offset is to be inserted by the FPGA relative to the next latching edge, while -min refers to how much offset is to be inserted on the data relative to the current edge.

--- Quote Start ---

Tsu converts directly, because a device's Tsu is basically saying its data path to the input register is Tsu ns longer than the clock path (that's why you have to have the data there that much earlier).

--- Quote End ---

Wrong. Device’s tSU spec is only and understandably meant to be at pins. The requirements of internal registers are referred to the pins. tSU is not a delay, it is part of timing window. The register timing window will get shifted at pins if its internal data/clk paths are unequal. tSU stays as tSU in its concept whether referred to pins or directly to registers.

Altera_Forum · ‎09-01-2011

I'm trying to correlate TimeQuest's timing model to external values and make sure they make sense. In TimeQuest, the set_output_delay constraint says there is an external register being driven by the output port, that register is clocked by the -clock option, and the delay to that register is given by the -max and -min values. That's what it does timing analysis on, and I think it's easiest to understand when you think of it that way.

When TimeQuest does setup timing analysis, it needs to make sure the data gets to that register before the latch clock. For hold, it needs the data to get there after the latch clock.

So that's how it's being analyzed, but as you point out, external devices don't usually say what their delays are inside themselves. They might say they have a Tsu of 3 ns. One way a device has a Tsu of 3 ns is by saying internally its data path is 3 ns longer than its clock path, and hence the data must be available at the ports 3 ns before the clock is available. That may not be what's happening, i.e. it may be the paths are equal but there is a PLL that phase-shifts the clock forward 3 ns. I don't know, but for all intents and purposes I don't care. When I increase my -max value by 3 ns, I am saying externally the data path is 3 ns longer than the clock path, based on TimeQuest's model. So I'm making the datasheet match the model.

You're right that the -min value is predefined by Synopsys. They could have had an option called set_output_delay -th, and the user could put it in directly. But then when absorbing the board delays, they would need to do max_clk_dly - min_data_dly, i.e. they would have to invert what they did for the max value. That's why I think Synopsys/TQ's way is more consistent. Anything on the data path is always added and anything on the latching clock path is subtracted. For the -max value you use the larger value for the data path and the smaller value for the clock path. For the -min value you do the opposite. But Synopsys could have done it a different way.

So you're right that I don't know what's going on inside the device, and the Tsu is only at the I/O ports of the device, but the description of what's going on inside the device is not wrong, as you get the same analysis if you think of it that way. I'm just trying to help visualize it rather than plug in equations, which is how I think a lot of people get into trouble.

And there is another way to think of it that you might like. When the -max value gets larger, that is telling the FPGA it needs to get its data out more quickly to meet the setup relationship. So when the -max value becomes 3 ns, you're telling the FPGA to have its data available 3 ns before the clock. Thinking of it that way is not implying what's going on inside the FPGA (but when you do timing analysis, you will see the data path has become 3 ns longer).

I think I understand your point, but I also don't think I said anything wrong either.

Altera_Forum · ‎09-08-2011

Hi Rysc

Thanks for clearing up the hold time.

After studying your Case 3 example in detail, I finally realized I had completely misunderstood the way TQ handles ripple clocks.

Instead of generating a clock at the flip-flop output (in this case the pad IO), it generates the clock on the clock input of the FF. Then it takes the delay from this clock to the actual pad IO and moves it into the data delay as a negative delay! Very simple when you get it, but very different from what I expected.

The naming of the generated clocks as *_ext in the docs does not make it easier to see; I would prefer *_int, but never mind, it is just names.

I think it would help many people if you added a chapter to your source-synchronous document that spells this out ("cut it out in carbon", as we say in Danish).

So why did Synopsys do it this way? I can only guess.

The downside is that when I look at waveform signals in TQ, they do not represent the actual clock signal that drives the external chip.

However, I see that it has one advantage: if you include delays in clock nets that vary a lot between FAST and SLOW HW, you risk a clock edge being before another clock edge in FAST but after it in SLOW. This makes it difficult to decide which clock edge is the right one to use in the analysis. By using fewer signals as clocks and sticking to global nets and phase-aligned PLL outputs, this problem is minimized and the analysis is simpler.

Altera_Forum · ‎09-08-2011

To be honest, I didn't follow the first part. The latching clock for Case 3 should start at the clock coming into the FPGA, go through the PLL, the global clock tree, any ripple clocks on the path (which would have a generated clock assignment) and finally to the output port driving the clock out (which also has a generated clock on it). I call this final generated clock *_ext because it is what is used to clock the external register being driven by our output data ports.

Anyway, it sounds like you have it and are comfortable. I'm just trying to understand the confusion better to see if there's a better way to explain it (or cut it out in carbon).

Altera_Forum · ‎09-08-2011

Hi Rysc

I think it is important to explain what confused me, and probably others, when starting to use TQ.

So I will try again in another way. Please ask or correct me if you think I still do not understand it correctly.

From case 3:

create_generated_clock -source [get_pins {inst1|altpll_component|auto_generated|pll1|clk[1]}] -name ssync_tx_clk_ext [get_ports {ssync_tx_clk}]

When I viewed the above clock in the TQ waveform, I expected it to show the signal as it will appear on the HW pin ssync_tx_clk, so it should appear delayed in phase compared with sys_clk_90shift.

But TQ does not do that. Instead it shows the clock named ssync_tx_clk_ext as it is on the outclock input of ddr_tx_clk, which has exactly the same phase as sys_clk_90shift.

So I did not understand how the analysis could be correct, and I tried to add the delay to the pad_io manually because I thought TQ did not support ripple clocks properly. Finally I realized that TQ moves this clock delay into the data delay as a negative delay when displayed in the TQ waveform.

Altera_Forum · ‎09-09-2011

So yes, the generated clock will look identical to sys_clk_90shift, because that's what it is based on and there is no -phase or anything. Remember that the clock launch and latch edges are "ideal", i.e. how the clocks are described in the .sdc before any place-and-route is done. If the time to get off chip was 2 ns or 200 ns, it wouldn't have any effect on where the launch and latch edges are.

My concern is the last statement, that the clock delay is moved into the data delay. The clock delay is still part of the latch clock path. So in the waveform view, the second dotted line labeled "clock delay" is the time it takes to get off chip, and it starts from the latch clock time. I prefer looking at the Data Path tab, which has more information but is harder to visualize. If you look at the bottom Data Required Path window, it starts with the Latch Edge, which is the same as the sys_clk_90shift clock. If you follow the location column, you'll see that clock is described as coming into the FPGA at input port fpga_clk, then going through the PLL, global, DDR output, and finally out ssync_tx_clk. The User Guide briefly discusses this, but generated clocks should always start way back at the base clock, so the analysis can account for any delays getting to the generated clock. Anyway, take a look at that and see if it makes sense. In the end, the Data Path tab has the Data Arrival Path, which is how long your data takes to get out, and your Data Required Path, which is how long the clock takes to get out. The Data Required Path also has the output delay subtracted from it. Hopefully that makes sense.
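As a rough sketch of how those two paths combine into a setup slack, here is the arithmetic in plain Tcl. Every number is hypothetical, the variable names are invented for this example, and clock uncertainty is left out to keep it short.

```tcl
# Data Arrival Path: launch edge, clock path to the data register, then out to the data pad.
set launch_edge      0.0   ;# ideal launch edge, ns
set launch_clk_path  2.0   ;# fpga_clk -> PLL -> global -> data register
set data_path        3.0   ;# register clock-to-out plus routing to the data pad

# Data Required Path: latch edge, clock path all the way off chip, minus the output delay.
set latch_edge       1.5   ;# ideal latch edge of ssync_tx_clk_ext, ns
set latch_clk_path   5.0   ;# fpga_clk -> PLL -> global -> DDR output -> ssync_tx_clk pad
set output_delay_max 1.4   ;# value given to set_output_delay -max

set data_arrival  [expr {$launch_edge + $launch_clk_path + $data_path}]
set data_required [expr {$latch_edge + $latch_clk_path - $output_delay_max}]
set setup_slack   [expr {$data_required - $data_arrival}]

puts [format "arrival %.3f  required %.3f  slack %.3f" $data_arrival $data_required $setup_slack]
```

The point of the sketch is that latch_clk_path, the delay out to the clock pad, sits in the required path on top of the ideal latch edge. It is not folded into the ideal edge itself, which is why the generated clock's waveform shows no such delay.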