Re: set_output_delay explained for dummies

Altera_Forum · 11-03-2016

Hello. I have tried to wrap my brain around the documentation describing "set_output_delay", but I just don't seem to get it.

My brain thinks like this:

"Relative to the rising-edge of my output clock, the output data needs to be valid after 1 ns and remain valid for 2 ns".

The documentation describes this:

"The maximum output delay (-max) is used for clock setup checks or recovery checks and the minimum output delay (-min) is used for clock hold checks or removal checks."

So to me (please correct me if I'm wrong), "setup" sounds like the earliest time the data needs to be valid, and "hold" sounds like the point where the data is no longer required to be valid. But datasheets usually define setup as the time to the left of the clock edge and hold as the time to the right of it. Since my valid window is entirely to the right of the edge, shouldn't one of these values be negative?

set_output_delay -max -1

set_output_delay -min 3

Am I even close?

Altera_Forum · 11-04-2016

You need to start with Ryan Scoville's excellent tutorial on using TimeQuest. Maybe someone can post a link for this?

Altera_Forum · 11-04-2016

http://www.alterawiki.com/wiki/timequest_user_guide

Altera_Forum · 11-04-2016

Thanks for the plug, gj_leeson. :)

http://www.alterawiki.com/wiki/timequest_user_guide

It's probably better explained in the document, but here's a condensed version. First, I think of set_output_delay as if it were describing a circuit. So if you have:

create_clock -period 10.0 -name fpga_clk [get_ports fpga_clk]
derive_pll_clocks ;# Let's say fpga_clk drives a PLL whose output is also 10ns
create_clock -period 10.0 -name ext_clk ;# Virtual clock: it isn't applied to anything physical inside the design
set_output_delay -max 3.0 -clock ext_clk [get_ports {dout[*]}]
set_output_delay -min -1.0 -clock ext_clk [get_ports {dout[*]}]

The set_output_delay constraint says there is an external register whose D port is driven by dout[*] and whose CLK port is driven by ext_clk. Before even worrying about the -max/-min values, note that we now have a register-to-register transfer: the launch register is the output register in the IO cell, driven by the PLL clock, and the latch register is the one you've just described in your constraint. Both are driven by 10ns clocks, so the default setup relationship is 10ns and the hold relationship is 0ns, and we can do timing analysis like any register-to-register path. The -max and -min values state the maximum and minimum external delays to this external register. Looking at -max 3.0, since it's easier to understand, this says there is a 3ns delay to the external register. Since we have a 10ns setup relationship, the FPGA must get the signal out of the dout[*] ports by time 7ns or it will fail setup. You can think of this simple case as a Tco requirement of 7ns. (There are cases where the Tco analogy doesn't hold up very well, but most of them line up nicely.)
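The setup arithmetic above can be sketched in a few lines of Python. This is only a model, assuming ideal clocks (no clock latency, jitter, or uncertainty) and the default one-period setup relationship between two 10 ns clocks; the function names are illustrative helpers, not TimeQuest commands.

```python
def setup_slack(period_ns: float, max_output_delay_ns: float, fpga_tco_ns: float) -> float:
    """Setup slack for data launched at 0 ns and latched at period_ns.

    Required time = latch edge - external max delay (the -max value).
    Arrival time  = launch edge + FPGA clock-to-output delay.
    """
    launch_edge = 0.0
    latch_edge = period_ns  # default setup relationship: one period
    data_required = latch_edge - max_output_delay_ns
    data_arrival = launch_edge + fpga_tco_ns
    return data_required - data_arrival


def max_allowed_tco(period_ns: float, max_output_delay_ns: float) -> float:
    """Largest FPGA Tco that still leaves non-negative setup slack."""
    return period_ns - max_output_delay_ns


if __name__ == "__main__":
    # 10 ns clocks with set_output_delay -max 3.0, as in the example above
    print(max_allowed_tco(10.0, 3.0))   # 7.0
    print(setup_slack(10.0, 3.0, 6.5))  # 0.5  (meets setup)
    print(setup_slack(10.0, 3.0, 7.5))  # -0.5 (fails setup)
```

The "Tco requirement of 7ns" is just the point where the slack crosses zero.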

I think one of the really hard things to grasp is that this all depends on the clock rate. So if someone asks me whether "set_output_delay -max 3.0 ..." is a tight constraint, the answer is that it depends. If the clock period is 50ns, it's very easy to meet; if it's 5ns, it's difficult. Also note that the -max value usually lines up with the Tsu requirement of the external device. So if the Tsu were 2.7ns and the max board trace delay were 0.3ns, we would use a -max of 3ns.
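A small sketch of the same two points: -max built from the external Tsu plus the slowest board trace, and how much Tco budget that same 3ns leaves at a 50ns versus a 5ns period. It assumes, as the example does, that the clock reaches the FPGA and the external device at the same time (no clock skew); the helper names are hypothetical.

```python
def output_delay_max(ext_tsu_ns: float, max_trace_ns: float) -> float:
    """-max value: external setup time plus the slowest board data delay.

    Assumes no clock skew between the FPGA and the external device.
    """
    return ext_tsu_ns + max_trace_ns


def tco_budget(period_ns: float, max_output_delay_ns: float) -> float:
    """Time left for the FPGA to drive the pin under a single-cycle setup check."""
    return period_ns - max_output_delay_ns


if __name__ == "__main__":
    out_max = output_delay_max(2.7, 0.3)
    print(f"-max = {out_max:.1f} ns")
    for period in (50.0, 5.0):
        budget = tco_budget(period, out_max)
        print(f"period {period:4.1f} ns -> Tco budget {budget:.1f} ns")
```

The constraint value never changes; only the period decides whether it is loose or tight.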

The -min value works similarly, stating that the external delay could be as short as -1ns. Since our hold relationship between the clocks is 0ns, the total delay across the interface must be at least 0ns. So if the external delay is -1ns, the FPGA's delay must be at least +1ns to meet timing. In this case, the -min value usually matches up to the negative of the external device's hold requirement. So if your external device had a hold requirement of 1.1ns, and the board trace delay could be as fast as 0.1ns, then the external delay is (-1.1 + 0.1) = -1ns.
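The hold side as a sketch, under the same assumption that the clock reaches both devices simultaneously. The hold check requires launch + Tco + min_output_delay to be at or after the hold latch edge, so the FPGA's minimum Tco is the hold relationship minus the -min value. Helper names are hypothetical.

```python
def output_delay_min(ext_th_ns: float, min_trace_ns: float) -> float:
    """-min value: minus the external hold time plus the fastest board data delay.

    Assumes the clock reaches the FPGA and the external device simultaneously.
    """
    return -ext_th_ns + min_trace_ns


def min_required_tco(hold_relationship_ns: float, min_output_delay_ns: float) -> float:
    """Smallest FPGA Tco that passes the hold check.

    Hold requires: launch (0 ns) + Tco + min_output_delay >= hold latch edge,
    so Tco >= hold_relationship - min_output_delay.
    """
    return hold_relationship_ns - min_output_delay_ns


if __name__ == "__main__":
    out_min = output_delay_min(1.1, 0.1)
    print(f"-min = {out_min:.1f} ns")
    # Default hold relationship between two 10 ns clocks is 0 ns
    print(f"FPGA Tco must be at least {min_required_tco(0.0, out_min):.1f} ns")
```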

Hope that helps.

Altera_Forum · 11-04-2016

Thank you very much for the explanation and I will read through that guide. I'm closing in on understanding, but I'm still not there.

One thing that confuses me about your post is "Looking at -max 3.0 ... then the FPGA must get the signal out of the dout[*] ports by time 7ns or it will fail setup".

In my example scenario, the external device is expecting these events at its input pins:

t = 0.0: clock edge
t = (0.0, 1.0): data does not need to be valid
t = [1.0, 3.0]: data must be valid
t = (3.0, 10.0): data does not need to be valid

So when you say "get the signal out by 7ns", what does that mean? If the data isn't valid until 7ns, my external device will not work. Am I misinterpreting what you are saying?

Altera_Forum · 11-04-2016

The simplest way is to use these equations:

set_output_delay -max tSU

set_output_delay -min -tH

(that is, minus tH)

This applies when the clock and data travel together with the same board delay.

What might be confusing here is that:

for set_input_delay, we give the offset relative to the launch edge;

for set_output_delay, we give the offset relative to the latch edge.
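kaz's rule of thumb can be written as a tiny generator of SDC text. This is a hedged sketch: it assumes the clock and data traces are matched so the board delay cancels, and the defaults ext_clk and dout[*] are simply the example names used earlier in this thread. The 2.7/1.1ns inputs reuse the Tsu and Th numbers from the earlier post.

```python
def output_delay_constraints(tsu_ns: float, th_ns: float,
                             clock: str = "ext_clk",
                             ports: str = "dout[*]") -> list[str]:
    """Return SDC lines for -max = tSU and -min = -tH.

    Only valid when clock and data see the same board delay; otherwise the
    trace delays (and any clock skew) must be folded into both numbers.
    """
    target = f"[get_ports {{{ports}}}]"
    return [
        f"set_output_delay -clock {clock} -max {tsu_ns:g} {target}",
        f"set_output_delay -clock {clock} -min {-th_ns:g} {target}",
    ]


if __name__ == "__main__":
    for line in output_delay_constraints(tsu_ns=2.7, th_ns=1.1):
        print(line)
```

Note the sign flip on the hold value: that is the "minus tH" above.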

Altera_Forum · 11-04-2016

Yes, in my example the data can be invalid between times 1-7ns (and is valid between times 7-11ns, where the latch edge is at 10ns). You need it valid between times 1-3ns. That's kind of a strange requirement, but there's nothing wrong with it. There are a couple of ways to do this. Let's start with what you wrote: data must be valid between 1-3ns, i.e. you're latching between 1-3ns. Then you would have:

set_output_delay -max 9

set_output_delay -min 7

So the data must get out of the FPGA and be valid within 1ns. For hold, we're saying the data being launched can't corrupt the data captured at the previous latch edge, which could effectively be as late as time -7ns, so the min delay is +7ns. With a positive 7ns min delay, your Tco could be as low as -7ns and still meet hold. Of course, that physically can't happen. I strongly recommend drawing a waveform for the launch clock, another for the latch clock, and then drawing boxes to show where the data is valid (e.g. -9 to -7, 1 to 3, 11 to 13, etc.), with everything in between invalid. One quick sanity check I do: 9 - 7 = 2ns, so the data must be valid for at least 2ns.
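The conversion Rysc describes (measure the required valid window back from the latch edge) as a sketch. It assumes times in ns relative to the launch edge at 0ns and that the relevant latch edge is the default one at 10ns; the helper name is hypothetical.

```python
def delays_from_window(valid_start_ns: float, valid_end_ns: float,
                       latch_edge_ns: float) -> tuple[float, float]:
    """Convert a required data-valid window into (-max, -min) output delays.

    Both values are measured back from the latch edge, so a window that
    starts long before the latch edge gives a large positive -max.
    """
    # How far before the latch edge the data must become valid
    out_max = latch_edge_ns - valid_start_ns
    # How far before the latch edge the data may stop being valid
    out_min = latch_edge_ns - valid_end_ns
    return out_max, out_min


if __name__ == "__main__":
    # Data valid from 1 ns to 3 ns after launch, latched by the 10 ns edge
    out_max, out_min = delays_from_window(1.0, 3.0, 10.0)
    print(f"set_output_delay -max {out_max:g}")
    print(f"set_output_delay -min {out_min:g}")
    print(f"required valid width: {out_max - out_min:g} ns")
```

The last line is the "9 - 7 = 2ns" sanity check.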

So that is probably not possible to do in hardware. When you launch data at time 0ns, your 10ns latch edge is better off grabbing the valid data at times 11-13. So for that you could subtract 10ns from everything:

set_output_delay -max -1

set_output_delay -min -3

Now your data must get out no earlier than time 3ns and no later than time 11ns. That should be pretty easy to do. (I don't know your actual clock period, so if it isn't 10ns, how easy this is could change.)
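Both pairs of constraints can be turned back into the window the FPGA's clock-to-output delay (Tco) must land in. A sketch assuming ideal 10ns clocks with the default setup (10ns) and hold (0ns) relationships; the function name is hypothetical. With the first set the latest allowed Tco is 1ns, which is the hard part; with the second set the window is 3 to 11ns.

```python
def fpga_tco_window(period_ns: float, out_max_ns: float,
                    out_min_ns: float) -> tuple[float, float]:
    """(earliest, latest) Tco the FPGA may have and still meet timing.

    Assumes the default relationships between equal-period clocks:
    setup latch edge at period_ns, hold latch edge at 0 ns.
    """
    earliest = 0.0 - out_min_ns       # hold check
    latest = period_ns - out_max_ns   # setup check
    return earliest, latest


if __name__ == "__main__":
    for label, out_max, out_min in (("first set", 9.0, 7.0),
                                    ("second set", -1.0, -3.0)):
        lo, hi = fpga_tco_window(10.0, out_max, out_min)
        print(f"{label}: Tco must fall between {lo:g} and {hi:g} ns")
```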

Altera_Forum · 11-04-2016

Lights are turning on... I'm getting closer! Hopefully this ASCII art prints properly...

#        launch edge                     
#                   |                               
#                   |                               
#                   V                               
#                    ________________                ________________
#  Launch CLK ______|                |______________|                |_____________
#                   ^                               ^
#                   t=0.0                           t=10.0
#  
#                     _______________
#  Data       XXXXXXXX___V_A_L_I_D___XXXXXXXXXXXXXXXX  
#                     ^             ^
#                     t=1.0         t=3.0
# 
#                                          latch edge
#                                                   |
#                                                   |
#                                                   V
#                    ________________                ________________
#  Latch Clk _______|                |______________|                |_____________
#                   ^                               ^
#                   t=-10.0                         t=0.0
#                                                   ^
#                     _______________               |
#  Data       XXXXXXXX___V_A_L_I_D___XXXXXXXXXXXXXXX|
#                     ^             ^               |
#                     t=-9.0        t=-7.0          |
#                                                   |
#                     <--------max(+)---------------|
#                                    <----min(+)----|
# 
#     max/min arrows point to positive value convention
# 
#     set_output_delay -max 9
#     set_output_delay -min 7
# 
# 
# 
#  Alternative
# 
#        launched here                        latched here
#                   |                               |
#                   |                               |
#                   V                               V
#                    ________________                ________________
#  Latch Clk _______|                |______________|                |_____________
#                   ^                               ^
#                   t=-10.0                         t=0.0
#                                                   ^
#                                                   |   _______________
#  Data       XXXXXXXX??????????xxxxxxxxxxxxxxxxxxxx|xxx___V_A_L_I_D___xxxxxxxxxx
#                                                   |   ^             ^
#                                                   |   t=1.0         t=3.0
#                                                   |
#                                                   |---> max(-)
#                                                   |-----min(-)------>
# 
#     set_output_delay -max -1
#     set_output_delay -min -3

I'm not sure I get the difference between method 1 and method 2. They essentially define the same setup and hold times, just at different points. Would the second version mean that, when I launch the output data, it takes an extra clock cycle before being latched compared with the first version?

Altera_Forum · 11-07-2016

I guess so. Note that by default the latch edge is always after the launch edge (we could use multicycle assignments to change that, but for now let's stick with the default). So when data is launched at time 0ns, the first set of constraints says it must be available between 1-3ns. First off, you're probably not going to meet that requirement, so if you go with this you won't be able to close timing. It's also kind of odd to say your data needs to be available 9ns before the latch clock edge, as that's such a large number. (I'm somewhat curious where your 1-3ns range came from, as it's kind of strange. It could be correct, but it's uncommon.)

The second set says that when data launches at time 0ns, it is latched by the 10ns clock edge and must be valid between times 11-13ns. You are much more likely to meet timing here, but it's still somewhat strange, since your data doesn't have to be valid until after the clock edge. I have seen this more often, but it's still pretty uncommon. (And is the clock period really 10ns? I thought I made that up as an example.)

Altera_Forum · 11-07-2016

In my opinion, both cases are equivalent in terms of the valid window.

The second case says, in effect, "I don't mind one clock of delay", and should be easier to achieve. This is the same as applying a multicycle path (MCP) of 2 for setup and 0 for hold.

For an I/O stream, we usually don't care if the stream is delayed by n clocks, as long as the stream is correct.
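kaz's equivalence can be checked numerically. A hedged sketch, assuming equal 10ns clocks and the usual SDC/TimeQuest multicycle convention, in which a hold multicycle of 0 keeps the hold check one period before the (moved) setup latch edge. Under that assumption, the first set of constraints with a setup multicycle of 2 gives the same Tco window as the second set with default relationships. The function name is hypothetical.

```python
def tco_window(period_ns: float, out_max_ns: float, out_min_ns: float,
               setup_multicycle: int = 1,
               hold_multicycle: int = 0) -> tuple[float, float]:
    """(earliest, latest) FPGA Tco under an optional multicycle exception.

    Assumes equal-period clocks; the setup latch edge is setup_multicycle
    periods after launch, and the hold latch edge is
    (setup_multicycle - 1 - hold_multicycle) periods after launch.
    """
    setup_rel = setup_multicycle * period_ns
    hold_rel = (setup_multicycle - 1 - hold_multicycle) * period_ns
    return hold_rel - out_min_ns, setup_rel - out_max_ns


if __name__ == "__main__":
    method1_with_mcp = tco_window(10.0, 9.0, 7.0, setup_multicycle=2, hold_multicycle=0)
    method2_default = tco_window(10.0, -1.0, -3.0)
    print(method1_with_mcp, method2_default)
```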

Altera_Forum · 11-07-2016

The example was fake and exaggerated, just so I could get the basic understanding and the +/- convention down. Now that I know how to constrain an output valid window, I want to address a different part of your post: "so if you go with this you won't be able to close timing." I thought the constraint simply defines timing relative to a clock. For example, if I generate a clock on an output port and send it to the receiving device, the timing tools will check whether the output data meets timing relative to that output clock. How I actually implement the design could be different, right? Maybe I use a PLL to generate a synchronous clock with a phase offset and drive that to the output clock port, while the data is still clocked out on a different phase. This is doable, correct? And the tools should be able to analyze my circuit and still tell me whether the output pin timing is met?

I assume Altera parts have programmable delay lines to also help move data/clocks around? I'm targeting Cyclone IV BTW.

Altera_Forum · 11-07-2016

Yes, I'm just saying it's unlikely you'll make timing. The delay chains only make the delay longer; they can't make it negative. You can shift your launch clock back (earlier) with a PLL and use multicycles to say the data is launched before 0ns, but I don't think that really makes sense.

Altera_Forum · 11-08-2016

Isn't pushing the output clock forward in time equivalent to pushing the data backward in time? Why wouldn't that make sense?

I thought I had it, but I think I'm back to being confused. I guess I'm still confused as to why anything needs to be pushed negative in time. If the clock is output at time zero and the data needs to arrive at t=1.0ns, wouldn't the proper skewing be to add positive delay to the data and no delay to the clock?

Altera_Forum · 11-08-2016

The ideal clock occurs at both the FPGA and the external device at times 0ns, 10ns, etc. If the FPGA needs to get its data out within 1ns, then making the clock delay to the external device longer would help with this setup, but the clock delay to the external device is outside the FPGA, i.e. the FPGA fit can't affect it. The only things the FPGA can affect are the path from the clock coming into the FPGA to the output register (launch clock path) and the delay from the register to the output port (data path). These two things make up the Data Arrival Path, and we essentially want their sum to be less than 1ns.
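A sketch of that budget: data arrival is the launch edge plus the FPGA's launch clock path plus its register-to-pin data path, and with the first set of constraints it must land by 1ns. The clock-path and data-path numbers below are made up purely for illustration (not measured Cyclone IV values), and the function name is hypothetical.

```python
def output_setup_slack(required_ns: float, launch_clock_path_ns: float,
                       data_path_ns: float) -> float:
    """Slack = data required time - data arrival time.

    Data arrival = launch edge (0 ns) + clock path to the output register
    + register-to-pin data path; both delays are inside the FPGA.
    """
    data_arrival = 0.0 + launch_clock_path_ns + data_path_ns
    return required_ns - data_arrival


if __name__ == "__main__":
    # Required time of 1 ns comes from -max 9 with a 10 ns latch edge.
    # Illustrative (hypothetical) internal delays:
    print(output_setup_slack(1.0, launch_clock_path_ns=2.0, data_path_ns=3.5))
```

Any delay chain adds to one of the two terms, so it can only reduce this slack.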

I recommend not spending a lot of time on hypotheticals and instead figure out your specific timing case. There are all sorts of ways you can confuse yourself with hypotheticals that are not worth the time.

Altera_Forum · 11-08-2016

Ah, okay. Your post made me realize that I was thinking the data would arrive too early (and might or might not go invalid before 3.0ns), while I think you were implying it would probably arrive too late to meet the 1.0ns requirement.

In super general terms: the FPGA data output has its own output-valid window relative to that register's clock. I need to do whatever it takes to make sure that output-valid window overlaps the downstream device's input-valid window (delaying either the clock or the data, as appropriate). I should account for PCB delays and length mismatches as well.
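A hedged sketch of that summary, assuming the clock reaches both devices at the same time (clock-trace skew would shift the required window) and a 10ns period with the receiver latching at 10ns. The 2.7/1.1ns Tsu/Th and 0.1 to 0.3ns trace values reuse numbers from earlier in the thread; the Tco ranges are made-up examples, and all names are hypothetical. Strictly, the FPGA's valid window must fully cover the receiver's required window, not just overlap it.

```python
def valid_window_at_receiver(period_ns: float, tco_min_ns: float, tco_max_ns: float,
                             trace_min_ns: float, trace_max_ns: float) -> tuple[float, float]:
    """Interval during which the word launched at 0 ns is stable at the receiver pin.

    It settles after the slowest Tco plus the slowest trace, and lasts until
    the next word (launched at period_ns) arrives via the fastest Tco and trace.
    """
    start = tco_max_ns + trace_max_ns
    end = period_ns + tco_min_ns + trace_min_ns
    return start, end


def required_window(latch_edge_ns: float, tsu_ns: float, th_ns: float) -> tuple[float, float]:
    """Interval in which the receiver needs stable data: tSU before to tH after its edge."""
    return latch_edge_ns - tsu_ns, latch_edge_ns + th_ns


def window_is_met(valid: tuple[float, float], required: tuple[float, float]) -> bool:
    """True when the valid window fully contains the required window."""
    return valid[0] <= required[0] and valid[1] >= required[1]


if __name__ == "__main__":
    need = required_window(10.0, 2.7, 1.1)
    ok = valid_window_at_receiver(10.0, 1.5, 5.0, 0.1, 0.3)
    too_slow = valid_window_at_receiver(10.0, 1.5, 7.5, 0.1, 0.3)
    print(window_is_met(ok, need), window_is_met(too_slow, need))
```

These checks reduce to the same Tco range (at least 1ns, at most 7ns) that -max 3.0 and -min -1.0 describe.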