Re: set_input_delay And set_output_delay .SDC Constraints

Altera_Forum · ‎03-29-2017

Hello! I have little familiarity with the set_input_delay and set_output_delay SDC commands and was wondering whether the following commands would correctly constrain the inputs as per the requirements in the attached figure.

https://alteraforum.com/forum/attachment.php?attachmentid=13431&stc=1

What the diagram shows: I have an 8-bit port on a video decoder which outputs data to the FPGA. The clock used in the FPGA is output from the same device. The device is configured to change its data on the falling edge of the clock. The device operates on a 27 MHz clock, i.e. a period of 37.037 ns. The datasheet lists T5 as Output Hold Time and gives a minimum value of 10 ns. The datasheet lists T6 as Output Delay Time and gives a maximum value of 25 ns. So, I conclude that with respect to the falling edge of the clock, the data begins to change 10 ns after the falling edge and regains stability 15 ns later. Is this what the diagram depicts? If so, I think the constraints that specify this should be:

create_clock -period 37.037 -name clkvin ;# virtual clock for input constraint
set_input_delay -clock clkvin -max 25 -clock_fall
set_input_delay -clock clkvin -min 10 -clock_fall

Setting the input delay constraints like this ends up giving the following failing constraints in the Timing Analyzer.

https://alteraforum.com/forum/attachment.php?attachmentid=13432&stc=1

CH2_BITEC_IN are the physical input pins on the HSMC connector. The data on these pins is registered with the clock CH2_BITEC_CLK like so:

always @ (posedge CH2_BITEC_CLK) begin
    input_data <= CH2_BITEC_IN;
end

Have I deduced the requirements correctly from the diagram? If so, then why does the design fail and how can I make it work?

Thank you!

Altera_Forum · ‎03-29-2017

UPDATE: When I remove the -clock_fall option, which I included since the data changes on the negative edge of the clock, the failing constraints go away. With the following constraints, I get no errors.

set_input_delay -clock clkvin -max 25 
set_input_delay -clock clkvin -min 10

The only change I've made is the removal of -clock_fall.

Altera_Forum · ‎03-29-2017

You need to set the offsets as they are:

min = 15 and max = 16, both from the falling edge. It should be no problem to pass.

How did you get those figures of 25/10?

I needed glasses. Yes, min = t5 and max = t6. If it fails, then you need to try a PLL or use the same edge. Also, you have assumed the board delays to be equal for clock and data.

Altera_Forum · ‎03-29-2017

Thank you for the prompt feedback and I apologize for the poor quality of the picture and any other shortcomings!

Just to be clear, I wrote 't5' and 't6' on the left side of the diagram, not 15 and 16. Again, my bad. As per the attached diagram, the datasheet lists the values for 't5' and 't6' as 10 ns and 25 ns. My reasoning behind arriving at these values was that 10 ns after the falling clock edge the data begins to change, and 15 ns later (25 ns after the falling edge) new data is available, hence the min and max values. Please correct me if I'm wrong.

Secondly, when I remove the -clock_fall option, the analyzer reports no warnings. Why is this so?

If it should still be 15 and 16, then I am hopelessly lost. :( Please have a look at the attached diagrams.

https://alteraforum.com/forum/attachment.php?attachmentid=13434&stc=1 https://alteraforum.com/forum/attachment.php?attachmentid=13435&stc=1

Altera_Forum · ‎03-29-2017

On a side note, here's what I think setting the set_input_delay -max value does: it informs the synthesizer that the input register, the one with its "D" pins connected directly to the input pins on the FPGA, cannot be sampled at the positive edge of the same clock that is used between two internal registers on the FPGA. Consequently, when I constrain the -max input delay to 25 ns, it leaves 12 ns (in a 37 ns period) for the input pin data to appear at the output of the register and then be used by any combinational logic (like comparators, etc.) before the final result. When the constraints fail, does this simply mean that the remaining time isn't sufficient for the subsequent processing? If so, then:

What is the purpose of the minimum delay?

Why did the design pass when I removed the -clock_fall option?

What can be done to assure maximum data integrity?

Altera_Forum · ‎03-29-2017

You are right about the meaning of these commands. You inform TimeQuest what offset the data arrives with, relative to the clock edge that launches it.

So you are directly describing what the diagram is implying.

Then you are sampling data on the opposite edge. The fitter seems unable to manage that. By misinforming the tool that the figures are relative to the rising edge, it managed that. You must enter the figures relative to the falling edge, i.e. 10/25. Or, if you wish, modify them to become relative to the rising edge by adding 37/2 = 18.5, which gives 28.5/43.5, and 43.5 - 37 = 6.5, i.e. 6.5/28.5, so the min/max offsets are also reversed.
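The edge shift described in this reply can be worked through numerically. The following is only a sketch, assuming a 50% duty cycle and the rounded 37 ns period used in the reply; the function name is made up for illustration.

```python
# Rounded figures from the reply: 37 ns period, offsets of 10 ns and 25 ns
# measured from the falling edge of a clock with an assumed 50% duty cycle.
PERIOD_NS = 37.0


def from_fall_to_rise(offset_ns, period_ns=PERIOD_NS):
    """Re-express a falling-edge offset as time since the most recent rising edge."""
    return (offset_ns + period_ns / 2) % period_ns


change_starts = from_fall_to_rise(10.0)  # 28.5 ns after a rising edge
change_ends = from_fall_to_rise(25.0)    # 43.5 ns wraps to 6.5 ns after the next rising edge

# Relative to the rising edge, the data is stable from 6.5 ns to 28.5 ns.
stable_window = (change_ends, change_starts)
print(stable_window)
```

Because the later offset wraps past a rising edge, the smaller number (6.5) now marks the end of the transition and the larger one (28.5) marks its start, which is the reversal the reply mentions.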

It is obvious that the tool would be happier if your figures of 10/25 were relative to the rising edge, but they are not. So either sample on the falling edge at the FPGA, or use a PLL to invert the clock so that you sample on its rising edge. The tool will be aware of that and should pass.
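To see why the choice of capture edge matters, here is an idealized sketch that checks whether each candidate capture edge lands inside the window where the data is stable at the pins. It assumes a 50% duty cycle and ignores board delays and all delays inside the FPGA, so it only illustrates the trend and is not what the timing analyzer actually computes; the helper names are hypothetical.

```python
PERIOD_NS = 37.037      # 27 MHz clock from the first post
T_HOLD_MIN_NS = 10.0    # T5: old data is held this long after the launch edge
T_DELAY_MAX_NS = 25.0   # T6: new data is settled this long after the launch edge


def stable_window(launch_ns):
    """Interval during which the word launched at launch_ns is stable at the pins."""
    return (launch_ns + T_DELAY_MAX_NS, launch_ns + PERIOD_NS + T_HOLD_MIN_NS)


def margins(capture_ns, window):
    """Return (setup_margin, hold_margin); a negative value means the edge misses."""
    start, end = window
    return (capture_ns - start, end - capture_ns)


launch = PERIOD_NS / 2                  # decoder launches on the falling edge
window = stable_window(launch)          # roughly (43.5, 65.6) ns

rise_setup, rise_hold = margins(PERIOD_NS, window)            # next rising edge
late_setup, late_hold = margins(2 * PERIOD_NS, window)        # rising edge after that
fall_setup, fall_hold = margins(launch + PERIOD_NS, window)   # next falling edge

print(rise_setup < 0, late_hold < 0, fall_setup > 0 and fall_hold > 0)
```

Under these assumptions neither rising edge falls inside the stable window, while the next falling edge leaves about 12 ns of setup margin and 10 ns of hold margin. That matches the advice to capture on the falling edge or on an inverted clock.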

Altera_Forum · ‎03-29-2017

Also make sure you have set the registers to be I/O registers, not fabric registers (in the project settings).

Altera_Forum · ‎03-30-2017

Thank you so much! I am especially thankful for the last reply about configuring the IO registers, as this was something I didn't even think to ask (and didn't know).

The device can be configured via I2C to output data on either the rising or the falling edge. Initially, I thought that outputting data on the negative edge and sampling it on the positive edge would leave one half clock cycle for setup time, but this seems to have been wrong. I can configure the device to output data on the rising edge and set the min delay to 10 and the max delay to 25. Should this do the trick?

Now, I don't have values for board delay or clock skew. What do you recommend I do to incorporate this uncertainty as well?

Let's assume that board delay = b ns and that clock skew = ±c ns.

Then,

input_delay_min = 10ns + b - c.

input_delay_max = 25ns + b + c.

If the above equations are right, what values do I select for b or c? I apologize if this is something only I can know because I don't have any information about these parameters.

Altera_Forum · ‎03-30-2017

Yes, launching the external device's data on the rising edge looks better in your case.

Data and clock board delays are unique to your design. If both are equal (same material and thickness), you can ignore them. After all, such delays are in picoseconds, and if you are not sure you may just add extra margin to your SDC figures to further limit the sampling window, e.g. 9/26 instead of 10/25.

In theory you would go as far as the max/min of the board delays, but in practice this is only worth checking on very fast paths.
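The b/c equations from the question and the extra-margin suggestion in this reply can be combined in one small calculation. This is a sketch only: the function and parameter names are invented here, and the 0.5 ns / 0.25 ns board figures in the last call are placeholders, not measurements of any real board.

```python
def input_delay_bounds(tco_min_ns, tco_max_ns, board_ns=0.0, skew_ns=0.0, margin_ns=0.0):
    """Apply min = tco_min + b - c and max = tco_max + b + c, plus an optional guard band."""
    delay_min = tco_min_ns + board_ns - skew_ns - margin_ns
    delay_max = tco_max_ns + board_ns + skew_ns + margin_ns
    return delay_min, delay_max


# Matched traces, no extra margin: the datasheet figures pass through unchanged.
matched = input_delay_bounds(10.0, 25.0)

# 1 ns guard band on each side, as in the 9/26 suggestion.
padded = input_delay_bounds(10.0, 25.0, margin_ns=1.0)

# Placeholder board figures (b = 0.5 ns, c = 0.25 ns), purely for illustration.
with_board = input_delay_bounds(10.0, 25.0, board_ns=0.5, skew_ns=0.25)

print(matched, padded, with_board)
```

Widening the min/max pair always shrinks the window the tool is allowed to sample in, so padding is the conservative choice when b and c are unknown.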

Altera_Forum · ‎03-30-2017

Thank you. This thread will prove to be extremely helpful when I get to the implementation part of my design!

Altera_Forum · ‎04-01-2017

Hello again!

Since my design depends on detecting a pattern that starts when the external device outputs an 8'hFF, I was wondering whether I would be better off using an additional register (in addition to the IO Register). I figure that if I use a comparator on the IO Register, it would only have 37 ns - (26 ns + TCO of the IO Register) before the next positive clock edge, but if I use one extra register, it would give me nicely aligned data and a complete 37 ns while only increasing the latency by one cycle. Similarly, when outputting to an external device, one could add a simple register before the final Fast Output Register. In short, is it not better to have something like the following?

For Input :

Input Pins ---------------> IO Register ---------------> Register----------->Design

For Output :

Design------------> Register--------------> IO Register------------> Output Pins

I realize that all this may sound extremely repetitive, but it would go a long way in deepening my understanding if you confirm this.

P.S. (I just have a hard time wrapping my head around how the IO registers get sampled after the TCOmax of the external device).

Thank you!

Altera_Forum · ‎04-01-2017

You are a bit off track on FPGA timing strategy.

The I/O timing is your responsibility as a user. Beyond that, inside the FPGA, the tool takes responsibility for the timing of any register chain.

Your design will eventually need register-comb-register chains, and these will be managed by the tool (easily so at 27 MHz).

You are free to use such chains as required for your tasks. The depth of each comb section is better kept short so that you achieve timing readily.

Altera_Forum · ‎04-01-2017

I understand the register ------> comb ------> register part. But kindly answer me this:

1) Let's say that at t = 0, we have a rising edge on the clock.

2) The external device outputs data at 26ns (TCO max).

3) Question :

Will the IO register (the one that has to read the value at the input pins) latch this value at t >= 26 ns?

If so, does this value become available in the FPGA at t >= 26 ns, or (26/37) of the clock period? Would it not be better to add another register (cascaded in series) that stores the value at the next rising edge? Doing so would use up one clock cycle, though.

This scheme is only for the input/output from the FPGA.

Thank you!

Altera_Forum · ‎04-01-2017

--- Quote Start ---

Will the IO register (the one that has to read the value at the input pins) latch this value at t >= 26 ns?

--- Quote End ---

That is what the input path timing pass/fail will tell you. Whether it passes or fails, a second register after the I/O register is not going to do any good, because that path is a new path that the tool takes care of. So you don't need to visualise what happens on that path; leave it to the tool. If any path in your entire design fails, the tool will tell you.

--- Quote Start ---

This scheme is only for the input/output from the FPGA.

--- Quote End ---

Such a scheme is not needed specifically for I/O. Each I/O needs just one register. A second register can be added (back to back) but is not going to do much for timing. It is not the user's responsibility to look after delay figures for internal paths (only exceptionally might one do chip planning for some difficult paths).

Back-to-back registers have very short routing and pass timing easily; they do nothing apart from latency matching or a random effect on overall routing.

Altera_Forum · ‎04-01-2017

Thank you for your patience!

I think I have something fundamentally wrong.

Let's assume that there are two registers. One is the Input Register, and the other is a simple register somewhere in the design.

Now, when I set the input delay to 26 ns, I imagine that the Input Register and the other register get clocked differently.

To me, the Input Register gets its clock pulse at 26, 26 + 37, 26 + 2*37, ..., 26 + n*37, ...

And the other register is clocked at 37, 37 + 37, 37 + 2*37, ...

So, when I think like this, I imagine that whatever combinational logic is inserted between the register that gets clocked at 26, 26 + 37, ... (the Input Register) and the other register, which gets clocked at 37, 37 + 37, ..., has only 37 - 26 = 11 ns to settle. If instead we used back-to-back registers just to cross over from the input domain to the simple register domain, would we have the complete 37 ns between clock pulses?

Altera_Forum · ‎04-02-2017

Each register has its own tCO. In your case the external device has a tCO max of 26 ns, assuming it stays so at the FPGA pins.

The FPGA registers have their own tCO, and a fast I/O register is expected to have a short tCO (that is why we call it a fast I/O register).

The clock period is 37 ns for all paths clocked by that clock, but the window available for sampling is much smaller on any path, because it starts at the max tCO moment and ends at the tSU restriction.

The tCO at any register should be as short as possible without hitting the tH of the previous latch edge, and shouldn't be so long as to hit the tSU of the next latch edge.

The fitter ensures tCO doesn't hit tH, as the clock is always designed to be faster than the data (this is done at silicon level, i.e. fast clock networks).

tSU will be safe from violation as long as the clock is not too fast or the data delay is not too long.

Long combinatorial sections mean more data delay and may violate tSU, so either break them up into shorter sections or lower the clock rate.

In short: enter the input constraints and check the timing of the RTL chains in your design; if you run into trouble, then you start to help the tool by pipelining, etc.
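The point about the sampling window can be made concrete with a rough slack calculation. In this sketch every register, including the input register, is clocked on the same edges (0, 37, 74, ... ns); the 26 ns input delay only says when the data arrives at the pin. The FPGA tCO, tSU and logic-delay numbers below are hypothetical placeholders, and clock-network and pin-to-register delays are ignored.

```python
PERIOD_NS = 37.037

# Hypothetical FPGA register figures, chosen only for illustration.
FPGA_TCO_NS = 0.5
FPGA_TSU_NS = 0.2


def input_setup_slack(input_delay_max_ns):
    """Time left between the latest data arrival at the pin and the capture edge."""
    return PERIOD_NS - input_delay_max_ns - FPGA_TSU_NS


def internal_setup_slack(comb_delay_ns):
    """Register-to-register slack: launch edge at 0, capture edge one period later."""
    return PERIOD_NS - FPGA_TCO_NS - comb_delay_ns - FPGA_TSU_NS


io_slack = input_setup_slack(26.0)          # about 10.8 ns for the pin-to-register path
fabric_slack = internal_setup_slack(5.0)    # about 31.3 ns with 5 ns of logic

print(round(io_slack, 3), round(fabric_slack, 3))
```

Under these assumptions the logic after the input register starts from that register's own short tCO, not from the external 26 ns, so it already has nearly the full period available without a second register.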

Altera_Forum · ‎04-02-2017

I found this guide by Ryan Scotville and it seems to have done the trick. With your posts here and that guide, I think I now have an understanding of how this works. I realize that the only requirement on the input register is that its setup time should be less than the TCO max of the device and that its hold time must be less than the TCO min. However, I have one final problem that I've run into. While trying to constrain the output side (which is also at 27 MHz), I get a rather strange result. Please view the attached diagrams. The minimum setup time is 0.5 ns. The minimum hold time is 0.5 ns, but the hold time is described with respect to the falling edge of the clock (data at the output gets latched on the rising edge of the clock). I tried to implement this constraint by setting the minimum output delay to (18.5 ns + 0.5 ns), but the constraints fail. I have seen you post elsewhere that the min delay should be set to -tH, but that seems to be the case when tS and tH are with respect to the same edge.

Is this the correct way to constrain what is given in the diagram?

https://alteraforum.com/forum/attachment.php?attachmentid=13451&stc=1

https://alteraforum.com/forum/attachment.php?attachmentid=13452&stc=1

Altera_Forum · ‎04-02-2017

--- Quote Start ---

I realize that the only requirement on the input register is that its setup time should be less than the TCO max of the device and that its hold time must be less than the TCO min. However,

--- Quote End ---

The wording is not right, as tCO is relative to the launch edge while tSU and tH are relative to the latch edge. But I get what you mean. The early edge of the invalid window must be beyond the tH point, and the late edge must be before the tSU point.

--- Quote Start ---

I have one final problem that I've run into. While trying to constrain the output side (which is also at 27 MHz), I get a rather strange result. Please view the attached diagrams. The minimum setup time is 0.5 ns. The minimum hold time is 0.5 ns, but the hold time is described with respect to the falling edge of the clock (data at the output gets latched on the rising edge of the clock). I tried to implement this constraint by setting the minimum output delay to (18.5 ns + 0.5 ns), but the constraints fail. I have seen you post elsewhere that the min delay should be set to -tH, but that seems to be the case when tS and tH are with respect to the same edge.

Is this the correct way to constrain what is given in the diagram?

--- Quote End ---

You are showing two clocks, the clock and its inversion, and then you state that it is relative to the falling edge.

If your FPGA launches on the rising edge, then you need to adjust the figures, or launch on the falling edge to make life a bit easier, or use -clock_fall if available in the command.

To adjust the figures, I will assume that with respect to the rising edge of the device tSU = 18.5 - 0.5 = 18 and tH = 18.5 + 0.5 = 19.

Then comes the entry of the SDC commands. I feel both are negative now, as they are behind the presumed rising latch edge (or could be made positive relative to the next latch edge), but I am not sure. You need to see what the tool does and how it interprets your entries; look at results such as the output registers' tCO (from the datasheet) or the I/O path waveforms.

Whether it fails timing or not is a different issue. The starting point is to enter the constraints correctly.

Altera_Forum · ‎04-02-2017

I think I might know what's happening here. The two clocks are shown because of the I2C-configurable latching (rising edge or falling edge for the single latching mechanism). The diagram here shows dual latching (on both rising and falling edges); it basically latches twice per clock cycle, which is why tS and tH are labelled as they are. I think the diagram just says that in the 1 ns around any edge (0.5 ns for setup and 0.5 ns for hold), the data should remain constant if the dual latching mechanism is used.

And yet again thank you for your patience and help! :)

Altera_Forum · ‎04-02-2017

But even if this is right, it still doesn't explain why timing fails if I make the minimum output delay more negative than -3 ns (e.g. -4, -5, etc.).
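One possible explanation, offered only as a sketch: on an output path the hold check is roughly the FPGA's fastest clock-to-output delay plus the minimum output delay, so a more negative minimum demands a longer minimum tCO from the FPGA. The 3.5 ns figure below is a hypothetical value picked so that the crossover falls between -3 and -4 ns; it does not come from the thread or from any datasheet, and board delay and clock skew are ignored.

```python
def output_hold_slack(fpga_tco_min_ns, output_delay_min_ns):
    """Approximate hold slack at the external device for a same-edge launch and latch."""
    return fpga_tco_min_ns + output_delay_min_ns


FPGA_TCO_MIN_NS = 3.5  # hypothetical fastest clock-to-pin delay of the FPGA output

slacks = {
    delay_min: output_hold_slack(FPGA_TCO_MIN_NS, delay_min)
    for delay_min in (-0.5, -3.0, -4.0, -5.0)
}
print(slacks)
```

With these placeholder numbers a minimum of -0.5 or -3 ns leaves positive hold slack, while -4 and -5 ns go negative. If this is what is happening, the hold slack reported by the timing analyzer for the output path should shrink by 1 ns for every 1 ns the minimum is lowered.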

Altera_Forum · ‎04-02-2017

--- Quote Start ---

But even if this is right, it still doesn't explain why timing fails if I make the minimum output delay more negative than -3 ns (e.g. -4, -5, etc.).

--- Quote End ---

Sorry, but can you explain what you mean by "if this is right"?