Re: setup and hold time with set_input_delay

Altera_Forum · ‎03-10-2009

Hello,

I am having trouble understanding the set_input_delay -min and -max constraints.

Assume that you have an interface that is connected to an FPGA.

This interface has a clock (Clk) and a databus (DB).

The data-valid window is centered around the rising edge of Clk. Assume a setup time of 1ns and a hold time of 2ns. The clock period is 10ns.

How should I constrain this with the set_input_delay command?

thanks for helping me,

Altera_Forum · ‎03-10-2009

1) Create a constraint on the clock coming in. Something like the following (I'm typing all constraints from memory, so syntax may be off):

create_clock -name fpga_clk -period 10.0 [get_ports clk_in]

2) Create an external clock. (Users should always create a virtual external clock for I/O interfaces. This isn't emphasized enough in the documentation.) Since your clock coming in is phase-shifted, you can phase-shift it externally:

create_clock -name ext_clk -period 10.0 -waveform {5.0 10.0}

This basically phase-shifts the clock 180 degrees. Note that there are variations on this. You could have the external clock not be phase-shifted and phase-shift the fpga_clk. The analysis will end up being the same.

3) Add I/O constraints with 0.0ns delays, just as a place holder.

set_input_delay -clock ext_clk -max 0.0 [get_ports din*]

set_input_delay -clock ext_clk -min 0.0 [get_ports din*]

4) Understand the setup and hold relationship between ext_clk and fpga_clk. You can run TimeQuest and do a report_timing -setup and -hold between these two clocks. But just from drawing the waveforms, it's pretty obvious the requirements are a 5ns setup requirement and a -5ns hold requirement.

5) Change the delay values to match your external delays. I don't follow your description of "Assume a setup time of 1ns and hold time of 2ns". How much could the data come out before the clock, including board delays? If that's 4ns, then your -max delay should be 4ns, and since the setup requirement is 5ns, TimeQuest is left trying to meet a 1ns setup time. If you're saying it could be delayed by as much as 1ns, then have that be the -max delay.

For the hold, if the data may leave by as much as 2ns after the clock, then the -min delay is -2ns. Since the hold requirement is -5ns, then only if the data is held by another -3ns will it fail timing.

The language of setup and hold always seems confusing, especially the sign. Just start putting in numbers, rerunning TimeQuest and analyzing the setup and hold on the input paths. Look at the waveform to see if the analysis it is doing is what you want.
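
To pull steps 1 to 5 together, here is a minimal sketch of the resulting .sdc. The port names clk_in and din* come from the commands above, and the 4.0 and -2.0 delays are only the illustrative numbers from step 5, not real datasheet figures.

```tcl
# Clock arriving at the FPGA pin: 10 ns period, rising edge at 0 ns.
create_clock -name fpga_clk -period 10.0 [get_ports clk_in]

# Virtual clock (no target) modelling the external device.
# Rising edge at 5 ns, i.e. 180 degrees away from fpga_clk.
create_clock -name ext_clk -period 10.0 -waveform {5.0 10.0}

# Launch at 5 ns, latch at 10 ns           -> 5 ns setup relationship.
# Launch at 5 ns, previous latch at 0 ns   -> -5 ns hold relationship.
set_input_delay -clock ext_clk -max  4.0 [get_ports din*]   ;# 5 - 4 = 1 ns left for setup
set_input_delay -clock ext_clk -min -2.0 [get_ports din*]   ;# -2 - (-5) = 3 ns of hold margin
```

The arithmetic in the comments ignores the register's own micro setup/hold times and any clock uncertainty, so the slack TimeQuest reports will differ slightly from these round numbers.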

Altera_Forum · ‎03-11-2009

Ok many thanks Rysc for this useful explanation! Timequest documentation is not really easy to understand...

Yet I have a question about this:

1) I do not understand why there is no relation between fpga_clk and the virtual ext_clk. Couldn't fpga_clk be created from ext_clk using create_generated_clock?

(moreover there is a Quartus warning about this:" Warning: From ext_clk (Rise) to fpga_clk (Rise) (setup and hold) "

2) I don't know why "Report Fmax Summary" is not available in the TimeQuest report. It says "No paths to report", yet there is a path!

Altera_Forum · ‎03-11-2009

1) The assumption with generated_clocks is that there is a physical connection in the FPGA between the base clock and the generated clock, which TimeQuest then uses to calculate latency(delay). Note that clocks are related by default, so fpga_clk and ext_clk do have a relationship based on their edges.

2) I dislike Report Fmax and strongly recommend using slack as your measuring stick. Fmax works nicely when there is one clock domain and the transfers are all rising to rising, but can fall apart under more complex scenarios. Fmax calculations ignore transfers between different domains, because in many cases they don't work.
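
As a sketch of what "use slack as your measuring stick" can look like for the clocks in this thread, the commands below report the worst setup and hold slack for the ext_clk to fpga_clk transfer only. The clock names are the ones created earlier; the path count and panel names are arbitrary choices.

```tcl
# Worst-case setup slack for transfers launched by ext_clk and latched by fpga_clk.
report_timing -setup -from_clock [get_clocks ext_clk] -to_clock [get_clocks fpga_clk] \
    -npaths 10 -detail full_path -panel_name "ext_clk to fpga_clk setup"

# Same transfer, hold analysis.
report_timing -hold -from_clock [get_clocks ext_clk] -to_clock [get_clocks fpga_clk] \
    -npaths 10 -detail full_path -panel_name "ext_clk to fpga_clk hold"
```

Unlike the Fmax summary, this reports cross-domain transfers, which is exactly what an input path from a virtual clock is.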

Altera_Forum · ‎05-20-2009

Hi,

I am currently having an almost identical problem in a design.

I hope it's OK to add to this post and use it as a reference.

I do not fully understand part of the first response by Rysc.

--- Quote Start ---

5) Change the delay values to match your external delays. I don't follow your description of "Assume a setup time of 1ns and hold time of 2ns". How much could the data come out before the clock, including board delays? If that's 4ns, then your -max delay should be 4ns, and since the setup requirement is 5ns, TimeQuest is left trying to meet a 1ns setup time. If you're saying it could be delayed by as much as 1ns, then have that be the -max delay.

For the hold, if the data may leave by as much as 2ns after the clock, then the -min delay is -2ns. Since the hold requirement is -5ns, then only if the data is held by another -3ns will it fail timing.

--- Quote End ---

I understood before that the set_input_delay constraint was to account for board delays on the tracks? On my particular interface, the rising edge of the clock and the data leave the external device at the same moment.

If, in the example described by Rysc, the max constraint is 4ns and the min is -2ns, basically Rysc is saying that the data could arrive at the FPGA ports 4ns after or 2ns before the clock. I do not understand what effect this constraint has. What is the advantage of telling this to TimeQuest? I don't really have a clear understanding of this constraint.

Also, in the previous example it is claimed that it is obvious that the setup requirement is 5ns and the hold requirement is -5ns. Again, this is probably the dumbest question, but can someone explain to me why that is the case?

Finally, after reading the TimeQuest chapter in the Quartus handbook, I do not understand the concept of virtual clocks. I would be very grateful if someone could explain to me in simpler terms why it is advisable to create one.

I am a beginner with TimeQuest and have been studying the documentation over the last few days. So far I am having trouble making sense of it. Hopefully, if I can get a push in the right direction and get an understanding of the basics, then it will start to come together.

Many thanks for your help.

Altera_Forum · ‎05-20-2009

I'm not following the whole post. First off, don't spend forever reading the documentation. When you have a few constraints in, do something like the following, where you put in the port name you want to analyze:

report_timing -setup -from [get_ports your_portname*] -detail full_path -panel_name "Inputs setup"

report_timing -hold -from [get_ports your_portname*] -detail full_path -panel_name "Inputs hold"

At that point, look at the Data Path tab in excruciating detail and figure out what it's calculating. Your clock relationship will be reflected in the launch and latch times, the external delays will just be a line item, and everything else should be either clock or data delay in the FPGA.

Yours is a source-synchronous output with clock and data edge-aligned. Is it single data rate or double data rate? How are you shifting your clock to center its edge on the data? (For single data rate you can just invert it at the capture register, or do a 180 degree phase shift in a PLL. If it's double data rate, you'll want to use a PLL and phase-shift it 90 degrees. Also, the PLL should be in source-synchronous mode.)

Altera_Forum · ‎05-21-2009

Thanks for the response Rysc.

Sorry for the vagueness of my post.

I am truly lost when it comes to TimeQuest. Previously I have been using the Classic Timing Analyzer and only placing constraints on a design when they were necessary. So if the FPGA gave me the timing by default, I wouldn't do anything. Anyway, I'm now working on a new design where another team will do another part of the design, so I want to learn TimeQuest and learn how to fully constrain a design. It's about time I learned, and when another team is involved it's probably vital that my part of the design is constrained.

In the Classic Timing Analyzer I seem to remember that Quartus would present me with the setup times for the worst-case paths, and I also had the option to place a constraint for the setup time.

In TimeQuest, can I only view the setup time using the report_timing command when a constraint is already in place?

The interface I am currently trying to constrain is a parallel interface between the PowerPC 440GX and the FPGA. The PowerPC is bus master and the interface consists of a 32-bit data bus, a 32-bit address bus and various control lines (CS, write enable, output enable, parity). All lines change in a selectable sequence, but the changes on the input signals always occur on the rising edge of the clock leaving the PPC.

So, in an attempt to learn TimeQuest, I have been attempting to constrain this interface. Hopefully by doing this I can understand the basics and therefore have fewer problems constraining the remainder of the design.

So one constraint necessary is the set_input_delay constraint?

What I understand from reading the documentation is that this constraint specifies the data arrival time with respect to the clock.

In order to intelligently place the values in this constraint, I understand that I need to obtain the delay values on the PCB tracks in order to know if the data or clock are likely to arrive at different times?

If for example I see that the track for the clock is slightly longer than for the data and that the clock will arrive lets say 2ns before the data then I should place a constraint as follows for each port:

set_input_delay -add_delay -min -clock [get_clocks {PPC_GLUE_PER_CLK}] -2.000 [get_ports {PPC_GLUE_PER_DATA[0]}]

Please let me know if I am way off the mark with the above statement.

So what confused me about Rysc's response to the initial post was that he gave, by way of example, a constraint of 4ns for the max and -2ns for the min. Is this basically saying that there can be a difference of 6ns between the arrival of the clock and the data, and that TimeQuest must deal with that and meet the setup and hold times for them?

In my case, all inputs on this particular interface come from the PowerPC, and once the initial timing values are calculated they should not change. I.e., if by analyzing the track delays I see that the clock will arrive 2ns before the data bits, 1ns before the address bits, 3ns before CS etc., then should I put only the set_input_delay max constraint, which would be valid for both the max and the min?

Placing these constraints allows TimeQuest to meet setup and hold times, which the user cannot constrain directly. Is that right?

I really appreciate the help you guys give; hopefully a push in the right direction will help me to understand the basics.

Many thanks for the help.

EDIT:

Perhaps I should add that the frequency of the clock is 66 MHz.

I previously implemented this interface without constraining it on a different design. I used the falling edge of the clock in my design, while the PPC asserted all its signals on the rising edge. The FPGA gave me the timing I required and in the end I didn't constrain it.

Altera_Forum · ‎05-21-2009

So your clock coming into the FPGA(from the PPC) is:

create_clock -period 15.0 -name ppc_clk [get_ports ppc_clk]

(Naturally the port name will be different, you can give it whatever name you want, and the period might not be correct, as I wasn't sure if it was 6.6666)

The external clock is:

create_clock -period 15.0 -name ppc_ext

So we have an external clock with a period of 15.0ns and the FPGA clock with the same. Now, since you're inverting the clock in the FPGA(and TimeQuest will automatically know that), it will know you have rising edges at 7.5, 22.5, etc. So when the source clock ppc_ext has rising edges at 0, 15, etc., and the destination is inverted, you end up with a setup relationship of 7.5ns and a hold relationship of -7.5ns. If that doesn't make sense, draw the clocks out, and you'll see that when ppc_ext launches data, it's a setup failure if the data delay is greater than 7.5ns, and a hold failure if it's less than -7.5ns.

So everything looks good, and you probably have tons of margin. But you haven't accounted for any differences between your clock and data coming into the FPGA. There are a few ways to do this, but when it's source-synchronous, I like to think of set_input_delay as showing slack. So let's say the worst case scenario is the data leaves the PPC 1ns after the clock, and it has a 500ps longer trace delay(so in essence, it hits the port of the FPGA 1.5ns after the clock hits the port.) That would be:

set_input_delay -clock ppc_ext -max 1.5 [get_ports PPC_DATA*]

Since our setup relationship was 7.5ns, Quartus/TQ knows you will have a setup failure only if the data delay inside the FPGA is 6ns longer than the clock delay. (Remember it's cumulative. The clock relationship gave us 7.5ns to work with, 1.5ns of that was chewed up outside of the FPGA, and there will be a setup failure if the FPGA chews up the final 6ns.)

Does that make sense? Try doing the hold timing and see what your thoughts are on that.
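
The setup budget above can be written out as a few lines of plain Tcl, which can sit at the top of the .sdc as documentation. This is only a sketch: the variable names are hypothetical, the numbers are the made-up ones from this post, and the register's micro setup time and clock uncertainty are ignored.

```tcl
# Values from the post above; the variable names are illustrative only.
set setup_relationship 7.5   ;# ppc_ext rises at 0 ns, inverted ppc_clk latches at 7.5 ns
set input_delay_max    1.5   ;# 1.0 ns late launch + 0.5 ns extra trace delay

# What is left for (data delay - clock delay) inside the FPGA.
set fpga_setup_budget [expr {$setup_relationship - $input_delay_max}]
puts "FPGA setup budget: $fpga_setup_budget ns"
```

With these inputs the budget works out to the 6ns quoted above.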

Altera_Forum · ‎05-21-2009

Hi Rysc, thank you so much for your reply. Things are starting to become much clearer now.

Just to be doubly sure about one thing..

--- Quote Start ---

So your clock coming into the FPGA(from the PPC) is:

create_clock -period 15.0 -name ppc_clk [get_ports ppc_clk]

(Naturally the port name will be different, you can give it whatever name you want, and the period might not be correct, as I wasn't sure if it was 6.6666)

The external clock is:

create_clock -period 15.0 -name ppc_ext

--- Quote End ---

The clock you refer to as external clock is a 'Virtual Clock'? which is 'simulating' the clock on the Power PC driving the data lines etc to the FPGA?

I placed the set_input_delay constraint with max delay 1.5ns and was able to analyze the resulting setup times using the report_timing command. It makes perfect sense and thank you for the explanation. There is however one thing I am not sure about. When analyzing the setup slack and viewing the data path tab, in the data_required_section the final increment is Tsu. My understanding is that this time should be subtracted from the total but it seems that it is added to the total time thus increasing the Data Required Path. Below is a snapshot of the last two stages.

10.358 0.558 FR CELL 1 FF_X34_Y1_N7 RESET_CONTROL_LOGIC:RESET_CONTROL_LOGIC_Inst|Reset_Ctrl_Reg[8]

10.376 0.018 uTsu 1 FF_X34_Y1_N7 RESET_CONTROL_LOGIC:RESET_CONTROL_LOGIC_Inst|Reset_Ctrl_Reg[8]

I placed another set_input_delay for the hold time.

set_input_delay -add_delay -min -clock [get_clocks {PPC_Virtual}] -1.500 [get_ports {PPC_GLUE_PER_DATA[0]}]

I am able to view the individual paths and it makes sense.

For the hold time constraint, I used the same value. But if, for example, from analyzing the trace delays on the board, the clock will always arrive 500ps before the data pins, would it always be necessary to put a min constraint in place (and vice versa)? Or is it good practice to always leave a bit of a 'buffer' just to be sure that things aren't too tight?

One final thing. When I put in place the set_input_delay max and min constraints, does Quartus use these only to produce the setup and hold time details, or does it take these and actively try to meet setup and hold time on each signal specified? Will the fitter make extra effort if it sees that it cannot make a setup time based on the set_input_delay constraint placed by the designer?

I am going to continue to place these constraints for each signal on this particular interface. Just to get an idea, what other constraints would be required before I can say that the interface is completely constrained?

Many thanks for your help so far, things are starting to become clearer (but please put me back in my box if anything I said above is untrue!)

Regards,

Ardni

Altera_Forum · ‎05-21-2009

Yes, I would say the uTsu is generally subtracted from the Data Required. That being said, I've seen cases where the models show stuff that's not really "real". For example, they may model some of the clock delay at the flip-flop, to the point that the uTsu looks positive. I've seen a few things that don't work when dissected, but are correct for the full analysis, and I imagine this to be the case here. Naturally, Altera is doing less and less of this. The fact that you noticed that is a good sign that you get what's occurring.

Note that you've basically said the data may hit the FPGA +/-1.5 from the clock. The +1.5 was just made up by me, so try to get something that reflects what's occurring.

And the fitter will try to meet your timing. In fact, that's why it's good to do both -max and -min, so it knows what the mid-point is, that gives the best slack for both setup and hold. (Since you have symmetric external delays, the mid-point is to have the clock and data paths match exactly, which makes sense from a logical perspective.)
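
For reference, a sketch of the symmetric pair of constraints being discussed, with the budget arithmetic as comments. The clock name ppc_ext and the PPC_DATA* port pattern follow the earlier examples in this thread (the poster's real names differ), and +/-1.5ns is still a placeholder, not a datasheet value.

```tcl
# Symmetric external delays: data may reach the FPGA up to 1.5 ns after
# or 1.5 ns before the clock.
set_input_delay -clock ppc_ext -max  1.5 [get_ports PPC_DATA*]
set_input_delay -clock ppc_ext -min -1.5 [get_ports PPC_DATA*]

# Setup:  7.5 - 1.5    = 6.0 ns before (data delay - clock delay) fails setup.
# Hold:  -1.5 - (-7.5) = 6.0 ns before (clock delay - data delay) fails hold.
```

Because both budgets are equal, the best placement is the one where the internal data and clock delays match, which is the mid-point mentioned above.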

Altera_Forum · ‎05-22-2009

Thanks again Rysc for your response.

Just one further question regarding the maximum and minimum delays that should be selected.

Generally speaking, what would be the approach when deciding what delays to use for the set_input_delay constraint?

Would it be analysis of the datasheet of the external device, to uncover exactly what delays there may be between a clock and a data signal leaving the external device, and also analysis of the respective board delays?

The problem I am having is that the exact times are not given in the PowerPC handbook, for example the difference in time between the clk and data (although I want to check and see if more docs are available). Would the best idea be to allow for a small +/- delay here, or is it essential to uncover the exact specs?

The second issue I have is that we are still in the process of doing the board schematics, and from talking to one of the guys, he is not sure whether the simulation software will provide us with the times. I am assuming it is imperative to have an accurate idea of the different trace delays? Do most designers usually obtain these times?

Basically, what I am looking for is an account of anyone's experience in doing this. Is it something that is normally done very accurately, or is an exaggerated delay usually placed just to cover all scenarios?

Many thanks.

Altera_Forum · ‎05-22-2009

It's often good practice to get exact delays.

I STRONGLY recommend documenting your .sdc. You can just do comments like:

# The PPC datasheet has a clock Tco of 6-8ns from datasheet dated July 2007,

# And the board delay is 1ns +/-50ps from board layout, so adding those together...

or you can do equations in Tcl:

set ppc_max_tco 8.0 ;# PPC datasheet July 2007

set ppc_clk_max 1.05 ;# Max board delay Dave told me when doing board sims..

set max_ppc_clk [expr {$ppc_max_tco + $ppc_clk_max}]

That way the constraints are documented. I can't tell you how many times I've seen people say "well, the Tsu constraint is 4ns, and someone else did it and I have no idea how they came to that, but I have to meet that. And if we change the board layout I'm not sure what to do, or if we get different speed grade parts I'm not sure...."

That all being said, you have a LOT of margin. Source-synchronous interfaces run very fast (think 200MHz DDR; 400Mbps is pretty common). So you could just throw in something big. Then try to see if something's available from the datasheet, or measure it, or just say, hey, that should be more than enough. Remember that at one point you weren't constraining this at all, so you need to balance the amount of time it takes to get it exact (you could do HSPICE modeling and all sorts of other stuff, but if you meet timing by nanoseconds, why spend the time?). If you upgrade to a faster interface, then maybe you'll want to refine it, but if it's documented how you got the numbers, you have a good starting point.
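
Taking the same idea one step further, the documented variables can feed set_input_delay directly. The sketch below assumes a source-synchronous interface where the datasheet gives the data timing relative to the clock output; every number and variable name in it is a placeholder, and ppc_ext / PPC_DATA* follow the earlier examples in this thread.

```tcl
# All values are placeholders: replace with datasheet and board-layout numbers.
set data_tco_max   1.0    ;# latest data change after the clock leaves the PPC
set data_tco_min   0.0    ;# earliest data change after the clock leaves the PPC
set data_trace_max 1.05   ;# data trace delay, 1 ns + 50 ps
set data_trace_min 0.95   ;# data trace delay, 1 ns - 50 ps
set clk_trace_max  1.05   ;# clock trace delay, 1 ns + 50 ps
set clk_trace_min  0.95   ;# clock trace delay, 1 ns - 50 ps

# Latest data versus earliest clock gives -max; earliest data versus latest clock gives -min.
set in_delay_max [expr {$data_tco_max + $data_trace_max - $clk_trace_min}]
set in_delay_min [expr {$data_tco_min + $data_trace_min - $clk_trace_max}]

set_input_delay -clock ppc_ext -max $in_delay_max [get_ports PPC_DATA*]
set_input_delay -clock ppc_ext -min $in_delay_min [get_ports PPC_DATA*]
```

If the board layout or the speed grade changes, only the variables at the top need updating.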

Altera_Forum · ‎05-25-2009

Thank you Rysc for your reply, things are certainly much clearer now.

I was having a read of the datasheet of the PowerPC, and in the I/O specifications chapter there is a spec given for "Valid Delay Tov max", which from looking at the timing diagrams seems to be the max delay between the rising clock edge and the time at which the referenced signal is stable on the output of the PowerPC.

In the case of the databus the max delay is 6.6ns and the min 0ns.

Given that the setup relationship is 7.5 ns, it looks like this presents a problem.

Anyway, I am thinking that multicycle constraints will be necessary here. I may have to have the PowerPC configured to prolong the transfer and present the data for a longer period, e.g. 1 cycle more than it is currently set for. By placing a multicycle on the register which latches the data, I should be able to successfully meet timing.

I am reasonably familiar with multicycles, and from having a quick look around the forum there seem to be plenty of threads on this topic which I can study.

Really I was just wanting to know if my thinking was right on this, or indeed if there is another/better solution when this problem arises.

Many thanks for the help.

Altera_Forum · ‎05-26-2009

First off, make sure the spec is what you're saying. Outputs are often spec'd in terms of the input clock, i.e. the data is valid between 0 and 6.6ns after the input clock. This, by itself, is useless, since we want to know what the data output is in relation to the clock output. So if the clock output, in relation to the clock input, was 0 to 6.6ns, then we could figure out the skew between these.

Or is the spec saying the data comes out anywhere between 0ns(with the clock) up to 6.6ns after the clock? I don't know the answer, but it naturally makes a big difference. And skewed by anywhere between 0 and 6.6ns is a pretty big number(but not impossible).

Now, your setup requirement is 7.5ns. So that only leaves 0.9ns of skew inside the FPGA. First, if using a PLL in source synchronous mode, it's possible that you can meet that. Secondly, don't forget that you have -7.5ns of hold to work with. So if you're just barely meeting setup timing and have tons of hold timing, then Quartus might be able to reduce the data delay to borrow some of that hold timing. Or, you can add a positive phase shift of a few nanoseconds to your capture clock. (And you would add a multicycle -setup of 2 between the external clock and the PLL clock to get the correct requirement).
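
A rough sketch of what the phase-shift option might look like in the .sdc, assuming the capture clock comes from a PLL. The PLL output clock name below is hypothetical (check the real name in TimeQuest's clock report after derive_pll_clocks), and the 0 to 6.6ns delays assume the datasheet figure really is relative to the clock output, which is the open question above.

```tcl
# Let TimeQuest create generated clocks on the PLL outputs.
derive_pll_clocks

# Hypothetical clock name: substitute the PLL output clock that drives the capture registers.
set capture_clk [get_clocks {ppc_pll|clk[0]}]

# External delays under the assumption stated above: data valid 0 to 6.6 ns after the clock.
set_input_delay -clock ppc_ext -max 6.6 [get_ports PPC_DATA*]
set_input_delay -clock ppc_ext -min 0.0 [get_ports PPC_DATA*]

# Move the setup latch edge one capture-clock edge later than the default.
set_multicycle_path -setup -from [get_clocks ppc_ext] -to $capture_clk 2
```

Whether the multicycle is actually needed depends on the size of the shift and on whether the clock inversion is kept, so compare the launch and latch times in report_timing with and without it before trusting the result.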

Bottom line is you still might meet timing, so give it a try. I do not think you need to go to a slower data rate.

Altera_Forum · ‎05-26-2009

Thanks again Rysc for your reply.

I am fairly sure that the delay of 6.6 ns specified is the max delay between the rising edge of clock leaving the PowerPC and the time the data bits are stable. I have attached the datasheet and on page 78 these details are given. Perhaps you could have a quick look at this if you had time. On page 69 there is a timing diagram where the tov max delay term is used and it seems to refer to what I think.

Anyway I took your advice and used a PLL to shift the clock and I have been able to meet timing. I had to shift the clock by +5ns to meet timing. I now have a worst case setup time of 0.790ns and a worst case hold time slack of 2.123ns.

I have not included board trace delays in the input delay constraints; it's something I will add later, but I am assuming that they would typically be quite low, in the order of 100s of ps?

I am also receiving some warnings about no clock uncertainty assignment on the clocks. From looking at the attached datasheet, page 67 specifies the max and min high and low times for PerClk (the clock in question). The values given are that the clock can be high for between 7.5ns and 9.9ns (50-66%) of a 15ns cycle and low for between 4.95ns and 7.5ns (33-50%) of a 15ns cycle.

Does the potential difference in duty cycle represent clock uncertainty or am I misunderstanding clock uncertainty?

If my understanding is correct, then between the 6.6ns delay on the databus and the difference in potential duty cycle, it would make meeting timing even more difficult.

Do you have any thoughts on this?

Many Thanks

Altera_Forum · ‎05-26-2009

I think it just means to add the following line to your .sdc:

derive_clock_uncertainty

(That should be in all .sdc files for devices at 65nm and newer...)

And note that, even though your slack is getting tight, if you meet timing, and if everything is correct, then you're in good shape. I've seen designs meet timing by a few picoseconds.

Altera_Forum · ‎05-26-2009

Thanks Rysc, placing derive_clock_uncertainty in the .sdc file seemed to sort it out.

Altera_Forum · ‎10-14-2009

Hello everyone,

Sorry for posting a question in the middle of nowhere; I could not find how to start a new thread.

My question is:

The LPM RAM, or a RAM created through the MegaWizard, has an input latch and an output latch, because of which we are getting a 2-cycle delay on the output, which is very critical in our design. How do we overcome the delay caused by the RAM? Can we bypass this latch, and if so, how do we do it?

Altera_Forum · ‎10-15-2009

In the MegaWizard there is a checkbox to disable the output latch. (The input/address latch is always there.)

Altera_Forum · ‎10-15-2009

Thanks for the reply. It does indeed allow unchecking latches for the RAM, but I'm not sure whether it also allows this for a Cyclone III FPGA.

It indicated some error when I did so and tried to compile.

Altera_Forum · ‎10-15-2009

Cyclone III's M9Ks have an output register that is bypassable. Not sure what error you're seeing, as it's definitely possible.