Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Clock network delay + internal cell delay > Minimum timing requirement

Altera_Forum
Honored Contributor II
3,994 Views

Hi, 

 

I have an SDRAM controller running at 133 MHz that is currently violating the setup requirements of the RAM. I have a register connected to an I/O pin which tri-states the data bus when the FPGA's OE is deasserted. The timings for the register are: 

 

Clock delay (2.7 ns) + OE signal delay (0.2 ns) + register delay (3.3 ns) + RAM setup time (1.5 ns) = 7.7 ns 

 

This results in a violation of 0.2 ns. I'm not sure what to do, as the delay is entirely within one cell; there are no other routing delays apart from the global clock. 

 

I have taken the following measures to avoid this problem: 

- Ensured the clock of the tri-stating register is global 

- Enabled "Fast Output Register" 

- Enabled "Fast Output Enable Register" 

- Enabled "Speed Optimization Technique for Clock Domains" 

- Registered the data bus signal with the system clock 

- Fitter effort set to 3 

 

The Cyclone II I'm using is about half full, and I'm just about out of ideas. 

 

Any advice would be greatly appreciated! 

 

Evan
26 Replies
Altera_Forum

The output delay chain is probably already at its fastest setting, but that's something to check. 

 

Consider using a PLL to shift the clock of the output registers to reduce the FPGA tco.
Altera_Forum

Hi Brad, 

 

Thanks for your suggestion. I've checked the output pin delay in the Resource Property Editor, and it was already at the lowest setting (105 ps). 

Unfortunately I'm already using all available PLLs.  

Is this an inherent limitation of the Cyclone II device? I would have thought 133 MHz would be reasonable, but perhaps I'll have to consider running the memory at 100 MHz. 

Has anyone else had experience with an SDRAM controller running at this speed on a Cyclone II or a similar device? 

 

Thanks, Evan.
Altera_Forum

If you can't use a PLL, it might just happen to work out to invert the clock for the output registers, adjusting the timing by a half period. You might then have to slow the output delay chain if the half period is too much. 

 

Are you certain you have the I/O timing constrained correctly? You should use both maximum and minimum output delay constraints (preferred for either timing analyzer), or both tco and min tco (acceptable for the Classic Timing Analyzer). If you are just calculating the timing manually, you might have missed something.
Altera_Forum

If anything, the limitation is in the number of PLLs. All devices have long clock tree delays, and they actually get longer in the higher-end parts, but the PLLs are designed to counteract this.  

 

At those speeds, you may want to take the clock off a global, especially if the source is anywhere near the SDRAM interface. Local routing can actually be faster, giving you a faster output, but you will get more skew across the bus, since local routing is not de-skewed the way a global tree is. That output skew may still be perfectly acceptable.
Altera_Forum

Thanks for the advice, Brad and Rysc. I've gone back and reviewed the timing constraints in my SDC file, and I think I'd incorrectly put the setup time where the max tco should be. 

Here's what I've got now: 

set_output_delay -add_delay -clock [get_clocks {clk_133}] -reference_pin sclk_p -min -0.800 [get_ports md_p*] 

set_output_delay -add_delay -clock [get_clocks {clk_133}] -reference_pin sclk_p -max 6.000 [get_ports md_p*]  

With a system clock of 133 MHz (7.5 ns period), would these correspond to the RAM requirements of a 0.8 ns hold time and a 1.5 ns setup time? If so, then I don't have any timing problems and I'd be very happy!
Altera_Forum

At first glance it looks correct to me, but it's hard to say without knowing all the details of the design, layout, clocking scheme, etc. I didn't realize you were doing a source-synchronous interface. If that's the case, then clock delay generally doesn't play a part; you just want to make sure your clock and data come out with the proper relationship. This makes the write side pretty easy to meet (often you have to phase-shift the clock if, for example, it needs to be 90 degrees from the data). Usually the read side is the more difficult interface, but it looks like you're on the right track. 

Look at the waveform shown in TimeQuest for a single data output (both the setup and the hold waveform on that output). See whether what it's showing correlates with what you expect to happen (the slack basically says: my delay could vary by this much and the transfer would still succeed). Source-synchronous interfaces are one thing that, for the life of me, I can't understand just by looking at someone's constraints. I still have to draw it out every time and ask, does that look right?
Altera_Forum

I had to install 7.2 to get the waveform viewer, but it's very useful. I can see that my original values were correct and that the max output delay should be set to 1.5 ns, as you can see from this screenshot: 

http://img120.imagevenue.com/img.php?image=89373_timing_diagram_122_747lo.jpg 

Why doesn't the clock delay play a part, seeing as the clock is external to both the FPGA and the RAM? 

Is it common practice to invert the launch clock (and delay it) to meet setup times? I can see this getting messy.
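For reference, here's what I believe the corrected pair looks like in my SDC file (clock, reference pin, and port names as in my earlier post; the hold value is unchanged):

```tcl
# Setup: max output delay = tSU of the SDRAM (1.5 ns)
set_output_delay -add_delay -clock [get_clocks {clk_133}] -reference_pin sclk_p \
    -max 1.500 [get_ports md_p*]

# Hold: min output delay = -tH of the SDRAM (-0.8 ns)
set_output_delay -add_delay -clock [get_clocks {clk_133}] -reference_pin sclk_p \
    -min -0.800 [get_ports md_p*]
```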
Altera_Forum

I was basing my statement solely on the fact that you were using the -reference_pin option. That generally means you're sending a clock off chip alongside the data (the clock is sclk_p and the data is md_p*). In this scenario, much of the clock delay inside the FPGA cancels itself out. For example, if the clock tree is 2.7 ns, it takes ~2.7 ns for the clock to reach the data output register and ~2.7 ns for the clock to reach the port where it leaves the chip. These cancel each other out in the timing analysis (so even if they were 100 ns delays, you could still meet timing). Source-synchronous interfaces often allow higher speeds to be achieved because a lot of the variation tends to cancel out. Is a source-synchronous interface what you're doing, or do you have a board oscillator that feeds both the FPGA and the SDRAM? 

One thing I don't see, though it might be there and I just can't tell from the waveform, is whether the delay to sclk_p is being removed from the data required time. I'm not sure exactly how it's shown there, but it looks like your latch edge is at 7.5 ns and doesn't move from there.
Altera_Forum

The following equations can be used for either a system-synchronous clock (same clock on the board going to both the FPGA and the SDRAM) or a source-synchronous clock (FPGA output driving clock to the SDRAM). For a source-synchronous clock, the board clock skew is simply the board delay from FPGA clock output to SDRAM clock input. 

 

 

--- Quote Start ---  

output delay max = board delay (max) - board clock skew (min) + tsu (external device) 

 

output delay min = board delay (min) - board clock skew (max) - th (external device) 

--- Quote End ---  

 

 

 

As Rysc said, -reference_pin is used for a source-synchronous interface where FPGA output device pin sclk_p drives the SDRAM clock. If that's how your design is set up and if the board data delay and board clock delay between the FPGA and SDRAM are exactly matched, then the above equations reduce to what you're using: 

 

 

--- Quote Start ---  

output delay max = + tsu (external device) 

 

output delay min = - th (external device) 

--- Quote End ---  

 

 

 

 

 

--- Quote Start ---  

Why does the clock delay not play a part, seeing as the clock is external to both the FPGA and the RAM? 

--- Quote End ---  

 

 

Your question makes me (and Rysc) wonder whether you have a system-synchronous setup. If that's the case, use the top equations above to account for your board delays, and get rid of the -reference_pin arguments. The rest of this post assumes a source-synchronous interface. 
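As a sketch of the system-synchronous form, with made-up board numbers that you'd replace with your own measurements:

```tcl
# Hypothetical board numbers: 0.5/0.3 ns max/min data trace delay,
# 0.2/0.1 ns max/min board clock skew -- substitute your own values.
# output delay max = board delay (max) - board clock skew (min) + tsu
#                  = 0.5 - 0.1 + 1.5 = 1.9 ns
set_output_delay -clock [get_clocks {clk_133}] -max 1.9 [get_ports md_p*]
# output delay min = board delay (min) - board clock skew (max) - th
#                  = 0.3 - 0.2 - 0.8 = -0.7 ns
set_output_delay -clock [get_clocks {clk_133}] -min -0.7 [get_ports md_p*]
```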

 

 

If clk_133 is the clock setting on the FPGA clock input device pin, then to use clk_133 as the clock with -reference_pin requires there to be no keeper (register or port) between your FPGA clock input device pin and the sclk_p clock output device pin. Combinational logic in the clock path is OK with -reference_pin. You probably wouldn't be able to analyze timing at all if you violated this -reference_pin restriction, so I assume you have no register in this clock path. I'm mentioning it anyway for the benefit of others reading this. From check_timing on-line help: 

 

 

--- Quote Start ---  

The reference_pin check verifies that reference pins specified in set_input_delay and set_output_delay using the -reference_pin option are valid. A reference_pin is valid if the -clock option specified in the same set_input_delay/set_output_delay command matches the clock that is in the direct fanin of the reference_pin. Being in the direct fanin of the reference_pin means that there must be no keepers between the clock and the reference_pin. 

--- Quote End ---  

 

 

 

 

--- Quote Start ---  

Is it common practice to invert the launch clock (and delay it) to achieve setup times? 

--- Quote End ---  

 

 

You adjust the clocks as needed. Usually that's done with an adjustment to the FPGA output clock, which is the latch clock. When a 180-degree shift happens to work well, then yes, it is common to simply use an inversion, regardless of whether a PLL is used. 

 

 

 

--- Quote Start ---  

This generally means you're sending a clock off chip alongside the data(the clock is sclk_p and the data is md_p*). In this scenario, much of the clock delay inside the FPGA cancels itself out. For example, if the clock tree is 2.7ns, it takes ~2.7ns to get to the data output register and ~2.7ns to get the clock to the port where it leaves the chip. These cancel each other out in the timing analysis(so if they were 100ns delays, you could still meet timing). 

--- Quote End ---  

 

 

This works best if you drive both data outputs and the clock output with I/O cell registers. 

 

You can use DDIO registers in the sclk_p I/O cell to drive the clock. Driving the clock with the DDIO registers and also driving the data outputs with I/O cell registers will align the clock output to the data outputs for 0-degree and 180-degree alignments. For this method, tie one of the sclk_p DDIO register inputs to a hard high and the other register input to a hard low. Swap the high and low between the registers to change between 0-degree and 180-degree alignments. The sclk_p frequency will be the frequency of the clock for these DDIO registers. You can use the DDIO registers with -reference_pin because the path that actually matters goes through a mux in the I/O cell that selects between the two registers; the clock output path is the combinational path through the mux select line, not a path through the DDIO registers. 

 

Even if you just need single-data-rate registers to drive the data, use DDIO registers for the data too so that their tco will better match the tco of the DDIO registers for sclk_p. For each single-data-rate output driven by DDIO registers, connect both register inputs to the single signal that you would normally connect to the input of a single I/O cell register. With the altddio_out megafunction, each bit of data connects to both datain_h[*] and datain_l[*] inputs of the megafunction.
Altera_Forum

Apologies for the confusion; I obviously didn't understand the use of the -reference_pin option. I'm using a system-synchronous clock setup: 

 

http://img104.imagevenue.com/img.php?image=60343_timing_problems_122_1170lo.jpg 

 

From what I can see, I should still be able to use the inverted clock to launch the data, as long as I can ensure enough delay for min tco: 

 

http://img185.imagevenue.com/img.php?image=61160_phase_delay_122_139lo.jpg 

 

Looking at the first picture I posted: 

 

http://img120.imagevenue.com/img.php..._122_747lo.jpg 

 

The clock network delay + data delay values should ensure that the hold and setup times are met. 

 

Would this seem correct? There are two issues I can see with implementing this: the first is ensuring I can register the data within half a clock cycle on the inverted clock, and the other is how to configure the SDC constraints to recognize what I'm doing.  

Thanks for your feedback.
Altera_Forum

 

--- Quote Start ---  

Looking at the first picture i posted: 

 

http://img120.imagevenue.com/img.php..._122_747lo.jpg 

 

The clock network delay + data delay values should ensure that the hold and setup times are met. 

--- Quote End ---  

 

 

 

That link didn't work for me, but it sounds like you're looking at the same timing report for both tco (setup) and min tco (hold). Did you run report_timing for both setup and hold on the device output pins?
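For example, something along these lines in the TimeQuest console (port name taken from your constraints above):

```tcl
# Report the worst output paths for each analysis separately
report_timing -setup -to [get_ports md_p*] -npaths 10 -detail full_path -panel_name "Out Setup"
report_timing -hold  -to [get_ports md_p*] -npaths 10 -detail full_path -panel_name "Out Hold"
```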
Altera_Forum

Well, I believe I've found a solution, though I had to modify the board. I've re-routed (externally) the system-synchronous clock to a dual-purpose DPCLK/DQS pin (it previously used an LVDS clock input). These pins have dedicated DQS phase-shift circuitry that can be used to delay the clock. I've played around with the delay parameter and can achieve a global clock network delay anywhere from 2.7 ns to 6.5 ns. Since the period of my clock is 7.5 ns, neither extreme is enough on its own, so I've used a delay value of ~4 ns and inverted the clock, achieving an effective phase shift that should negate the global clock network delay. 

 

My only problem now is how to configure TimeQuest to report based on the new delayed and inverted clock. TimeQuest has obviously detected that I'm using an inverted clock and is now reporting everything based on the falling edge of the original clock. Does anyone know how I can avoid this? I'd like it to treat the new clock as if it were an external input, not one derived from the original clock input. 

 

Thanks for your time, especially Brad and Rysc.
Altera_Forum

I don't know whether you'd have to make an adjustment for the clock inversion. In most cases TimeQuest does what you need automatically, but in some situations you need to tell TimeQuest the clock edge you care about. 

 

It sounds like you now have a source-synchronous setup for which you can use the -reference_pin method. Instead of -reference_pin, you can create a generated clock on your new clock output device pin and use that generated clock in the set_output_delay constraints.
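For example (assuming the clock input port is named clk_133 and the clock output pin is sclk_p, as in the earlier constraints):

```tcl
# Generated clock on the clock output pin, derived from the input clock
create_generated_clock -name sclk_out -source [get_ports clk_133] [get_ports sclk_p]

# Reference the output constraints to the generated clock instead of -reference_pin
set_output_delay -clock sclk_out -max 1.500 [get_ports md_p*]
set_output_delay -clock sclk_out -min -0.800 [get_ports md_p*]
```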
Altera_Forum

But I'm still using an external clock input? The only difference is that now I'm phase-shifting that input clock to counter the global clock network delay. It's still driving all the internal logic and launching the data. 

 

While in principle this idea should have worked, I found through experimentation that the delays given by TimeQuest did not match the measured delays at the output. This meant the amount by which I was phase-shifting the clock was unknown and unreliable (data corruption resulted). So I've scrapped the whole idea and I'm back to square one.
Altera_Forum

I didn't think your previous post through well enough and got the wrong idea. I thought you meant you had rerouted the clock to a different pin to drive it out to the SDRAM. 

 

Since you've already been willing to modify the board, maybe you should go ahead and change the clock to a source-synchronous output (what I wrongly thought you had done). If the FPGA is the only thing connected to the SDRAM (the SDRAM buses aren't also accessed by something else), if the board connections allow this kind of change, and if the timing analysis says it's better, it might be worth the trouble.
Altera_Forum

Can you attach a sample design and identify which pins the data goes out on? Please make it as similar as possible, i.e. the same device you're using, with the clock and data pins assigned to the same pins and the PLL configured the same way. Also, in a sentence or two, give a quick description of how the clock drives the FPGA, how it drives the upstream device, the upstream device's requirements, and the board delay from the FPGA to the upstream device. This shouldn't be too hard, but I think we're having difficulty describing what's occurring with words rather than just working from an actual design. I'm out tomorrow, but I'll take a look at it the next day and get something back to you. This way we have something we can all work from. Thanks.

Altera_Forum

Thanks for your suggestion, Brad, but unfortunately the clock is generated by the CPU, which also requires access to the memory.  

 

I have come to the conclusion that there are only two solutions to this problem: 

 

- Increase the track length of the clock from the FPGA to the RAM (to delay the clock by the required 200-300 ps relative to the data) 

- Move to the next speed grade (-7) 

 

I talked to my Altera representative, and he basically confirmed that these timing problems are inherent limits of my device. He also suggested I ensure the global system clock feeds only the clock inputs of LEs, as anything else can increase the clock network delay substantially. So, having done everything possible, I'll have to ask my superiors what action to take. 

 

So my advice to anyone implementing a system-synchronous data bus: make sure you have a PLL free! 

 

Thanks for your suggestions Brad and Rysc.
Altera_Forum

Agreed. Going above 100 MHz without a PLL is pretty fast. Another problem with not having a PLL is your variance across the slow and fast timing models. Static timing analysis does a slow-model analysis by default, but you'll also want to run a fast-model analysis, and all your delays can vary over this range with process, voltage, and temperature (PVT). The PLL not only removes the global clock tree delay, it removes it in a PVT-invariant way, so your timing analysis volatility is reduced. 

Note that reducing the fanout shouldn't make much of a difference. The global trees are pre-laid out and rebuffered throughout, so most of the loading will be on side branches that have minimal capacitive effect.  

Did you try taking the clock off a global? (You may have responded to this, but the thread is getting long.)  

Also, if you're failing by 200-300 ps, I can say almost without a doubt that your interface should work on a nominal board (you're not at the worst-case PVT corner, so the real delays will be faster than the slow model reports). If you're seeing failures, I think something else is going wrong.
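If it helps, the fast-corner run is just a different timing netlist in the TimeQuest console (the SDC file stays the same; port name from the earlier constraints):

```tcl
# Default analysis uses the slow model; repeat with the fast model,
# where hold violations typically get worse:
create_timing_netlist -model fast
read_sdc
update_timing_netlist
report_timing -hold -to [get_ports md_p*] -npaths 10
```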
Altera_Forum

Sorry Rysc, I didn't see your reply. Here's a simple drawing of my bus: 

 

http://img120.imagevenue.com/img.php?image=01654_system_122_927lo.jpg 

 

The delay between the CPU and the FPGA is around 200 ps. 

The delay between the FPGA and the RAM is also around 200 ps. 

The FPGA I'm using is the EP2C5F256C8. 

There is no PLL available (both are already being used to generate audio clocks). 

 

I've uploaded an empty sample project, basically just the *.qsf assignments and the top level of the VHDL (a wrapper where I register the data outputs). 

 

My timing problem occurs on the output of the registered data on I/O pins md_p[0 .. 31]. 

 

Basically, all the delay is within the fast output register at the pin, so there's simply nothing I can do to reduce that delay except make sure the clock gets there sooner (or, equivalently, reaches the RAM later).  

 

Thanks for helping anyway. I think I'll be pushing for a faster speed grade.