How to utilize programmable delay chain?

Altera_Forum · 11-27-2016

Greetings,

I'm using a Cyclone IV, and I have a bidirectional DDR data strobe line that I use to clock in data when I'm performing a read.

I am basically using this document as a reference design: https://www.altera.com/en_us/pdfs/literature/an/an348.pdf

When performing a read, the data is clocked out of the external device edge-aligned with the data strobe. According to pages 7 and 8 of the document, I need to apply a programmable delay chain to get the data strobe center-aligned with the data for proper input timing.

I can't figure out how to do this. Do I instantiate something in HDL, or what?

So far, I have created some input timing constraints, and they seem to be getting applied. I'm just not meeting setup, though I have plenty of hold slack to play with. I believe all I need is some static delay, as opposed to dynamic delay.

Google revealed this: https://www.altera.com/support/support-resources/knowledge-base/solutions/rd05052011_592.html

But in the assignment editor, I could not find a parameter with "D2". I did find "D1" and "D4" and "DQS Delay", which all sounded promising, but did not seem to have an effect on the end results.

After I do a build, I look at the fitter report under "Delay Chain Summary", go to my data strobe line, but all entries in the report are either "0 ps" or blank "--".

Can someone give me some pointers?

Altera_Forum · 11-27-2016

I believe D1 is what you want. Annoyingly, the TimeQuest report doesn't have a separate line item for this delay chain, so you don't see it explicitly, but the delay values should change. If you use Locate to go from the input pin in the fitter report or a timing report to the Resource Property Editor, the delay chain setting is shown explicitly there.

That being said, the fitter should automatically set it for optimal timing. I normally just enter timing constraints and that's all. I just threw down a quick input -> output design in Cyclone IV, and the Fitter Delay Chain summary clearly shows a Pad to Core 1 delay chain, and it's set to 6. (Cyclone IV doesn't have explicit input DDR registers; it just quickly drives the LAB next to it, which is just about the same speed.)

Altera_Forum · 11-27-2016

Enabling that D1 setting seemed to not make a difference. I did check the fitter report, but everything has a delay chain value of 0ps ...

So I am betting that I am doing something horribly wrong. Either I don't have some tool settings enabled, my input register structure makes it impossible to apply a programmable delay (I'm using ALTDDIO_BIDIR as the input register), or my timing constraints are so bad that the tools immediately give up on trying to meet timing.

I did manage to make a minimalist project that replicates my problem. I just deleted every single thing except this input clocking structure. Maybe someone can check it out?

Altera_Forum · 11-28-2016

I downloaded and tried to run report_timing on the input pins, but nothing came back. I think the input DDR registers were being synthesized out, so I hooked up some input pins to datain_h/l. (Seems like that would affect the output, but I didn't look at it too closely).

After compiling, I see:

1) You're doing a same-edge capture, via a multicycle hold of 0. That's a good idea when you don't have a PLL compensating for the clock tree, e.g. the clock path will be much longer than the data path, so using the same edge to capture works well.

2) The setup relationship is 0ns and hold is -5ns, which is half the clock period, or exactly the data period. Looks good.

3) After compiling, the setup slack is ~0.6ns and hold is ~2ns (at the slow corner). So you're meeting timing. In the Resource Property Editor, the Input Delay from Pin to Array is 0, which means it is as fast as it can be. That is correct, since making it slower would only make the setup timing worse, so I don't see a reason to change it.
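As a rough illustration of the relationships in point 2, here is a minimal Python sketch. It assumes a 10 ns clock period (inferred from "-5ns, which is half the clock period") and launch and latch clocks with the same period and phase. The function name is made up for this example and is not part of any Quartus or TimeQuest API.

```python
def same_edge_relationships(clock_period_ns):
    """Setup/hold relationships for a same-edge DDR capture (simplified)."""
    # DDR transfers one bit per clock edge, so the data period is half
    # the clock period.
    data_period_ns = clock_period_ns / 2.0
    # Same-edge capture: the latch edge coincides with the launch edge.
    setup_ns = 0.0
    # The hold check uses the latch edge one data period earlier.
    hold_ns = -data_period_ns
    return setup_ns, hold_ns


# Assumed 10 ns clock period; not stated explicitly in the thread.
setup_ns, hold_ns = same_edge_relationships(10.0)
print(f"setup relationship = {setup_ns} ns, hold relationship = {hold_ns} ns")
```

This only restates the relationships the timing analyzer derives from the constraints; the actual slack also depends on the clock and data path delays.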

(I looked quickly in the Assignment Editor and could not see an assignment for this. I'm really scratching my head. The Fine Grained control is a separate "half step" delay that is really small and either on or off. I don't know why there isn't a D1 delay chain assignment. Perhaps file an SR? I'm not looking into it further for now because the fitter is currently doing the right thing by setting this to 0.)

Altera_Forum · 11-28-2016

Hrmmm...Are we getting different results or is my understanding flawed? I just downloaded the zipped file on a different computer, built with Quartus Prime Version 16.1.0 Build 196, and without changing anything (just double clicking "Compile Design") I get a failed timing result under "Fast 1200mV 0C Model." Here are the results of the worst case timing:

Slack: -0.457

From Node: iop_slv_DQ[0]

To Node: Test:Lower_DQ_BIDIR|altddio_bidir:ALTDDIO_BIDIR_component|ddio_bidir_prr:auto_generated|input_cell_l[0]

Launch Clock: virt_clk

Latch Clock: CLK_ddr_ldqs_n

Relationship: 0.000

Clock Skew: 1.743

Data Delay: 1.307

But you got nothing? Weird....

Also, I'm a bit confused by your statement that an input delay of 0 would be the best case for setup time. Did you mean an input delay of 0 on the data pins (DQ) as opposed to the clock pin (DQS)? In my mind, delaying the clock would make the data arrive "earlier" relative to the clock, which would help meet setup instead of hurting it. This seems to be what AN 348 was getting at as well.

Altera_Forum · 11-28-2016

I wasn't able to get the inputs timed with the .zip. But once I've made the change, you're right that the setup fails at the fast corner. The clock is on a global clock tree, which is basically a fixed path. The data comes in and the delay chain is set to 0. At the fast corner this still fails setup timing. (The default setup is that the data path needs to be faster than the clock path, which should be easy since the clock is on a global. But once we add in the 900ps of external delay from your timing constraints, we're saying the data path must be more than 900ps faster, and it just isn't.)
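To make the "data path must be more than 900ps faster" point concrete, here is a simplified slack model in Python. Only the 0ns/-5ns relationships and the 900ps external delay come from this thread. The path delays, the minimum input delay, and the extra clock delay are hypothetical numbers chosen for illustration, and the model ignores micro setup/hold times and clock uncertainty, so it will not reproduce TimeQuest's figures exactly.

```python
def setup_slack(setup_rel, clock_delay, data_delay, input_delay_max):
    # Data required time minus data arrival time (simplified).
    return (setup_rel + clock_delay) - (input_delay_max + data_delay)


def hold_slack(hold_rel, clock_delay, data_delay, input_delay_min):
    # Data arrival time minus data required time (simplified).
    return (input_delay_min + data_delay) - (hold_rel + clock_delay)


# Same-edge capture relationships quoted in the thread (ns).
SETUP_REL, HOLD_REL = 0.0, -5.0
# 900 ps external delay from the thread; the minimum is an assumption.
IN_MAX, IN_MIN = 0.9, -0.9
# Hypothetical fast-corner path delays (ns).
CLOCK_DELAY, DATA_DELAY = 1.7, 1.3

before = (setup_slack(SETUP_REL, CLOCK_DELAY, DATA_DELAY, IN_MAX),
          hold_slack(HOLD_REL, CLOCK_DELAY, DATA_DELAY, IN_MIN))

# Hypothetical extra delay inserted on the clock (DQS) path.
EXTRA_CLOCK = 0.8
after = (setup_slack(SETUP_REL, CLOCK_DELAY + EXTRA_CLOCK, DATA_DELAY, IN_MAX),
         hold_slack(HOLD_REL, CLOCK_DELAY + EXTRA_CLOCK, DATA_DELAY, IN_MIN))

print("setup/hold slack before:", before)
print("setup/hold slack after: ", after)
```

In this model every nanosecond added to the clock path moves one nanosecond of slack from hold to setup, while delay added to the data path does the opposite. That is why a data delay chain of 0 is the best the fitter can do for setup here.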

I decided to go the other way and, rather than a same-edge capture, do the normal next-edge capture. To do this, I commented out the multicycle and false path exceptions on these I/O. After doing this, the setup relationship is 5ns and the hold relationship is 0ns, i.e. the next window after data is launched. (There is also a setup relationship of 10ns and hold of -5ns, which you could false path, but I never do this since it doesn't help anything: the correct analysis is more restrictive, and if you've met that, then you've met these less restrictive ones too.) Anyway, after doing that I end up making setup by 2ns but failing hold by 1ns. When I look at the fitter report, the delay chains have been cranked up to their highest value of 6, i.e. the fitter is setting these to try to meet timing. I was thinking you could hand-place the registers further away from the I/O to get longer delays, but the problem is that if you add ~1ns of delay in the fast corner, that's probably about 2ns of delay in the slow corner, i.e. you're likely to start failing setup.
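The fast-corner versus slow-corner trade-off can be sketched with a little arithmetic. This Python snippet assumes the failing hold figure is from the fast corner and the 2ns setup margin is from the slow corner, which the post does not state explicitly, and it takes the "about 2ns per 1ns" scaling at face value. The function name is hypothetical.

```python
def slack_after_added_data_delay(setup_slack_ns, hold_slack_ns,
                                 added_fast_ns, slow_to_fast_ratio):
    """Effect of extra data-path delay on next-edge capture slack."""
    # The same physical delay is assumed to be longer at the slow corner.
    added_slow_ns = added_fast_ns * slow_to_fast_ratio
    # Later data helps hold (checked at the fast corner here) ...
    new_hold_ns = hold_slack_ns + added_fast_ns
    # ... and hurts setup (checked at the slow corner here).
    new_setup_ns = setup_slack_ns - added_slow_ns
    return new_setup_ns, new_hold_ns


# Figures from the post: setup met by 2 ns, hold failing by 1 ns,
# ~1 ns added at the fast corner is ~2 ns at the slow corner.
new_setup_ns, new_hold_ns = slack_after_added_data_delay(2.0, -1.0, 1.0, 2.0)
print(f"setup slack = {new_setup_ns} ns, hold slack = {new_hold_ns} ns")
```

Under these assumptions the delay that just fixes hold uses up the whole setup margin, which is the "likely to start failing setup" concern.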

Your design seems to be falling into a sweet spot (or unsweet spot?) where it just fails on both the same-edge capture and the next-edge capture. Yuck. I went back to your original constraints and put a keep on the signal the clock goes through, and this forced it to route through a LUT before getting onto a global, which added enough delay to meet timing. (Hold met with >1ns of slack, while setup had only ~100ps of slack in the fast corner.) The placer puts the LUT right next to the global, which is a little too fast, so I added a location assignment to pull it a little further away, and that balances it better, with 382ps of setup slack and ~800ps of hold slack. Note that I am only looking at the lower DQ signals, so you probably need to do the same for the H ones.

Code change:

signal s_sl_LDQS_In : std_logic;

attribute keep : boolean;

attribute keep of s_sl_LDQS_In : signal is true;

Location assignment:

set_location_assignment X59_Y6 -to s_sl_LDQS_In

Altera_Forum · 11-28-2016

Can you attach the SDC file you used? The reason I went with the same-edge capture is that the DQS signal only pulses once at a very specific time. It is not a continuous clock, so telling the tools to capture on the next edge shouldn't be correct... unless this is merely a way of lying to the tools and it will all work out.

You say that the clock is on a fixed path (meaning I can't apply a delay chain?), but figure 5 of AN 348 shows a diagram with a programmable delay chain on DQS, essentially doing exactly what I need it to do.

Altera_Forum · 11-28-2016

I didn't change the .sdc. (Or rather, I did, it failed, and I changed it back.) For a DQS strobe you do need the same-edge capture, so what I tried wouldn't work anyway. The only changes I made, then, are the HDL attributes and the location constraint, which got it to meet timing.

That app note may be using specific hardware for DQDQS and require a megafunction. I'm not sure. I was just trying to close timing using the regular stuff.

Altera_Forum · 11-28-2016

Ah okay. I guess utilizing delay chains is still a bit of a mystery to me. Too bad there isn't a megafunction to instantiate which gives explicit control over this value.

But your way does work! Thank you for the help.

Altera_Forum · 11-28-2016

If you want to file an SR about changing the delay chains, you might get some info. I could have sworn I'd done this with assignments that are easy to recognize, but I just don't see them.