
properly constraining system synchronous interface

Altera_Forum
Honored Contributor II

I'm trying to interface an FPGA to a DAC (running at 200 MSa/s) in a system synchronous configuration. The FPGA clock and the DAC clock come from the same source and split on the PCB before going to the respective chips.

 

I understand that a PLL within the FPGA can be used to phase-shift the clock internal to the FPGA with respect to the PCB system clock in order to improve the clock to data alignment at the DAC. 

 

The DAC has a worst-case setup time of -0.6 ns and hold time of 2.1 ns, so a minimum data-valid window of 1.5 ns.

 

In the Cyclone IV EP4CE115 I/O pin timing parameter spreadsheet, I'm seeing a wide range of clock to output delay times between the slow and fast models (8.48 ns to 4.98 ns) for row I/Os using the 3.3-V LVTTL standard.

 

With a 3.5 ns range of variation in clock to output times, it seems hopeless to run a bus at 200 MHz, even with PLL phase adjustment of the clock internal to the FPGA. Is that true?
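
For reference, here's roughly how I'm planning to constrain the bus (just a sketch; sys_clk, the DAC_I_out bus name, and matched clock/data trace delays are assumptions on my part):

create_clock -name sys_clk -period 5.000 [get_ports sys_clk]
# virtual clock representing the copy of the clock arriving at the DAC
create_clock -name dac_clk_virt -period 5.000
# DAC datasheet: tSU = -0.6 ns, tH = 2.1 ns; with matched traces, -max = tSU and -min = -tH
set_output_delay -clock dac_clk_virt -max -0.6 [get_ports {DAC_I_out[*]}]
set_output_delay -clock dac_clk_virt -min -2.1 [get_ports {DAC_I_out[*]}]
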
Altera_Forum
Honored Contributor II

What you are observing is the effect of the min/max timing models. That is a rather large difference, however. You can improve this somewhat if you make sure that your outputs are all in the same sub-bank of the FPGA. You should be able to find a data-valid window that lets you capture the output data at 200 MHz in CIV. Are you in the slowest speed grade device? That will also affect output timing.

Altera_Forum
Honored Contributor II

I realize I made a mistake in my previous post. With worst-case setup and hold times of -0.6 nsec and 2.1 nsec and a 200-MHz clock, I think I was mistaken in saying the worst case data transition window was 1.5 nsec. With respect to the DAC clock, the data needs to transition no more than 0.6 nsec after the clock transition and no more than 2.9 nsec (5 nsec period - 2.1 nsec tHOLD) before the clock transition. Therefore, my data can transition anywhere in a 3.5 nsec (2.9 + 0.6 nsec) range and meet the timing specs for the DAC. At any rate, I have a good idea now of where (when) I need to make my data transition in order to make this system work. 

 

I went ahead and created a project targeting the Cyclone IV in the Quartus software. I created a PLL to drive the logic in the design and set it to Normal mode to compensate for the global clock delay. Quartus reported clock to output delay times of 5.10 and 1.85 nsec for the slow and fast models, respectively. Compared to the range of values reported in the timing spreadsheet, it looks like I've gained ~250 psec of timing uncertainty over the full range of PVT. A step in the right direction!

 

Then, by adjusting the programmable delay in the PLL, I am able to move the reported clock to output delay over what looks like a +/-1 period range. Does Quartus have newer/better information about clock to output delays for the Cyclone IV device than the timing spreadsheet?
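
In case it matters, the only clock constraints in my .sdc so far are the create_clock on the input pin plus the usual PLL boilerplate, roughly (sys_clk being my assumed name for the clock input):

create_clock -name sys_clk -period 5.000 [get_ports sys_clk]
# pick up the PLL output clocks (including the programmed phase shift) and default uncertainties
derive_pll_clocks
derive_clock_uncertainty
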
Altera_Forum
Honored Contributor II

Should it not be 2.1 ns tSetup and -0.6ns tHold? 

I've never seen a negative tSetup...  

 

Anyway, you're on the right path.

A couple of checks: 

- Assign your pins and set the correct I/O standard (see the .qsf sketch below). The delays may change a bit.

- Make sure your output register is being packed into the IOE.
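
Something along these lines in the .qsf, for example (PIN_AB10 is just a placeholder and DAC_I_out an example bus name, adjust to your design):

set_location_assignment PIN_AB10 -to DAC_I_out[0]
set_instance_assignment -name IO_STANDARD "3.3-V LVTTL" -to DAC_I_out[0]
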
Altera_Forum
Honored Contributor II

I was confused by the negative tSetup value, but found an example from Maxim IC that explains things pretty well. Altera forum won't let me post a link, but if you search the Maxim IC site for "TUTORIAL 4053" it should show up as the top result. 

 

Hypothetically if the tSetup for the data input register in the DAC is 500 psec but there is at least 1.1 nsec of delay through the clock buffer/distribution network on chip, then the tSetup at the package input would be -0.6 nsec (assuming no or equal package pin to chip delays). 

 

I double checked the device datasheet and it is definitely -0.6/+2.1 nsec for tSetup/tHold. 

 

I assigned the pins so that each bit in my bus is driven from the same I/O bank. Within a bus, the worst case bit-to-bit skew in clock to output delay reported by Quartus is quite good, ~100 psec. There is a second bus, driving another DAC, that I assigned to an adjacent I/O bank. One bus is a row I/O while the other is a column I/O. From bank to bank, Quartus reports an additional ~100 psec skew. I'm pleasantly surprised by that report because the timing spreadsheet indicated a larger difference (8.48 - 5.88 = 2.6 nsec) in bank to bank skew when one bank is row I/O and the other is column I/O. I was thinking I would have to use two PLLs to introduce different phase shifts to the two output banks, but it looks like that is not necessary. 

 

On page 6-2 of the Cyclone IV device handbook, I saw the following statement: 

The IOE contains one input register, two output registers, and two output-enable (OE) registers. The two output registers and two OE registers are used for DDR applications. 

 

Can I use the output registers if my design is not a DDR application? 

 

When I search the Chip Planner for one of the bits in my data bus, I see four entries with the following resource types: 

DAC_I_out[0]~reg0 - Register cell 

DAC_I_out[0]~output - I/O output buffer 

DAC_I_out[0] - I/O pad 

DAC_I_out[0]~reg0feeder - Combinational cell 

 

The register cell is located in a general purpose region, not the dedicated output block, which tells me I'm not currently packing my output registers into the IOE. How do I specify that? Is it a setting in the Assignment Editor that I missed while reading through the types of assignments? Thanks!
Altera_Forum
Honored Contributor II

Just to explain my understanding of sign of tSU or tH: 

 

tSU & tH at register level are both labeled positive. 

When viewed at the pins, the relationship either stays the same (if the clock and data delays are equal) or the timing window (tSU + tH) shifts relative to the clock edge and may sit in front of it or behind it. If the timing window shifts in front of the clock edge, then tSU stays positive but tH is now labeled negative. If the timing window moves behind the clock edge, then tSU becomes negative and tH stays positive.
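
Put as a formula (my notation; $t_{clk}$ and $t_{data}$ are the on-chip clock and data delays from the package pins to the input register):

$t_{SU,pin} = t_{SU,reg} - (t_{clk} - t_{data}), \qquad t_{H,pin} = t_{H,reg} + (t_{clk} - t_{data})$

With the 0.5 nsec register setup and 1.1 nsec clock distribution delay from the example above, that gives the -0.6 nsec setup seen at the package pins.
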
Altera_Forum
Honored Contributor II

karl, 

no, you don't have to use DDR to make use of the IOE register. 

 

Quartus will automatically pack registers into IOEs when needed (to meet setup requirements) and when possible.

The Fitter -> Netlist optimization will report it, if so. 

 

Some quick tests indicate that Quartus can meet your I/O timing requirements without resorting to I/O register packing, so it might not do it. 

 

There should be a Fast I/O register assignment to force it, if you need it.
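
If I remember right, it shows up in the Assignment Editor as "Fast Output Register", which ends up in the .qsf as something like:

set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to DAC_I_out[*]
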
Altera_Forum
Honored Contributor II

Thanks rbugalho! Enabling the fast I/O register improves my timing margin noticeably. The system I am working on is expected to operate in harsh environments (wide temperature range), so I'll take all the margin I can get. 

 

Before enabling the fast I/O registers, I had a slow/fast clock to output delay range of -0.574/-2.805 nsec. After enabling the registers, that range moved to -1.530/-3.099 nsec, a 0.66 nsec improvement in delay uncertainty. 

 

If the dedicated output registers improve timing margin so much, why would they ever be turned off? Power savings? 

 

I noticed an interesting bit of behavior that slightly defied my expectations. I set up the PLL in my design in Zero-Delay Buffer mode to achieve the most compensation for clock network and output uncertainty. Initially, I was driving a simple square wave to all outputs with the following Verilog statement: 

DAC_I_out <= ~DAC_I_out; 

 

After I was satisfied that the timing could be met by phase shifting the data with the PLL programmable delay, I replaced the simple square wave with a more elaborate source of data, and recompiled the design. I thought that having a wider fan-out for the global clock from the PLL would slow down the clock and result in slower clock to output delays. However, the opposite happened. The clock to output delays were sped up by ~0.6 nsec. The spread in clock to output delay over PVT was also reduced, so I'll take it. I'm a bit puzzled as to why that parameter would improve when I added logic to the design.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Just to explain my understanding of sign of tSU or tH: 

 

tSU & tH at register level are both labeled positive. 

When viewed at the pins, the relationship either stays the same (if the clock and data delays are equal) or the timing window (tSU + tH) shifts relative to the clock edge and may sit in front of it or behind it. If the timing window shifts in front of the clock edge, then tSU stays positive but tH is now labeled negative. If the timing window moves behind the clock edge, then tSU becomes negative and tH stays positive. 

--- Quote End ---  

 

 

kaz, 

I think our understandings of the tSU and tH parameters match. I interpret your explanation to be the same as the example from Maxim IC. 

 

It's easy to imagine a high speed DAC with parallel data input having a rather elaborate clock distribution network to phase align clock edges to all the data input registers. That distribution network could delay the clock edge enough that the clock needs to reach the pin of the part before the data in order to meet timing, thereby having a negative setup time. 

 

The part I'm using is, I think, the lower speed grade version of a very high speed part. If it is the same chip, then it would have the same distribution network introducing delay in the pin to input register clock.
Altera_Forum
Honored Contributor II

The Zero Delay Buffer mode is intended to use the FPGA as a zero delay clock buffer: it makes the clock at the PLL's dedicated output clock pin in phase with the input clock. 

The normal compensation mode should give you the best results. 

 

I'm not really sure why Quartus doesn't always use the I/O registers. I suspect it's because it takes a minimum-effort, timing-driven approach.

You set the timing constraints and it tries to meet them. 

If it needs them to meet timing constraints, then it will use them automatically -- and it can be quite clever at it. 

If it doesn't need it, it doesn't. 

 

The clock network delay in the FPGA is pretty much fixed, independent of the number of loads, since the clock network is pre-built.

 

Why did your timings improve when you replaced your simple square wave with more realistic logic? No clue.

You can try to compare the paths in the TimeQuest GUI. 
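
Or, from the TimeQuest Tcl console, something along these lines to dump the worst paths to the DAC pins for each compile (panel name is arbitrary):

report_timing -setup -to [get_ports {DAC_I_out[*]}] -npaths 10 -detail full_path -panel_name "DAC clock-to-output"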

 

There are two ways to ask Quartus to give you extra margin. 

The "try and give me some margin" way: set a target slack in the fitter settings. 

The "you must give me this" way: add clock uncertainty to your clocks. Or, just for I/O, make the I/O constraints worse.