This is a problem that I've had hanging over me for a while. I'm hoping someone can shed light on it.

So, I need to implement some DDR output logic (not for memory, but just to send a high-speed data stream to another chip--it's RGMII, if you must know). As I understand it, the standard way to implement that is with two sets of registers and a mux that selects between them based on the current phase of the clock. But here's the thing: the timing is too tight to clock each register at the beginning of its output phase. I need to clock it a half cycle earlier so that the latch-minus-launch time is a full cycle (instead of a half cycle). The numbers suggest to me that this should work.

However, when I implement that, I can't find a way to make TimeQuest understand which latch/launch relationship to analyze. It always seems to assume that the latch edge is the very next edge, so it always thinks that the latch-minus-launch it has to analyze is half a cycle, when I really want it to be a full cycle. I've tried setting a multicycle of 2 (instead of the default of 1), but that just adds a full cycle to get one and a half cycles, which is no good either. And it won't let me specify fractional multicycle values. I also tried just altering the numbers in set_output_delay to compensate, but that results in the minimum being greater than the maximum, so it complains and rejects the assignment.

Does anyone know a way to deal with this situation? Is there a way to manually specify the latch/launch relationship, perhaps?
I did a DDR I/O design with TimeQuest some months ago and will again soon at a faster clock rate, so I'd like to spend some time this afternoon reproducing your question in Quartus, getting an understanding of the requirement, and developing the best answer for both of us. I'll build a test case in VHDL unless you already have a code snippet or BDF you want me to use.
Wow, that'd be great. Here's a simple VHDL design I whipped up just now that should demonstrate the problem.
library ieee;
use ieee.std_logic_1164.all;

entity tight_ddr_test is
    port (
        TXC  : in  std_logic;
        TD   : out std_logic_vector(3 downto 0);
        byte : in  std_logic_vector(7 downto 0)
    );
end entity;

architecture tight_ddr_test of tight_ddr_test is
    signal TD_high_phase_reg, TD_low_phase_reg : std_logic_vector(3 downto 0);
begin
    process (TXC, TD_high_phase_reg, TD_low_phase_reg)
    begin
        if TXC = '1' then
            TD <= TD_high_phase_reg;
        else
            TD <= TD_low_phase_reg;
        end if;
    end process;

    process (TXC)
    begin
        if rising_edge(TXC) then
            TD_low_phase_reg <= byte(7 downto 4);
        end if;
    end process;

    process (TXC)
    begin
        if falling_edge(TXC) then
            TD_high_phase_reg <= byte(3 downto 0);
        end if;
    end process;
end architecture;

TXC and TD correspond to real ports on my FPGA, whereas "byte" corresponds to the result of previous internal logic. The important part is that the registers are each loaded a half cycle before their result is enabled in the multiplexer. There's no need for that above, of course, but you'd need it if the timing were tight like it is for me. (Sticking lots of combinational logic in the "<= byte(? downto ?)" lines would simulate that.) For the SDC file, let's say that TXC has a period of 8 ns and the setup and hold times for the destination chip are both 1 ns. If we WEREN'T loading the registers in advance, then we could just write this:
create_clock -name TXC -period 8 [get_ports TXC]
set_output_delay -clock TXC -min -1 [get_ports TD*]
set_output_delay -clock TXC -max 1 [get_ports TD*]
set_output_delay -clock TXC -clock_fall -min -1 [get_ports TD*]
set_output_delay -clock TXC -clock_fall -max 1 [get_ports TD*]

But since we ARE loading the registers in advance, that doesn't work. TimeQuest assumes, e.g., that we want TD_high_phase_reg to be latched by the destination chip on the rising edge right after we load it.
OK, I'm drawing out the timing you described and reviewing the timing commands available. I'll assume the Cyclone II family, C8 speed grade, when I compile. But first of all, if your problem is meeting timing with a large amount of combinatorial logic at the input of the I/O element flip-flop, you could perhaps add a pipeline stage flip-flop and no longer have your timing quandary, as long as you can afford the extra clock of latency.
Our target is a Cyclone II C7.

I actually just thought of the extra-stage approach too and tried it earlier today. You're right--it pretty much eliminates the problem. I can live with that solution. There might be glitches on the output bus due to switching the mux and register at the same time, but that's not something I'll lose sleep over. Thanks for the help!
No, no, I'm still using both edges. The extra registers for the high-phase data are clocked on the rising edge, and the extra registers for the low-phase data are clocked on the falling edge.

In response to your email, the interface is source-synchronous and we do not have a PLL available for use with this clock.
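To make the extra-stage scheme concrete, here is one possible reading of it as a sketch (the pipe-register names and the exact edge assignments are my assumptions, not code from the thread): each phase gets a pipe register and an output register on the same edge, so the slow combinational logic has a full cycle into the pipe, and the output register loads at the start of its output phase (hence the possible mux glitch mentioned above).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: same interface as tight_ddr_test, but with an extra pipeline
-- stage (byte_high_pipe/byte_low_pipe are illustrative names), at the cost
-- of one clock of latency.
entity tight_ddr_pipelined is
    port (
        TXC  : in  std_logic;
        TD   : out std_logic_vector(3 downto 0);
        byte : in  std_logic_vector(7 downto 0)
    );
end entity;

architecture rtl of tight_ddr_pipelined is
    signal byte_high_pipe, byte_low_pipe       : std_logic_vector(3 downto 0);
    signal TD_high_phase_reg, TD_low_phase_reg : std_logic_vector(3 downto 0);
begin
    -- Output mux, unchanged from the original example.
    process (TXC, TD_high_phase_reg, TD_low_phase_reg)
    begin
        if TXC = '1' then
            TD <= TD_high_phase_reg;
        else
            TD <= TD_low_phase_reg;
        end if;
    end process;

    -- High-phase registers, both on the rising edge: the pipe absorbs the
    -- slow combinational path with a full cycle to spare; the output
    -- register then copies the pipe's previous value over a trivial path.
    process (TXC)
    begin
        if rising_edge(TXC) then
            byte_high_pipe    <= byte(3 downto 0);  -- stand-in for slow logic
            TD_high_phase_reg <= byte_high_pipe;
        end if;
    end process;

    -- Low-phase registers, both on the falling edge, likewise.
    process (TXC)
    begin
        if falling_edge(TXC) then
            byte_low_pipe    <= byte(7 downto 4);
            TD_low_phase_reg <= byte_low_pipe;
        end if;
    end process;
end architecture;
```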
For anyone who's interested, it seems that it is now possible to do what I want in Quartus II 7.1. If the external chip has output/input delays specified for both edges of the clock, then TimeQuest will by default analyze the worst-case launch/latch relationship for both the rising and the falling external edge, regardless of which one is the worst case overall. You can then declare the one that you don't want as a false path.

So, take TD_low_phase_reg in my example above. TimeQuest will now analyze both a setup latch-launch of 4 ns (for a falling latch edge, which is a false path in the above design) _and_ a latch-launch of 8 ns (for a rising latch edge, which is correct). And for hold timing it will also analyze both a launch-latch of 0 ns (which is correct) and a launch-latch of -4 ns (which is extraneous). To cut the unwanted paths, one then just adds the following to their SDC file:
set_false_path -from *|TD_low_phase_reg -fall_to TXC
set_false_path -from *|TD_high_phase_reg -rise_to TXC

Naturally, for this to be safe, the registers must not fan out to any other TXC-clocked destinations. Note that the changes in TimeQuest's behaviour will apply to your DDR receive timing too, but you don't need to do anything special there, because the overall worst case there is also the correct one.

For people doing new DDR designs, I'd recommend this over just adding extra register stages like I did before. After the fit got even more cramped in my part, I had to start playing with the number of register stages to try to find better fits (often failing), but with the above I don't need any extra register stages at all, and it passes timing analysis every time (and real-world testing, too).
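Putting the pieces from this thread together, the complete constraint set for the example design might look like the sketch below, assuming the 8 ns TXC period and 1 ns external setup/hold quoted earlier (the `[get_ports TD*]` targets are my assumption about the port naming):

```tcl
# Sketch of a complete SDC for the tight_ddr_test example.
create_clock -name TXC -period 8 [get_ports TXC]

# External chip requirements, referenced to both TXC edges (tsu = th = 1 ns).
set_output_delay -clock TXC -max  1 [get_ports TD*]
set_output_delay -clock TXC -min -1 [get_ports TD*]
set_output_delay -clock TXC -clock_fall -max  1 [get_ports TD*]
set_output_delay -clock TXC -clock_fall -min -1 [get_ports TD*]

# Cut the half-cycle relationships that don't exist in this design:
# low-phase data is latched externally on the rising edge, high-phase
# data on the falling edge.
set_false_path -from *|TD_low_phase_reg  -fall_to TXC
set_false_path -from *|TD_high_phase_reg -rise_to TXC
```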
--- Quote Start ---
set_false_path -from *|TD_low_phase_reg -fall_to TXC
set_false_path -from *|TD_high_phase_reg -rise_to TXC

Naturally, for this to be safe, the registers must not fan out to any other TXC-clocked destinations.
--- Quote End ---

You can use virtual clocks for the -clock argument in input-delay and output-delay constraints. A virtual clock can let you be more precise with false-path exceptions (like Tristan's) or multicycle exceptions for I/O paths. Also, the same port (device pin) can have input/output-delay constraints for more than one virtual clock, allowing the same port to have more than one I/O timing constraint for more than one register (for example, an output-enable register and a data register feeding the same device pin).

In an earlier post, Tristan had this clock:

create_clock -name TXC -period 8 [get_ports TXC]

For a virtual clock, simply omit the target:

create_clock -name TXC_virtual_just_for_io_use -period 8

Then use the virtual clock in the output-delay constraint and in the false-path exception:

set_false_path -from *|TD_low_phase_reg[*] -fall_to TXC_virtual_just_for_io_use

A virtual clock can also let you use clock-to-clock exceptions instead of path-based exceptions for more efficient exception processing by TimeQuest. For example:

set_false_path -from TXC -fall_to TXC_virtual_just_for_io_use
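For reference, the virtual-clock variant of the full constraint set might be sketched as follows (again, the `[get_ports TD*]` targets are my assumption; the path-based exceptions assume, as noted above, that the registers have no other TXC-clocked fan-out):

```tcl
# Sketch: virtual-clock version of the constraints for tight_ddr_test.
create_clock -name TXC -period 8 [get_ports TXC]

# Virtual clock, used only to model the external chip's capture clock.
create_clock -name TXC_virtual_just_for_io_use -period 8

# Reference the external chip's requirements to the virtual clock.
set_output_delay -clock TXC_virtual_just_for_io_use -max  1 [get_ports TD*]
set_output_delay -clock TXC_virtual_just_for_io_use -min -1 [get_ports TD*]
set_output_delay -clock TXC_virtual_just_for_io_use -clock_fall -max  1 [get_ports TD*]
set_output_delay -clock TXC_virtual_just_for_io_use -clock_fall -min -1 [get_ports TD*]

# Path-based exceptions against the virtual clock, cutting the
# latch edges that don't exist in this design.
set_false_path -from *|TD_low_phase_reg[*]  -fall_to TXC_virtual_just_for_io_use
set_false_path -from *|TD_high_phase_reg[*] -rise_to TXC_virtual_just_for_io_use
```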