
How do I synchronize gray code counts safely across asynchronous clock domains?

Altera_Forum
Honored Contributor II
6,180 Views

I have a design for a generic asynchronous FIFO that I have used for many years. In this FIFO, I use gray code counters for the read and write pointers to the core memory. These multi-bit pointers must be synchronized into the opposite clock domain to compute the full and empty flags (e.g. rdptr is synchronized to wclk and compared against wptr to determine the full flag). I am using a 2-stage flop synchronizer to reduce the chance of metastability.

 

The problem that I am seeing has to do with the placement (Quartus Fit) of the original gray code pointer in domain 1 and the first set of flops in the synchronizer in domain 2. For clarity:

reg [w:0] rdptr_rclk; // rdptr in the rclk domain
always @ (posedge rclk)
begin
    rdptr_rclk <= nx_rdptr_rclk;
end

reg [w:0] rdptr_wclk_s1; // first synchronizer stage of rdptr in the wclk domain
reg [w:0] rdptr_wclk;    // second (final) synchronizer stage of rdptr in the wclk domain
always @ (posedge wclk)
begin
    rdptr_wclk_s1 <= rdptr_rclk;
    rdptr_wclk    <= rdptr_wclk_s1;
end

 

In the sdc file, I have set_false_path between rclk and wclk. 
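For reference, a clock-to-clock false path of this kind is typically written as shown here. This is a minimal sketch that assumes the SDC clocks are literally named rclk and wclk; the real names depend on the create_clock statements in the design.

```tcl
# Assumed clock names: rclk and wclk (substitute the names used in create_clock).
# Cut all timing analysis between the two domains, in both directions.
set_false_path -from [get_clocks rclk] -to [get_clocks wclk]
set_false_path -from [get_clocks wclk] -to [get_clocks rclk]
```

Because these two lines cut every rclk/wclk path, the fitter has no timing-driven reason to keep the rdptr_rclk and rdptr_wclk_s1 flops close together.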

 

Ideally, all 3 of these synchronizer stages (rdptr_rclk, rdptr_wclk_s1, and rdptr_wclk) would be placed by the fitter as closely together as possible. However, the fitter wants to place the rdptr_rclk flops on one side of the fifo, close to where the empty flag is generated and used, and it wants to place the rdptr_wclk flops on the opposite side of the fifo, close to where the full flag is generated and used. The other register, rdptr_wclk_s1, usually will get placed right next to the rdptr_wclk. 

 

The problem occurs when some of the bits of rdptr_wclk_s1 are placed close to their rdptr_rclk counterpart, while other bits are placed far apart, especially when the skew between bits approaches or exceeds the period of the 2 clocks. In this case, the rdptr_wclk_s1 can see a transition on one bit before it sees the earlier transition on a different bit. For example: 

 

Correct rdptr_rclk sequence: 

 

  1. 0C:001100 

  2. 0D:001101 

  3. 0F:001111 (bit 1 transitions) 

  4. 0E:001110 (bit 0 transitions) 

  5. 0A:001010 (bit 2 transitions) 

 

 

Sequence seen by rdptr_wclk_s1: 

 

  1. 0C:001100 

  2. 0D:001101 

  3. 0C:001100 (bit 0 transitions) 

  4. 0A:001010 (bit 1 and 2 transition) 

  5. 0A:001010 (no transitions) 

 

 

The sequence (2)0D to (3)0C at rdptr_wclk_s1 may look correct (only 1 bit changed), but this actually is a -1 step of the code, rather than a +1 step.
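Decoding the gray values back to binary counts makes the backward step visible. The following is a small plain-Tcl sketch (it uses no Quartus commands, and the procedure name gray2bin is made up for this illustration) that decodes the values seen by rdptr_wclk_s1 in the example.

```tcl
# Gray-to-binary: XOR together g, g>>1, g>>2, ... until the shifted value is 0.
proc gray2bin {g} {
    set b 0
    while {$g != 0} {
        set b [expr {$b ^ $g}]
        set g [expr {$g >> 1}]
    }
    return $b
}

# Values captured by rdptr_wclk_s1 in the example (steps 1 to 4).
foreach g {0x0C 0x0D 0x0C 0x0A} {
    puts "[format %02X $g] -> count [gray2bin $g]"
}
```

The decoded counts are 8, 9, 8, 12, whereas the true rdptr_rclk sequence 0C, 0D, 0F, 0E, 0A decodes to 8, 9, 10, 11, 12. The pointer appears to step backward by one and then jump forward by four.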

 

Note that this is NOT a metastability problem. The problem occurs because the fitter has placed rdptr_wclk_s1[1] far from rdptr_rclk[1] while placing rdptr_wclk_s1[0] right next to rdptr_rclk[0]. Also, this problem is build dependent. One build may have the problem, but it may disappear with the build the next day. And the same build may work on one board (slightly faster FPGA, available to me on my test floor) but have errors on a different one (slow FPGA on the customer's system). 

 

In the tools for a different FPGA vendor, I am able to specify a DATAPATHONLY requirement of 1ns on the nets going into rdptr_wclk_s1, telling the placement tool to place the rdptr_wclk_s1 flops no further than 1ns away from the rdptr_rclk flops. But I have not found any way to do this with the Quartus tools. 

 

The best that I am able to do is to create a logic-lock region around my fifo (or just my synchronizer), but this is an afterthought process, and can be forgotten when new fifos are added to a design. I would really like something that I can put into my code or into my constraints that will handle this for any of my fifos in my design. 

 

Is there a different way of constraining this to force the fitter to place these 3 sets of flops near each other?  

 

Or is there a different way of coding this to be more tolerant of the placement?
18 Replies
Altera_Forum

I suggest you try set_max_delay on the paths.

By the way, just curious: why not use the Altera DCFIFO?
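A minimal sketch of what such a set_max_delay constraint might look like for the read-pointer crossing, assuming the register names from the first post and a 1 ns target (both the wildcard patterns and the value are illustrative):

```tcl
# Illustrative only: patterns assume the registers are named as in the first post.
# Limit the rdptr_rclk -> rdptr_wclk_s1 paths to 1 ns.
set_max_delay -from [get_registers {*rdptr_rclk[*]}] \
              -to   [get_registers {*rdptr_wclk_s1[*]}] 1.000
```

Note that a set_false_path covering the same paths takes precedence over set_max_delay, so a clock-to-clock false path would have to be removed or narrowed for this constraint to have any effect.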
Altera_Forum

Thank you for this suggestion kaz. 

 

I have tried this in the past, with no success. The problem with set_max_delay is that the Quartus computation takes into account the clock insertion for both the source and the destination flops, which I don’t care about, and which can yield inaccurate results. So, for this path, the Required time is: 

+ 1.000ns (max_delay)  

+ 2.514ns (pll output through clock tree to receiving flop)  

– 0.140ns clock uncertainty 

+ 0.228ns Tsu 

= 3.602ns 

 

The Arrival time is: 

+ 3.016ns (input clock pin through clock tree to sending flop) 

+ 0.140ns (flop output) 

+ 3.854ns (chip route) 

+ 0.273ns (cell route) 

= 7.247ns 

 

The slack is 3.602-7.247=-3.645ns. 

 

The problem is that I want to exclude the clock tree components from both halves of this computation, so that the slack would be computed only based on the actual chip route from flop-to-flop (3.854ns in this example.) In this particular example, the delta between the 2 clock delays is only 0.5ns. In other parts of the design (or build-to-build variation), the delta can be much larger. It can also have the opposite relationship (clock delay for Required is larger than clock delay for Arrival), meaning that the computed slack is too optimistic. 

 

 

As for using the Altera DC fifo -- I have found that it doesn't work when I compile my design using the tools from a different FPGA vendor :). I like to be able to use this one fifo in a block of shared IP that can be compiled either in Quartus or in a different vendor. Also, it is easy to change the dimensions of my fifo, without having to regenerate using the MegaWizard.
Altera_Forum

Well, in that case try set_max_skew.

I haven't tried it myself, but from its description it may be just what you want, since it can exclude the clock paths.
Altera_Forum

Unfortunately, set_max_skew pays attention to the set_false_path. Because of the set_false_path, TimeQuest will say there is nothing to report in the "Report Max Skew Summary". The set_max_delay would be correct, but there is no way to say "datapath_only" and have it ignore the clock trees.

Altera_Forum

 

--- Quote Start ---  

Unfortunately, set_max_skew pays attention to the set_false_path. Because of the set_false_path, TimeQuest will say there is nothing to report in the "Report Max Skew Summary". The set_max_delay would be correct, but there is no way to say "datapath_only" and have it ignore the clock trees.

--- Quote End ---  

 

 

First, you could remove the set_false_path and either ignore any reported violations or set a large multicycle value instead.

Second, here is an Altera example that constrains the data path only:

 

# Create a max skew constraint that includes only data path arrival  

set_max_skew -from [get_keepers inst1|*] -to [get_keepers inst2|*] 0.200 -exclude { from_clock to_clock clock_uncertainty }
Altera_Forum

Well, that was interesting. 

 

I first tried using a "set_multicycle_path" and a "set_false_path -hold" with the "set_max_skew". The false_path here, again, caused the set_max_skew to ignore the paths completely.

 

Next I tried removing the "set_false_path -hold" but keeping the "set_multicycle_path". Now the design misses timing on the max_skew specification. Looking at the worst case path, I see that the fitter has added 13.658ns (!) of routing delay between these 2 flops (which are in the same LAB), presumably in order to meet the hold time requirement between these flops. The best case path (different fifo, different clock, but all lumped together with the single "set_max_skew") only had 8.069ns of routing delay added. 

 

I also tried to use the "-exclude" with set_max_delay, but apparently that is not supported.
Altera_Forum

You need to set the multicycle path to, say, 3 for setup and 2 for hold; otherwise you run into hold issues.
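A sketch of the 3/2 multicycle suggestion, assuming the register names from the first post (the wildcard patterns are illustrative and would need to match the real hierarchy):

```tcl
# Illustrative patterns; adjust to the actual register names.
set src [get_registers {*rdptr_rclk[*]}]
set dst [get_registers {*rdptr_wclk_s1[*]}]

# Relax setup to 3 cycles and hold to 2 cycles on the crossing.
set_multicycle_path -setup -from $src -to $dst 3
set_multicycle_path -hold  -from $src -to $dst 2
```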

Altera_Forum

Using 3 for setup and 2 for hold still didn't make it pass. However, this time, the fitter only added 5.2ns of delay between the flops. 

 

I may be running into a different problem with the set_max_skew. I am using a single "set_max_skew" constraint that covers all of the synchronizers in this design. This means that the skew will include the wptr sync as well as the rptr sync. Plus, it will include the wptr and rptr for all of the FIFOs, not just a single one. It might work better if I can provide a constraint that is specific for each bus. How would I generically do that?
Altera_Forum

I still have not had any success with this. Are there any more ideas?

Altera_Forum

I am not familiar with the skew command. I hope somebody else can help here.

I suggest you also try a multicycle of 2 (setup) / 1 (hold), or even 1/0 (i.e. the default, if you don't set a false path), then ignore the violations and see what sort of delay you get.
Altera_Forum

I tried the multicycle with 2/1 and 1/0. In the latter case, I am now getting setup time violations, as well as the skew violation. The fitter is still inserting unnecessary (from my view) delay between flops. These flops are in the same or adjacent LABs, but the physical routing is going all over the place. 

 

Is there a way in the sdc to do something like: "for each inst in { find ajbg_fifo } do { set_max_skew { -from {$inst.rdptr_rck} -to {$inst.rdptr_wck_s1} }"
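One possible way to express such a per-instance loop is sketched here. It is untested and rests on several assumptions: that every FIFO instance contains registers named rdptr_rclk[*] and rdptr_wclk_s1[*] directly under its instance path, that foreach_in_collection and get_register_info are available at the point where the SDC file is read, and that instance names contain no characters that the wildcard matcher treats specially. The skew value is a placeholder.

```tcl
# Placeholder skew limit in ns; choose a value appropriate to the clocks involved.
set skew_ns 1.000

# Bit 0 of each read pointer is used to find one register per FIFO instance.
foreach_in_collection reg [get_registers {*|rdptr_rclk[0]}] {
    set name [get_register_info -name $reg]
    # Strip the trailing "|rdptr_rclk[0]" to recover the instance path.
    set inst [string range $name 0 [expr {[string last "|" $name] - 1}]]

    # One skew constraint per FIFO, covering only that FIFO's pointer bus.
    set_max_skew \
        -from [get_registers "${inst}|rdptr_rclk\[*\]"] \
        -to   [get_registers "${inst}|rdptr_wclk_s1\[*\]"] \
        $skew_ns -exclude { from_clock to_clock clock_uncertainty }
}
```

The same loop would be repeated with the write-pointer register names so that each direction of each FIFO gets its own constraint instead of being lumped into one.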
Altera_Forum

I am facing a similar situation. Did you ever figure out a good way to constrain these?

Altera_Forum

No, I am afraid I have not.

Altera_Forum

Thanks. I have filed a support request. If I hear anything from them I will post it here.

Altera_Forum

I have a similar problem with setting the rules for a Gray Counter. 

Wondering if byates ever received a response to the support request?  

Thanks.
Altera_Forum

I did not receive a useful reply from Altera support. However the local FAE helped a lot and came back with some suggestions. I also found some papers at Zimmer Design (http://www.zimmerdesignservices.com/index.php?section=12) that describe the problem and solutions for the ASIC world. His technique doesn't work directly but there is some very good information there.

 

In the end I came up with the following solution which is overly complicated but does work. The design I'm using it on is a StratixV 5SGXEA7K2F40C3 with 123,666 of 234,720 ALMs used ( 53 % ). 

 

 

  1. The first step is to generate a list of all source to destination registers that cross a gray code domain and extract their timing information. I created a TCL script that runs in TimeQuest which generates this list and outputs it to a file. This information can't be calculated from the SDC file due to limits Quartus has in which functions can be called during SDC. So I do it ahead of time and store the information in a file. 

    • Use get_registers to get a list of all the gray coded registers for both the read and write sides of the FIFO. 

    • Use get_timing_paths -from to get a list of all paths that go from a gray coded register to another register. 

    • Write the timing information along with the path source and destination registers to the output file. 

    • Each line in the file is: src_reg_name dest_reg_name src_clock dst_clock src_period dst_period counter_width_in_bits 

    • Not all of the information in the output file is used but it is easy to calculate so I added it in case it becomes useful later. 

     

     

  2. In my SDC file I use set_clock_groups to tell the system which clocks are related. 

  3. In my SDC file, after all the set_clock_groups statements, I call another TCL script that dynamically generates timing constraints for each source->destination pair in the file output from the first stage. 

    • I create a new clock for each source and destination clock listed in the file. All the skew constraints will use the new clocks. Remember, the original clocks have clock groups applied and will be ignored for cross clock timing analysis. Our new clocks will be setup to ignore all signals EXCEPT the gray coded paths we care about. The Zimmer paper discusses why this is necessary. 

    • I use set_min_delay, set_max_delay, and set_false_path to prevent timing analysis on all signals on the new clocks 

    • I use set_max_delay to set a constraint equal to the desired skew on the new_src_clk->new_dest_clk path. 

    • The tricky part is the skew limit. The set_max_delay command causes Quartus to limit the max_delay path but that path includes the clock_source routing delay to the register. We don't expect that clock_source delay to vary a lot from register to register (for a given clock) but we don't know what that delay is and we don't have a good way to calculate it. So, I set the skew limit to be the source clock period and add a 2ns fudge factor to account for clock source delay. So far that seems to work. 

     

     

  4. To check the skew I use another TCL script that I run from TimeQuest which generates skew reports for each timing model. This script was created by folks at Altera. I modified it slightly to match the names of the registers in my design. 
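The skew-limit rule from step 3 (source clock period plus a 2ns allowance) can be sketched as follows. This is not the attached cv_lib code; it is a hypothetical illustration that assumes a data file named gray_code_paths.txt with the seven whitespace-separated fields described in step 1, periods given in ns, and extra clocks named new_<clock> that have already been created for the gray coded paths. It shows only the file parsing and limit calculation, not the creation of those clocks or the constraints that mask all other paths on them.

```tcl
# Hypothetical file name; each line holds the seven fields described in step 1:
# src_reg dst_reg src_clock dst_clock src_period dst_period counter_width_in_bits
set fh [open "gray_code_paths.txt" r]
while {[gets $fh line] >= 0} {
    # Split on runs of whitespace and skip malformed or empty lines.
    set fields [regexp -all -inline {\S+} $line]
    if {[llength $fields] != 7} { continue }
    foreach {src_reg dst_reg src_clk dst_clk src_period dst_period width} $fields { break }

    # Skew limit = source clock period + 2 ns allowance for clock routing delay.
    set limit [expr {$src_period + 2.0}]

    # new_<clk> is a placeholder name for the extra clocks used for gray code paths.
    set_max_delay -from [get_clocks "new_$src_clk"] \
                  -to   [get_clocks "new_$dst_clk"] $limit
}
close $fh
```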

 

 

There are some issues with this approach: 

 

 

  1. Each build uses gray coded timing information from the last time you ran the generate timing file script. This is not too big a deal since most of the time the gray coded paths don't change that often. 

  2. You have to run the 'generate timing information' script each time you add another FIFO (or remove one) in order to update the timing information file. I run it when I know the design has changed or when it has been a while just for good measure. I don't run it every time because it takes a long time and most of the time the output file is unchanged. 

  3. The script called from the SDC file to dynamically generate skew constraints does not seem to add much delay to the build process. My build takes about 2.5 hours and the skew part seems to make < 5min difference - if that. 

  4. The script to generate the timing information (run after the build completes) takes a long time to run (~30 minutes or more). 

  5. The script to generate skew report takes a long time to run (~30 minutes or more). 

  6. The skew limit is fudged a little due to the source_clock routing delay being used by Quartus for set_max_delay. 

 

 

I have attached the files I am using. They are not generic. You will have to modify paths and such. You will also need to change the pattern matching value for the places where lists of registers are being generated. I'm not an expert with TCL so please excuse any strangeness you find! Or make improvements. 

 

 

  1. cv_lib.tcl is a library with routines that are called from other scripts. 

  2. gen_fifo_constraints.tcl is called from TimeQuest to generate the gray code timing data file. 

  3. skew_report.tcl is called from TimeQuest to generate all the skew reports. 

 

 

In your SDC file you will need to add a reference to the cv_lib library. You will also need to make a call to the function create_fifo_skew_constraints after you generate all your clocks and clock constraints. 

 

lappend ::auto_path "<path to directory containing cv_lib>"
package require cv_lib
...
# generate clocks
<create_clock>
# constrain clocks
<set_clock_groups>
...
cv_lib::create_fifo_skew_constraints
Altera_Forum

Sorry for my poor English, I'm from China 

 

I found something strange: the SCFIFO and DCFIFO IP Cores User Guide, which is the Altera MegaCore FIFO user guide, says:

“

When using the Quartus II TimeQuest timing analyzer with a design that contains a DCFIFO block apply the following false paths to avoid timing failures in the synchronization registers:

• For paths crossing from the write into the read domain, apply a false path assignment between the delayed_wrptr_g and rs_dgwp registers:

set_false_path -from [get_registers {*dcfifo*delayed_wrptr_g[*]}] -to [get_registers {*dcfifo*rs_dgwp*}]

• For paths crossing from the read into the write domain, apply a false path assignment between the rdptr_g and ws_dgrp registers:

set_false_path -from [get_registers {*dcfifo*rdptr_g[*]}] -to [get_registers {*dcfifo*ws_dgrp*}]

The false path assignments are automatically added through the HDL-embedded Synopsys design constraint (SDC) commands when you compile your design. The related message is shown under the TimeQuest timing analyzer report.

Note: The constraints are internally applied but are not written to the Synopsys Design Constraint File (.sdc). To view the embedded-false path, type report_sdc in the console pane of the TimeQuest timing analyzer GUI.

”

As far as I know, the gray-coded pointers in the FIFO should be constrained with set_max_delay so that the pointer delay cannot exceed one cycle of the fastest clock. The DCFIFO should not be constrained with set_false_path.

 

I don't know why. 

 

When I generate an asynchronous FIFO in Xilinx Vivado, I can see that the tool generates a set_max_delay constraint in the XDC file.
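For comparison, a Vivado constraint of this kind usually takes roughly the following form. This is a sketch, not the exact text that the Xilinx FIFO generator emits: the cell names and the clock name are hypothetical, and using the source clock period as the limit is an assumption.

```tcl
# Vivado XDC sketch; cell and clock names are hypothetical.
# -datapath_only removes the clock paths from the calculation, so the limit
# applies to the flop-to-flop data delay alone.
set_max_delay -datapath_only \
    -from [get_cells {u_fifo/rdptr_rclk_reg[*]}] \
    -to   [get_cells {u_fifo/rdptr_wclk_s1_reg[*]}] \
    [get_property PERIOD [get_clocks rclk]]
```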

 

 

I was confused
Altera_Forum

Yes I think you are confused. 

 

The pointers cross asynchronous clock domains, so a false path is a must; otherwise timing will be reported as failing on these paths and closure effort will be wasted on them. 

set_max_delay is a separate matter that you can choose to apply if it helps. 

If Xilinx does it automatically in their FIFO, I hope Altera will follow, but I know Altera already places the registers close enough together through some internal, undocumented means.