Many threads on this forum involve clocks that are not driven directly by device pins or PLLs. This thread addresses design considerations for clocks driven by registers or combinational logic.

The main focus of this forum thread is how to avoid problems with ripple and gated clocks that can make timing closure difficult or result in failures in actual hardware even when the reported timing seems good.

For related information about ripple and gated clocks, see the document written by someone else at http://www.alteraforum.com/forum/showthread.php?t=2250. That document has many examples of clock circuits like various kinds of clock dividers and clock muxes with instructions for using TimeQuest to constrain them.

The Quartus II Design Assistant has some design-rule checks for ripple and gated clocks. This forum thread does not cover all considerations related to the Design Assistant rules. For more information, see the Quartus II on-line help page for each rule. Run the Design Assistant by itself using "Processing --> Start --> Start Design Assistant." Run it during compilation by enabling it at "Assignments --> Settings --> Design Assistant (category on left side) --> Run Design Assistant during compilation (checkbox)."

terminology used in this forum thread:

In Quartus II terminology, "ripple clock" means any clock driven by a register. A common case is a clock divider.

In Quartus II terminology, "gated clock" can mean any clock driven by an unregistered logic function, usually by a LUT or ALUT in the FPGA logic array blocks. Gated clocks can provide an on/off gating function. The term also applies to clocks driven by other combinational logic functions like clock multiplexers. This thread discusses on/off gating and clock multiplexing in particular, but the information applies to any clock path containing combinational logic.

In this thread, "derived clock" means either a ripple clock or gated clock. It is essentially any clock that is not driven directly by a device pin or PLL.

In this thread, "global routing" refers to device-wide global clock networks, regional clock networks, dual-regional clock networks, and fast regional clock networks.

design guidelines for ripple and gated clocks:

These guidelines are the recommended choices listed in order from most to least preferred.

#1: Do not use ripple or gated clocks; use clock enables or PLLs instead.
#2: Have no synchronous data paths going to or from the derived clock domain.
#3: If you have synchronous data paths going to or from the derived clock domain, then add clock uncertainty.
#4: If you have hold violations going to or from the derived clock domain, then set "Optimize Hold Timing" to "All Paths".
#5: If you have setup or hold violations going to or from a derived clock domain using global routing, then try nonglobal routing.

Each of these guidelines is covered in a separate post.

Design guidelines #1 and #2 prevent the negative consequences of clock skew resulting from ripple and gated clocks. Design guidelines #3, #4, and #5 reduce but do not necessarily eliminate the negative consequences of the clock skew. The clock skew considerations covered in the post for design guideline #3 also apply to designs using design guidelines #4 or #5.

Gated clocks can have timing hazards such as glitches. Design guidelines to avoid timing hazards in gated clocks like clock muxes are covered in a separate post.

disclaimer:

Not all the information in this forum thread is in Altera documentation. Even though this is not all official information from Altera, I have learned much of it from knowledgeable people at Altera.
Divided clocks and on/off gated clocks are common cases where you can use clock enables or PLLs instead of ripple or gated clocks.

divided clocks:

You can always avoid using a ripple clock to do a divide-by-n function. If you are writing new HDL, use a PLL or clock enable from the beginning. If you are reusing existing HDL that has a ripple clock, consider changing the design to use a PLL or clock enable.

If the divided clock is fast enough to be driven by a PLL and if a PLL output is available in the design, consider doing the frequency division in a PLL. If there are synchronous data paths between the full-speed clock domain and the divide-by-n clock domain, skew created by the PLL will be minimized by driving the full-speed clock with a x1 output of the PLL (PLL output at same frequency as input) instead of using the clock input of the PLL to clock the full-speed data registers. With a x1 output of the PLL driving the full-speed clock, the PLL compensation delay in the clock paths will be the same for both the full-speed and divide-by-n domains; the PLL compensation delay will not create skew. If the full-speed clock domain uses the clock input of the PLL, then cross-domain data paths will have clock skew because only the divide-by-n domain will have the PLL compensation delay in the clock paths. Even in that case the PLL implementation is preferred over a ripple clock; the clock skew created by the PLL will potentially have less variation over the range of operating conditions than clock skew created by a ripple clock. (See the post for design guideline #3 for more about clock skew.)

Any divide-by-n function can be implemented with a divide-by-n clock enable instead of dividing the frequency of the actual clock. Clock the registers in the divide-by-n domain with the full-speed clock. Enable the registers with a clock enable that is asserted every nth clock cycle of the full-speed clock.
The functional behavior will be the same as having a separate clock domain running at the divide-by-n frequency.

If you are converting existing HDL from a divide-by-n logic-driven clock, it might be less work to convert to a divided clock driven by a PLL than to convert to a divide-by-n clock enable.

A divide-by-n clock enable has some advantages over doing the frequency division in a PLL: A clock enable does not have a lower frequency limit. A clock enable does not introduce jitter, static phase error, or any other cause of clock uncertainty. A clock enable does not consume a PLL resource. A clock enable does not cause clock skew.

For recommended HDL coding styles using clock enables, see the templates in the Quartus II text editor. In version 7.2, the templates are at "Verilog HDL --> Logic --> Registers" and "VHDL --> Logic --> Registers" in the "Insert Template" dialog box.

For either a PLL or a divide-by-n clock enable, the data paths within the divide-by-n domain have n times the full-speed-clock period for setup. For a PLL, the timing analyzer knows the setup requirement based on the PLL output clock period. For a divide-by-n clock enable, use multicycle exceptions so that the timing analyzer can compute the setup requirement. Set the multicycle setup to n. For TimeQuest, set the multicycle hold to n minus 1; for the Classic Timing Analyzer, set the multicycle hold to n. For an example of how to use multicycle exceptions for a clock enable in TimeQuest, see http://www.altera.com/support/examples/timequest/exm-tq-clock-enable.html.

No matter what the n value is for a divide-by-n clock enable, the clock enable paths from the clock enable source to the data registers have to operate in a single clock cycle.

If you decide to do the divide-by-n function with a ripple clock despite this design guideline, then tell the timing analyzer that the ripple clock is derived from a base clock with a divide-by-n frequency. In TimeQuest use create_generated_clock.
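As a rough sketch of the two TimeQuest constraint styles described above (written for this thread, not copied from Altera documentation): the port name clk, the 10 ns period, the n value of 4, the hierarchy name div_domain, and the register name div_reg are all hypothetical placeholders. The multicycle lines assume the clock enable source register sits outside div_domain, so the enable paths themselves stay single-cycle. Use style A or style B, not both.

```tcl
# Base clock on a device pin (port name and period are assumptions).
create_clock -name clk -period 10.000 [get_ports clk]

# Style A: divide-by-4 clock enable.
# Registers under the hypothetical hierarchy div_domain are clocked by clk
# and enabled every 4th cycle. TimeQuest values: setup = n, hold = n - 1.
set_multicycle_path -setup -end \
    -from [get_registers {div_domain|*}] -to [get_registers {div_domain|*}] 4
set_multicycle_path -hold -end \
    -from [get_registers {div_domain|*}] -to [get_registers {div_domain|*}] 3

# Style B: divide-by-4 ripple clock driven by the hypothetical register div_reg.
# Uncomment this instead of the multicycle lines if the design keeps the ripple clock.
# create_generated_clock -name clk_div4 -source [get_ports clk] \
#     -divide_by 4 [get_registers div_reg]
```

The Altera clock enable example linked above selects the destination registers through their enable pins rather than by hierarchy name; that approach is probably more robust when the enabled registers are scattered through the design.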
on/off gated clocks:

Instead of gating a clock to stop it, you can use a clock enable.

A clock enable has advantages over a gated clock: A clock enable does not cause clock skew. No special design considerations are necessary to avoid timing hazards like glitches or runt pulses with a clock enable.

Either a gated clock or a clock enable will reduce power by reducing the toggling of registers, but only a gated clock reduces the power from toggling on the clock network.

timing closure on clock enable paths:

Clock enable paths have to operate in a single clock cycle. This is the case for a divide-by-n clock enable as well as for a clock enable that provides the functionality of an on/off gated clock.

Because there is a large delay associated with the global buffer, the timing might be better with the clock enable signal using nonglobal routing instead of global even for a high-fan-out clock enable. If the clock enable has a high fan-out, there might be significant interconnect delay from nonglobal routing. However, the significant delay for the global buffer itself might be worse for timing. Usually the biggest advantage of global routing is to minimize skew, and skew does not matter for a clock enable as it does for a clock. Using the "Global Signal" assignment in the Assignment Editor, you can try the clock enable using both global and nonglobal routing to see which has better timing.

Some people fear that it will be hard to meet the timing requirement on clock enable paths for a high-fan-out clock enable that must operate in a single clock cycle. First, even a high fan-out on a clock enable is not likely to be the main culprit if timing closure on these paths is challenging. Second, the clock enable might not be as high a fan-out on a single signal as you would expect from the RTL. Synthesis tools tend to include other logic in the clock enable in addition to what is directly implied by the HDL "if" statement for the RTL clock enable.
That's why you often see a large number of clock enable signals in the "Control Signals" table in the Fitter compilation report.

If you do have a timing problem from the fan-out on a clock enable using nonglobal routing, then replicate the source of the clock enable. There are multiple ways to do this ranging from letting the tools do a brute-force replication without regard to where the clock enable destinations need to be placed to a manual replication in the RTL that groups the destinations according to where they will be placed on the device. Brute-force methods available in the Quartus II software include the "Maximum Fan-Out" assignment in the Assignment Editor and the equivalent maxfan synthesis attribute.

Most likely you will not have a problem meeting the timing requirement on the clock enable paths. Even if you do, the extra work to do something like clock enable replication in the RTL will give you a better design than a ripple or gated clock, especially if you cannot follow design guideline #2 for the derived clock.
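The two Assignment Editor settings mentioned for clock enables ("Global Signal" and "Maximum Fan-Out") end up as lines in the project's .qsf file. A minimal sketch of what they might look like follows; the signal name clk_en, the register name clk_en_reg, and the fan-out limit of 32 are hypothetical, and it is worth checking the exact assignment names and values against what the Assignment Editor writes in your Quartus II version.

```tcl
# Keep the hypothetical clock enable signal clk_en off the global networks
# so its timing can be compared against a compile with global routing.
set_instance_assignment -name GLOBAL_SIGNAL OFF -to clk_en

# Brute-force replication: limit the fan-out of the hypothetical enable
# source register so the tools duplicate it (32 is a placeholder value).
set_instance_assignment -name MAX_FANOUT 32 -to clk_en_reg
```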
If you do use a ripple or gated clock to drive a derived clock domain, then have no synchronous data paths going to or from the derived clock domain.

Treat all cross-domain data paths as asynchronous paths. Use metastability synchronization registers, handshake signals, etc. to transfer data to or from the derived clock domain. The Quartus II Design Assistant has some design-rule checks for data paths crossing between asynchronous clock domains.

Tell the Quartus II software not to analyze timing on the asynchronous cross-domain data paths. In TimeQuest, use set_clock_groups or set_false_path timing exceptions. You can use set_false_path directly on the data paths, but TimeQuest can more efficiently process set_clock_groups or set_false_path between clock names, preventing timing analysis of all data paths crossing between the domains in both directions for set_clock_groups and in the single direction specified for set_false_path. Use create_generated_clock to create a derived clock name that can be used in the timing exception commands. For TimeQuest, you need to create the generated clock anyway if the derived clock is a ripple clock.

Use global routing for the derived clock to minimize skew for data paths within the domain. If all cross-domain data paths are treated as asynchronous, then the additional clock skew induced by the global buffer for the cross-domain paths does not matter.
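A minimal sketch of the TimeQuest exceptions described in this post, assuming a base clock on a port named clk and a divide-by-2 ripple clock driven by a register named div_reg (both names, the period, and the divide ratio are hypothetical):

```tcl
# Base clock and the derived (ripple) clock. The generated clock gives the
# derived domain a name that the exception commands can refer to.
create_clock -name clk -period 10.000 [get_ports clk]
create_generated_clock -name clk_div2 -source [get_ports clk] \
    -divide_by 2 [get_registers div_reg]

# Option A: cut timing analysis between the two domains in both directions.
set_clock_groups -asynchronous -group {clk} -group {clk_div2}

# Option B: cut only one direction (here, base clock to derived clock).
# set_false_path -from [get_clocks clk] -to [get_clocks clk_div2]
```

Either option only removes the paths from timing analysis. The synchronizers or handshake logic still have to exist in the design for the transfers to be safe.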
Ripple and gated clocks cause clock skew for cross-domain data paths. If there are synchronous cross-domain paths, this skew affects timing.

All the clock-skew issues described below are avoided by following design guidelines #1 and #2. If these clock-skew issues concern you, then follow the first two guidelines.

For each data path, the clock skew makes the timing worse for clock setup and better for clock hold or vice versa depending on whether the source register's clock or the destination register's clock has the longer clock-path delay. The clock skew makes it more difficult to achieve positive slack on both setup and hold simultaneously. This consideration applies regardless of the on-die-variation considerations that follow.

If you have synchronous data paths to or from the derived clock domain and need your design to be extremely reliable, you probably should use clock uncertainty settings to make an allowance for the on-die-variation uncertainty in the timing analysis. In TimeQuest, use set_clock_uncertainty.

Don't assume you are OK with positive slack reported for the cross-domain paths if you don't have clock uncertainty settings. The clock skew caused by ripple, gated, and nonglobal clocks is not fully accounted for in the timing analysis.

Unless the timing analysis includes on-die variation, the timing analysis uses all numbers at the slow process/voltage/temperature extreme for the slow model and all numbers at the fast PVT extreme for the fast model. The actual numbers in your device probably are not all at the extreme for a given path at your particular process, voltage, and temperature combination. The timing analysis has to compare the clock-path delay to the source register, the clock-path delay to the destination register, and the data-path delay between registers. The clock skew is the difference between clock-path delays.
At your particular PVT between the extremes, will the clock skew be a little faster compared to data delay than the extreme numbers say? Will it be a little slower? Without accounting for on-die variation, you don't know. That's beyond the scope of slow-model and fast-model analysis regardless of the FPGA vendor.

On-die variation matters when there is clock skew from logic or nonglobal routing in the clock paths. More specifically, the on-die variation matters for the portion of the clock paths that is not common between the source and destination registers. If the source and destination registers are in the same derived clock domain, then only nonglobal routing after the logic driving the derived clock matters (for example, nonglobal routing driven by a divider register or by a clock mux). Using global routing for the derived clock will prevent skew issues for data paths within the derived clock domain.

The skew problem with ripple and gated clocks is usually in synchronous data paths that cross to or from the derived clock domain. These paths could be between the derived clock domain and the associated base clock domain, or they could be between two derived clock domains.

Until recently, on-die variation in FPGAs was not modeled or analyzed. This hasn't been necessary for designs using global clocks without logic in the clock paths; the guard bands have been good enough to cover the variation. It matters for data paths with clock skew caused by ripple, gated, and nonglobal clocks with significant clock path delay that is not common between the source and destination registers; the guard bands can't cover every possible design with such clocks. With newer silicon technologies, advanced timing modeling and analysis is more important than in the past for all designs, not just those with skew on ripple, gated, and nonglobal clocks.
The timing models for the 65 nm families (Cyclone III and Stratix III) account for at least some of the on-die variation, and this variation is included in the analysis by TimeQuest. The Quartus II compilation messages say that the timing models for these families are still preliminary in version 7.2 SP3. It is reasonable to expect more thorough accounting for the on-die variation in the final models. That should provide very accurate timing analysis for the vast majority of designs even if they have ripple or gated clocks. Even in the future for device families that have final timing models accounting for on-die variation, however, it will probably be wise to include some additional clock uncertainty for synchronous cross-domain paths to/from derived clocks (or follow design guidelines #1 and #2 to avoid these paths altogether) in designs that must be extremely reliable.

Clock uncertainty settings add the uncertainty for all data paths going from the source clock domain to the destination clock domain identified in the setting. Define the derived clock domain as a separate clock domain for timing analysis. In TimeQuest use create_generated_clock. For TimeQuest, you need to create the generated clock anyway if the derived clock is a ripple clock. For either a ripple clock or a gated clock, the separate generated clock will allow you to apply the clock uncertainty to only the cross-domain paths.

I can't tell you how much uncertainty to add. Most people don't bother. Most people don't think about it in the first place. Some people assume guard bands in the timing analysis take care of it, but I don't like that argument. Those guard bands are meant to cover other uncertainties--they weren't necessarily intended to cover this one. Guard bands couldn't cover all possible ripple-clock and gated-clock configurations anyway; that would require guard bands that are far too conservative for most designs.
Some device families are supported by the TimeQuest derive_clock_uncertainty command, which creates set_clock_uncertainty values automatically. The derive_clock_uncertainty command is for other uncertainty sources like PLL jitter and static phase error; it does not account for the clock-skew-versus-data-delay uncertainty for ripple and gated clocks. (For the kinds of uncertainty that derive_clock_uncertainty does cover, this command is highly recommended for Stratix III, Cyclone III, and HardCopy II.)

You will have to decide for yourself how much to allow for the additional uncertainty caused by ripple and gated clocks or whether to bother to account for it at all. Or just follow design guidelines #1 and #2 to avoid this issue altogether.
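For anyone who does decide to add uncertainty, a sketch of what the constraints might look like is below. The clock names, the register name div_reg, and the period are hypothetical, and the 0.100 ns value is only a placeholder to show the syntax. It is not a recommended amount.

```tcl
# Base clock and the derived (ripple) clock as a separate named domain.
create_clock -name clk -period 10.000 [get_ports clk]
create_generated_clock -name clk_div2 -source [get_ports clk] \
    -divide_by 2 [get_registers div_reg]

# Extra margin applied only to the cross-domain transfers, in each direction.
# 0.100 is a placeholder value, not a recommendation.
set_clock_uncertainty -setup -from [get_clocks clk]      -to [get_clocks clk_div2] 0.100
set_clock_uncertainty -hold  -from [get_clocks clk]      -to [get_clocks clk_div2] 0.100
set_clock_uncertainty -setup -from [get_clocks clk_div2] -to [get_clocks clk]      0.100
set_clock_uncertainty -hold  -from [get_clocks clk_div2] -to [get_clocks clk]      0.100
```

Because the uncertainty is specified between the two clock names, paths that stay inside either domain are not penalized.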
If you have hold violations for data paths going to or from the derived clock domain, then have the Fitter try to fix them. At "Assignments --> Settings --> Fitter Settings (category on left side)", set "Optimize Hold Timing" to "All Paths".

This setting will cause the Fitter to insert routing delay in the data path between registers to avoid hold violations.

Even if this setting succeeds in eliminating all reported violations, the uncertainty considerations for on-die variation described for design guideline #3 still apply.
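If you prefer to edit the .qsf file instead of using the Settings dialog box, the setting above should correspond to a line like the one sketched here. The assignment name and value string are from memory, so confirm them against what the GUI writes in your Quartus II version.

```tcl
# Fitter setting: "Optimize Hold Timing" = "All Paths"
set_global_assignment -name OPTIMIZE_HOLD_TIMING "ALL PATHS"
```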
If you have setup or hold violations for data paths going to or from the derived clock domain and the derived clock uses global routing, then try nonglobal routing.

There is a large delay associated with the global buffer. This delay contributes to clock skew for cross-domain data paths. The clock skew might be less with nonglobal routing than with global routing.

Nonglobal routing will cause some skew for paths within the derived-clock domain. In the post for design guideline #3, see the discussion of on-die variation for nonglobal routing in clock paths.
advantages of clock control blocks:

Some device families have clock control blocks, which are dedicated silicon configured with the altclkctrl megafunction to perform functions like shutting off clocks and multiplexing clocks.

Clock control blocks have advantages over logic resources for implementing on/off gating and clock muxes. You have to be more careful to avoid timing hazards like glitches when you have logic resources in a clock path. Logic in a clock path can result in more timing variation across process, voltage, and temperature than a clock control block would have. Logic in a clock path can cause duty cycle distortion. Clock control blocks provide the best method to shut off a clock for power reduction.

It is recommended that you use clock control blocks instead of logic when possible.

preventing glitches with combinational logic resources in the clock path:

Combinational logic can produce glitches on signals. This does not matter for signals in synchronous data paths because a positive setup slack means any glitches will settle out before the signal is latched into the destination register. However, combinational logic can create timing hazards in clock paths and in asynchronous paths like resets.

If a clock logic function requires more than one LUT or ALUT, there is a potential for glitches unless the logic is structured so that glitches cannot propagate through the logic to the final LUT output driving the clock. Clock muxes that are too large to fit in a single LUT or ALUT are an example where this applies.

A LUT or ALUT output will not glitch for a single input toggling. If, for example, you use a LUT to gate a clock with clock_out = clock_in AND enable, then clock_out will not glitch while enable is inactive even though clock_in continues to toggle.

A LUT or ALUT output might glitch if more than one input toggles at about the same time.
Even if the second toggling input switches between two locations in the look-up-table RAM that have the same value for the output (making the input a logical don't-care), the output can glitch as the toggling input switches between those two locations. This means that even a 2:1 mux implemented in a single LUT can have glitches on the output while the mux select is held constant. If a LUT implements a clock mux with inputs clk_a and clk_b, then toggling on clk_b can cause glitches on the clock mux output while input clk_a is selected. To avoid timing hazards, the clk_b input to the mux needs to be held static while clk_a is selected (and vice versa).

For a clock multiplexer or other clock logic function implemented in logic resources instead of in a clock control block, make sure that each LUT or ALUT has at most one input toggling at a given time. Below is what someone else wrote suggesting a way to do this for a 2:1 mux. You can use "keep" synthesis attributes, instantiate LCELL primitives, or use WYSIWYG primitives to control how the logic is broken down into individual LUTs or ALUTs.

--- Quote Start ---
Make sure each clock is gated prior to the mux to prevent glitches on the output. So:
clk_a    ---|
            +-----\    +-------+
            | AND  >---| LCELL |--- mux_input_a
            +-----/    +-------+
enable_a ---|

clk_b    ---|
            +-----\    +-------+
            | AND  >---| LCELL |--- mux_input_b
            +-----/    +-------+
enable_b ---|

The outputs of the LCELLs will feed the mux, and only the active clock will be toggling when it hits the mux as it will be the only enabled clock. The mapper will collapse the "extra" LCELL into the AND, but it will not collapse the AND functionality into the mux, giving you what you want.
--- Quote End ---

other timing hazards from combinational logic resources in the clock path:

Combinational logic in a clock path can create other timing issues like runt pulses, which might or might not matter for the design. For example, disturbances on the output of a clock multiplexer when the mux select lines change might be acceptable to the design. These timing hazards will not be covered here.

duty cycle distortion from combinational logic resources in the clock path:

Logic in a clock path can cause duty cycle distortion. Delays through logic and routing can be different for falling edges and rising edges. The more logic there is in a clock path, the more potential there is for these falling-edge and rising-edge delay differences to add up to a significant amount of duty cycle distortion.

For device technologies older than 65 nm, the timing analysis does not include rise/fall analysis. For Cyclone III and Stratix III, rise/fall analysis in TimeQuest will account for the duty cycle distortion, but it will still worsen the timing if both edges of the clock are used (for example, source register launching at the clock rising edge and destination register latching at the clock falling edge a half-cycle later with the duty cycle distortion being in the direction of reducing the clock high pulse width).
clock multiplexer protection in synthesis:

Starting in version 7.1 for the original Stratix device family and newer families, Quartus II integrated synthesis has had clock multiplexer protection. The description for this setting in the "More Analysis & Synthesis Settings" dialog box says that this feature causes multiplexers in clock networks to be decomposed and mapped into trees of 2:1 muxes. It maintains the unateness of clock signals so that TimeQuest can analyze the clock edges properly. (Unateness has to do with how rising or falling edges on LUT inputs result in rising or falling edges on the output.) It also helps to balance the delay from each clock source to the destinations.

Clock multiplexer protection in versions 7.1 and 7.2 does not do anything to avoid timing hazards caused by the combinational logic. It is up to the user to implement techniques like the one illustrated above to prevent glitches even with clock multiplexer protection enabled.

Because clock multiplexer protection implements a large mux as a tree of 2:1 muxes, it increases the levels of logic in the clock path. The additional levels of logic increase the potential for duty cycle distortion. They also increase the total delay through the clock path, which increases the clock skew if there are synchronous cross-domain data paths going to or from the mux output clock domain. This skew does not matter if design guideline #2 is followed to avoid synchronous cross-domain paths.

TimeQuest for clock muxes:

Regardless of whether you implement a clock multiplexer in a clock control block or in logic resources, TimeQuest is the recommended timing analyzer. The Classic Timing Analyzer is limited in its support for clock muxes.

There are some TimeQuest clock multiplexer examples at http://www.altera.com/support/examples/timequest/exm-tq-clock-mux.html. For a more extensive set of examples, see the Altera Forum document mentioned in the first post of this thread.
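For the simplest case, a 2:1 clock mux fed by two pin clocks, the TimeQuest constraints might look roughly like this sketch. The port names clk_a and clk_b and the periods are assumptions, and the examples linked above cover more cases.

```tcl
# The two mux input clocks, both arriving on device pins (names and periods assumed).
create_clock -name clk_a -period 10.000 [get_ports clk_a]
create_clock -name clk_b -period 8.000  [get_ports clk_b]

# Only one of the two clocks drives the mux output at any time, so tell
# TimeQuest not to analyze paths launched by one and latched by the other.
set_clock_groups -exclusive -group {clk_a} -group {clk_b}
```

Note that set_clock_groups cuts every path between clk_a and clk_b, not just the ones downstream of the mux, so this form only fits a design with no real data transfers between the two clocks elsewhere.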
Hi Brad. This article is very helpful for my current work. Will it be published as an Altera application note? It would be a good one!

BTW, do you mind if I translate this article into Chinese on my blog so that more Chinese engineers can read it?