
Reducing Interconnect (IC) delay with evenly distributed pipeline stages

Altera_Forum

Hi,

I am getting setup violations because of large interconnect delay. I've added several pipeline stages to reduce the interconnect delay but Quartus is not doing a very good job of spreading the pipeline registers out. It is clustering some of the pipeline registers whereas I would expect them to be evenly distributed. Here are some examples of the pipeline register locations:

Example 1:
x146 y1 n17 (source)
x146 y1 n13 (intermediate register)
x144 y1 n35 (intermediate register)
x144 y1 n11 (intermediate register)
x63 y1 n28 (intermediate register)
x49 y0 n115 (destination)

Example 2:
x139 y7 n37 (source)
x66 y4 n45 (intermediate register)
x66 y4 n2 (intermediate register)
x65 y4 n56 (intermediate register)
x65 y4 n14 (intermediate register)
x38 y0 n43 (destination)

 

Is there a trick to get Quartus to spread out the pipeline registers more intelligently? I'm aware of manual location assignments as an option, but with hundreds of such registers, manual placement is not very appealing.

Thanks,
Philippe

P.S. Here's how I implemented the pipeline stages. I used the syn_preserve attribute to prevent Quartus from inferring an lpm_shiftreg.

reg source;
reg [3:0] pipeline /* synthesis syn_preserve = 1 */; // keep each stage as a discrete register
reg destination;

always @ (posedge clk) begin
    pipeline    <= {pipeline[2:0], source}; // shift source through four intermediate stages
    destination <= pipeline[3];             // last stage drives the destination register
end
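
As an alternative to the attribute, shift-register inference can also be controlled from the project settings. The following is a minimal .qsf sketch, assuming the setting name AUTO_SHIFT_REGISTER_RECOGNITION is available in your Quartus version. Note that it turns the optimization off for the whole project, which may cost area elsewhere.

```tcl
# .qsf fragment (assumption: verify the setting name and allowed values in your Quartus version).
# Disables automatic shift register replacement design-wide, so register chains stay as flip-flops.
set_global_assignment -name AUTO_SHIFT_REGISTER_RECOGNITION OFF
```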

Altera_Forum

Have you thought about creating regions? This way you can assign things (whole entities down to individual registers) to be placed into the region, and they will usually get priority over other logic.
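
The regions mentioned here are presumably LogicLock regions. The following is a minimal .qsf sketch, assuming the LogicLock assignment names used by Quartus II Standard. The region name, origin, size, and member path are all hypothetical.

```tcl
# Hypothetical LogicLock region close to one group of IO cells.
set_global_assignment -name LL_ENABLED ON -section_id io_ser_region
set_global_assignment -name LL_AUTO_SIZE OFF -section_id io_ser_region
set_global_assignment -name LL_STATE LOCKED -section_id io_ser_region
set_global_assignment -name LL_ORIGIN X40_Y1 -section_id io_ser_region
set_global_assignment -name LL_WIDTH 12 -section_id io_ser_region
set_global_assignment -name LL_HEIGHT 4 -section_id io_ser_region
# Assign a (hypothetical) design instance to the region.
set_instance_assignment -name LL_MEMBER_OF io_ser_region -to "serializer:u_ser0" -section_id io_ser_region
```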

Altera_Forum

Adding registers back to back does not help reduce delay. You need to break the combinational logic cloud into smaller chunks using registers.

Altera_Forum

Thanks for the replies.

Tricky:

I am fanning out a single source register to several output registers in IO cells across the FPGA with register stages in between with the hopes of reducing interconnect delay. The source register is somewhere in the middle of the IO cells. I'm not sure how to define a region to improve timing in this scenario. My initial thought was to manually place each register with location assignments of the type:

set_location_assignment FF_X46_Y1_N1 -to pipeline[1]

Kaz:

There is no combinational logic between the register stages. I just have a chain of registers going from point A to point B where the distance between A and B is about half the FPGA. If I try to cover that distance in 1 hop (A->B) then TimeQuest reports setup violations and I can see that the culprit is huge interconnect (IC) delay. So I add register stages between A and B to create smaller hops (less IC delay between each stage). The problem is that Quartus clusters the extra registers between A and B so there is still a hop that is large enough to cause a setup violation.

Update: I've resorted to manual location assignments to limit the distance between each register stage. It works.
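
For illustration, evenly spacing the four stages of Example 1 between the source (x146) and the destination (x49) gives a hop of roughly 97 / 5 ≈ 19 columns. Below is a sketch of the resulting assignments, following the form shown above. The exact sites are hypothetical and may not be legal flip-flop locations on a given device.

```tcl
# .qsf fragment: four intermediate stages spaced about 19 columns apart between x146 and x49.
# Site names are illustrative only; check them against the Chip Planner before use.
set_location_assignment FF_X127_Y1_N1 -to pipeline[0]
set_location_assignment FF_X107_Y1_N1 -to pipeline[1]
set_location_assignment FF_X88_Y1_N1 -to pipeline[2]
set_location_assignment FF_X68_Y1_N1 -to pipeline[3]
```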
Altera_Forum

 


 

 

Normally Quartus should not put unduly long delays between registers unless there are too many layers of combinational logic.

I am not sure I understand why Quartus puts such a delay from A to B in your case. Moreover, if you have the freedom to add extra pipeline registers, then you are not bound by the default one-clock timing relationship (the default multicycle), so your solution of extra pipeline registers is equivalent to adding a multicycle constraint and saving some registers.

In short, you can add a multicycle constraint, set a max delay, add pipeline stages that Quartus spreads along the path, or force manual fitting.
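
The constraint-based options listed here might look like the following SDC sketch for the un-pipelined A -> B path. The register names are taken from the snippet in the first post and the values are placeholders. A multicycle exception is only valid if the launched data is held stable for the full number of cycles, which a free-running shift chain does not guarantee.

```tcl
# Option A: allow 4 clock cycles for setup on the A -> B path,
# and move the hold check back by 3 cycles so it stays at the launch edge.
set_multicycle_path -setup -from [get_registers {source}] -to [get_registers {destination}] 4
set_multicycle_path -hold  -from [get_registers {source}] -to [get_registers {destination}] 3

# Option B: bound the path delay directly (value in ns, placeholder).
set_max_delay -from [get_registers {source}] -to [get_registers {destination}] 5.000
```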
Altera_Forum

How do you specify timing requirements in your design? Do you constrain the desired clock rate? Do you set maximum path delays between the registers in question?

Altera_Forum

 


 

 

Philippe -

That's unfortunate you had to lock things down. That makes for an inflexible design if things change in the future.

I'm curious if your problem even makes sense. What device and speed grade are you targeting and what is the frequency of "clk"? Also, what is the fanout from source to destination? And what version of Quartus are you running?

Bob
Altera_Forum

 

--- Quote Start ---
Normally Quartus should not put unduly long delays between registers unless there are too many layers of combinational logic.

I am not sure I understand why Quartus puts such a delay from A to B in your case. Moreover, if you have the freedom to add extra pipeline registers, then you are not bound by the default one-clock timing relationship (the default multicycle), so your solution of extra pipeline registers is equivalent to adding a multicycle constraint and saving some registers.

In short, you can add a multicycle constraint, set a max delay, add pipeline stages that Quartus spreads along the path, or force manual fitting.
--- Quote End ---

I think you're right. I should be able to add multicycles to shift the setup and hold window enough to pass timing. I haven't used multicycles before but I'm aware of them. IIRC, they are used to push the data required time back by a configurable number of cycles. Thanks for the idea.

 

 

--- Quote Start ---
How do you specify timing requirements in your design? Do you constrain the desired clock rate? Do you set maximum path delays between the registers in question?
--- Quote End ---

I use create_clock for clocks coming from input pins and derive_pll_clocks to let Quartus define PLL output clocks. That's usually sufficient. I do not constrain path delays between registers.
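
A minimal SDC sketch of that kind of setup is shown below. The port name and input period are placeholders, not values from this design.

```tcl
# Hypothetical reference clock on an input pin; replace the name and period (ns) with the real ones.
create_clock -name ref_clk -period 10.000 [get_ports {ref_clk}]

# Let the timing analyzer create generated clocks for all PLL outputs.
derive_pll_clocks
```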

 

 

--- Quote Start ---
Philippe -

That's unfortunate you had to lock things down. That makes for an inflexible design if things change in the future.

I'm curious if your problem even makes sense. What device and speed grade are you targeting and what is the frequency of "clk"? Also, what is the fanout from source to destination? And what version of Quartus are you running?

Bob
--- Quote End ---

 

 

Device: 5SGXMA7N2F40C3
Quartus: 13.1.4 Build 182
Fanout: 36

The clock frequency is a bit out of spec at 781.25 MHz (1.28 ns period), which is why Quartus was having trouble meeting setup requirements. This clock is driving a minimal amount of logic. It is just being used to serialize 4-bit words that are being clocked at 195.3125 MHz. I know that the LVCMOS IOs in Stratix V are rated at 166 MHz but the resulting serial signal toggles no faster than 10 MHz. However, we need the serial clock to be fast in order to have finer control of the edge locations.

 

 

Update:

First, a clarification on the general design. I am fanning out 4-bit words to 36 IOs across the FPGA at 195.3125 MHz. Next to each IO there is some logic to serialize the 4-bit words at 781.25 MHz. 
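
As a quick check of the numbers involved, here is a worked calculation using only the frequencies stated above (the symbols are introduced here for illustration):

```latex
f_{\text{serial}} = 4 \times 195.3125\ \text{MHz} = 781.25\ \text{MHz},
\qquad
T_{\text{serial}} = \frac{1}{781.25\ \text{MHz}} = 1.28\ \text{ns},
\qquad
T_{\text{word}} = \frac{1}{195.3125\ \text{MHz}} = 5.12\ \text{ns}
```

So every register-to-register hop in the 781.25 MHz domain, including its interconnect delay, has to fit within 1.28 ns, whereas a hop in the 195.3125 MHz domain has 5.12 ns.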

 

When I made my first post, my design had no location constraints and Quartus did not place the serialization logic very close to the IOs that were being driven. This failed timing because the IC delay between the serialization logic and the IO was large (large relative to a 1.28 ns setup relationship). I tried adding register stages between the serialization logic and the IO cell to reduce the IC delay between each stage but Quartus clustered the register stages together so there was always one stage with IC delay greater than 1.28 ns. This prompted my question about how to get Quartus to evenly spread out a chain of registers. 

 

Since then, I added location constraints to place the fanned-out 4-bit words right next to each IO (this section meets timing easily because it is clocked at 195.3125 MHz). This helped Quartus place the serialization logic right next to the IOs so IC delay is no longer a problem. Thanks for all your responses!
Altera_Forum

Well, I feel a bit silly. I just removed all location assignments as a test and Quartus was able to meet all timing better than in any previous compilation. I've tried so many different things in the last 2 days that it's hard to remember all my steps, but here's the general idea:

1) I started out with a poor design that could not meet timing (dozens of failing paths).

2) I made timing optimizations to the code (duplicating logic so that each IO cell had its own serialization logic) and I added location constraints on key registers. Timing greatly improved but a handful of paths still failed. About 6 out of 180 of the registers that I had manually placed still had large interconnect delay even though the source and destination were right next to each other (e.g. FF_X49_Y1_N1 --> FF_X49_Y1_N22 with 1.1 ns IC delay)! It's as if the signal was taking a long detour (maybe I had placed the register in a congested area?) and my location constraint was doing more harm than good.

3) I removed all location constraints and timing passed.

It seems like writing better code is much more effective at improving timing than trying to beat Quartus at placing logic. I'm left wondering if register location constraints are ever a good idea.
Altera_Forum

Wow, you're definitely stretching the clock frequency limits, Philippe. In fact I think you've blown through them. But it sounds like you've got it working. Nice work! 

 

Bob