Re: critical path and IC delay

Altera_Forum · ‎06-22-2009

Hello,

I am trying to implement my design on a Stratix III 260, and my goal is to reach a 200 MHz clock.

After the place-and-route steps, Quartus shows me the critical paths. Nearly all of them are about 80% interconnect (IC) delay, and the device is only about 50% utilized.

Do you have any advice for improving the routing results? The problem seems to be due to routing congestion.

thanks

Altera_Forum · ‎06-22-2009

Normally the best thing to do to improve FMax is to increase the pipelining in the design. For max speed, you want no more than a single LUT between 1 register and the next.

Altera_Forum · ‎06-22-2009

I have already optimized my design by adding registers to break the critical path between two registers. As a result, after the placement step Quartus gives me this message:

Info: Estimated most critical path is memory to register delay of 5.293 ns

Info: Total cell delay = 2.658 ns ( 50.22 % )

Info: Total interconnect delay = 2.635 ns ( 49.78 % )

which is close to my goal of 5 ns (200 MHz), so that looks good to me.

BUT! After the routing step the critical path is close to 6.3 ns and is made up of 20% cell delay and 80% IC delay... How can I avoid this?

thanks
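To put the figures above side by side, here is a minimal Python sketch that converts a critical-path delay into an approximate Fmax and a slack against the 5 ns target. It assumes the reported path delay is the whole clock period (clock skew, setup time and clock uncertainty are ignored), and the function names are made up for illustration.

```python
def fmax_mhz(path_delay_ns: float) -> float:
    """Approximate Fmax if the path delay were the full clock period."""
    return 1000.0 / path_delay_ns


def slack_ns(target_mhz: float, path_delay_ns: float) -> float:
    """Slack against the target period; negative means the path is too slow."""
    return 1000.0 / target_mhz - path_delay_ns


def ic_share(cell_ns: float, ic_ns: float) -> float:
    """Fraction of the path delay spent in interconnect."""
    return ic_ns / (cell_ns + ic_ns)


if __name__ == "__main__":
    # Delays quoted in the post: fitter estimate after placement, then after routing.
    for label, delay in [("post-placement estimate", 5.293), ("post-routing", 6.3)]:
        print(f"{label}: {fmax_mhz(delay):.1f} MHz, "
              f"slack {slack_ns(200.0, delay):+.3f} ns")
    print(f"IC share of the estimate: {ic_share(2.658, 2.635):.2%}")
```

Under that simplification the placement estimate corresponds to roughly 189 MHz and the routed result to roughly 159 MHz, so the routed path misses the 5 ns budget by about 1.3 ns.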

Altera_Forum · ‎06-22-2009

You are targeting Fmax of 200MHz and for your device you should get it readily unless your chip is really full. You can try the seed value if you don't want any redesign headaches.

Your description of the most critical path is not clear. The breakdown into cell delay and interconnect delay doesn't help much in localising the problem. It may not be localised in the first place, as floating timing violations may occur when the chip reaches its limit.

I am not sure what you mean by (after routing...). I thought the timing analyser reports on issues after routing anyway.

Altera_Forum · ‎06-22-2009

For kaz,

I don't understand => "You can try the seed value if you don't want any redesign headaches".

"I am not sure what you mean by (after routing...)" =>

The fitter has two main steps: placement (place) and routing (route). At the end of the first one Quartus informs me that the critical path takes 5.2 ns, and after routing the critical path takes 6.3 ns (mainly due to interconnect delay).

Altera_Forum · ‎06-22-2009

Ideally, if your problem is localised to one part of the design, then you need to tackle it there. If, however, you have reached the limit of your chip and the violations are random, then the seed can help.

When the fitter starts it uses a default random seed value. You can change this seed and it will have up to 10% effect on timing. It is somewhere in the fitter settings.

You can also run several seeds overnight or over a weekend using the Design Space Explorer (DSE).
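As a rough illustration of a manual seed sweep, the Python sketch below generates the project setting for each seed and picks the best result afterwards. It assumes the fitter seed is set in the .qsf file with the `SEED` global assignment (worth checking against your Quartus version); the helper names are hypothetical, and the sketch does not run the tools itself.

```python
from typing import Dict


def qsf_seed_line(seed: int) -> str:
    """Settings-file line that selects a fitter seed (assumed assignment name: SEED)."""
    return f"set_global_assignment -name SEED {seed}"


def best_seed(worst_slack_by_seed: Dict[int, float]) -> int:
    """Return the seed whose worst-case setup slack (in ns) is the largest."""
    if not worst_slack_by_seed:
        raise ValueError("no results to compare")
    return max(worst_slack_by_seed, key=worst_slack_by_seed.get)


if __name__ == "__main__":
    # One compile per seed; collect the worst slack from each timing report by hand
    # or with your own parser, then pass the numbers to best_seed().
    for seed in range(1, 6):
        print(qsf_seed_line(seed))
```

Comparing on worst-case slack is only one possible choice; if many paths fail, total negative slack may be a better figure to rank the seeds by.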

edit: Don't forget life is always random(Darwin knew that first, I wonder if nature used some tool like DSE and how long will it take it to end up as a butterfly cell nucleus)

Altera_Forum · ‎06-22-2009

ok ,

I will try this

thanks

Altera_Forum · ‎06-22-2009

--- Quote Start ---

You are targeting Fmax of 200MHz and for your device you should get it readily

--- Quote End ---

yes indeed.

Altera_Forum · ‎06-22-2009

Routing is very, very seldom the problem. That doesn't mean it's not a large part, but the placement is usually the culprit. For obvious reasons, a spread out placement will cause long routes. (And to be honest, a spread placement is usually caused by a spread design, i.e. something like a mux that might feed multiple components in a device).

Can you list the path details of placement and routing (from TimeQuest, run report_timing with the -file "file.txt" option, or make your own report if using TAN)? Also, right-click on the path in TimeQuest, choose Locate -> Chip Planner, and then click the Expand button to see the actual routing. I'm curious whether it's the Manhattan distance, or pretty close to it. (Again, routing is almost always good, which is why this is strange.) That's at least a good starting point...
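For reference, the Manhattan distance between two placed nodes can be estimated from the X/Y coordinates embedded in their location names. The Python sketch below assumes names of the form `FF_X67_Y52_N1`, as they appear in timing reports; the second location in the example is hypothetical, and the result is in grid units, not in nanoseconds.

```python
import re
from typing import Tuple

# Matches the "_X<col>_Y<row>" part of a location name such as FF_X67_Y52_N1.
_LOCATION = re.compile(r"_X(\d+)_Y(\d+)")


def location_xy(name: str) -> Tuple[int, int]:
    """Extract the (x, y) grid coordinates from a location name."""
    match = _LOCATION.search(name)
    if match is None:
        raise ValueError(f"no X/Y coordinates found in {name!r}")
    return int(match.group(1)), int(match.group(2))


def manhattan(src: str, dst: str) -> int:
    """Manhattan distance |dx| + |dy| between two placed nodes, in grid units."""
    (x1, y1), (x2, y2) = location_xy(src), location_xy(dst)
    return abs(x1 - x2) + abs(y1 - y2)


if __name__ == "__main__":
    # The destination below is a made-up location, used only for illustration.
    print(manhattan("FF_X67_Y52_N1", "FF_X12_Y40_N3"))
```

If the route shown in Chip Planner wanders far beyond this distance, routing is suspect; if the distance itself is large, the placement is the more likely culprit.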

Altera_Forum · ‎06-22-2009

My design uses a large amount of memory, close to 450 M9Ks. Can that be the reason?

Altera_Forum · ‎06-22-2009

Yes, but there are a ton of things that can cause the problem (it's seldom black-and-white). Using memories causes the routing congestion to go up, just like using any resource. One thing to look at in the fit report is routing utilization (average and peak). There's no perfect rule of thumb, but it would be worth posting. (If your path is using direct routes, then it's not a routing issue anyway.)

If M9Ks are being used to build larger RAMs, then that can be a problem too. For example, if it was building an 18Kx72 RAM, then your address bits would be fanning out to 144 M9Ks, and are naturally going to have long hops (unless logic is duplicated, either in the code, with directives, or by physical synthesis).
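The 144 figure can be checked with a small Python sketch. It assumes each M9K is configured as 1K x 9 (one of the block's aspect ratios) and that the RAM is built by tiling blocks in depth and width; the function name is made up for illustration.

```python
def ram_blocks(depth: int, width: int, block_depth: int, block_width: int) -> int:
    """Blocks needed to tile a depth x width RAM from block_depth x block_width blocks."""
    rows = -(-depth // block_depth)   # ceiling division: blocks stacked in depth
    cols = -(-width // block_width)   # ceiling division: blocks side by side in width
    return rows * cols


if __name__ == "__main__":
    # 18K x 72 RAM built from M9Ks configured as 1K x 9 (assumed configuration).
    blocks = ram_blocks(18 * 1024, 72, 1024, 9)
    print(f"{blocks} M9Ks, so each shared address bit fans out to {blocks} blocks")
```

The same count follows from raw capacity, since 18432 x 72 bits divided by 9216 bits per block is also 144. Other aspect ratios can give a different block count, so the exact fan-out depends on how the tool maps the RAM.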

Altera_Forum · ‎06-22-2009

When I face speed problems due to crowded logic then memory blocks rescue me. For a given functionality it means less routing...

Altera_Forum · ‎06-22-2009

Yes, my design is "spread". Basically, it is composed of 180 parallel functional units (FUs). Each one takes its data from M9Ks (simple dual-port), and the outputs are shuffled to the other FUs' memories by a large barrel shifter (180 x 8 bits). Memory addressing is shared and controlled by a global controller.

Altera_Forum · ‎06-23-2009

For Rysc =>

Here is the report_timing file. This design has been routed with the speed option, and the result is worse than with the balanced option...

Altera_Forum · ‎06-24-2009

fb_35,

The following line in the report caught my attention:

>>; 3.515 ; 0.000 ; FF ; CELL ; 1164 ; FF_X67_Y52_N1 ; in_mode_init~DUPLICATE|q

Here is a flip-flop that is driving 1164 nodes. Can you try splitting it (duplicating that logic)? Or is that already a duplicated flop?
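To get a feel for how much duplication a fan-out of 1164 calls for, here is a minimal Python sketch. The per-register fan-out limit of 32 is an arbitrary illustrative number, not a device rule, and the function names are hypothetical.

```python
def copies_needed(fanout: int, max_fanout: int) -> int:
    """Number of duplicate registers so that no copy drives more than max_fanout loads."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    return -(-fanout // max_fanout)  # ceiling division


def tree_depth(fanout: int, max_fanout: int) -> int:
    """Register levels needed if every register in the tree drives at most max_fanout nodes."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2 to build a tree")
    depth, reach = 0, 1
    while reach < fanout:
        reach *= max_fanout  # each extra level multiplies the reachable loads
        depth += 1
    return depth


if __name__ == "__main__":
    loads, limit = 1164, 32  # 1164 from the report; 32 is an assumed limit
    print(copies_needed(loads, limit), "leaf copies,", tree_depth(loads, limit), "register levels")
```

Each register level in such a tree adds one clock cycle of latency to the signal, so this only works if the control signal can be issued that many cycles early.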

Altera_Forum · ‎06-24-2009

hello sw181,

this signal seems to be already duplicated as the original name is only in_mode_init .

My design is composed of 180 FUs. In each FU this signal is used to select an operation mode, so it is used by all 180 FUs, and it appears that Quartus has some difficulty handling that...

Any advice on limiting this fanout?

thanks

Altera_Forum · ‎06-24-2009

Just a note on your barrel shift register(180 x 8):

Have you implemented that in logic or memory? If in logic, it will need at least 1440 flip-flops. This can cause congestion. I would prefer a memory-implemented shift register.

Altera_Forum · ‎06-24-2009

=> kaz

Yes, the barrel shifter uses logic. How can I use memory for that?

Altera_Forum · ‎06-24-2009

You can instantiate a RAM-based shift register megafunction. There are some restrictions, but you can set the depth, width and taps as necessary.

Look at this thread and other similar ones:

http://alteraforum.com/forum/showthread.php?t=5075&highlight=shift+register

Altera_Forum · ‎06-24-2009

--- Quote Start ---

hello sw181,

this signal seems to be already duplicated as the original name is only in_mode_init .

--- Quote End ---

did you duplicate it or did the tool? try enabling more of the physical synthesis options.

Altera_Forum · ‎06-24-2009

--- Quote Start ---

did you duplicate it or did the tool? try enabling more of the physical synthesis options.

--- Quote End ---

The duplication was done by Quartus.

A lot of the critical paths in my design are due to high-fanout signals like this one.