Intel® Quartus® Prime Software

How can I figure out what made my Fitter (Place and Route) time balloon?

Altera_Forum

I have a design that was rather small: it took only 10 minutes to compile from start to finish and used only a small fraction of the space on an Arria 10. I added a portion of code (three components) and a few custom components to the Nios II, and now the fitter time has ballooned to over two hours. When it finally finishes, is there any good way to look at the report and see what took so long? I can go through the design and take pieces out until I find the one that is taking all the time, but I'm hoping there's a more efficient way to zero in on the offending logic.

Quartus Prime Standard Edition 16.1.2 on Windows 7.
Altera_Forum

You can't really see which bit took how long, because that's not how the fitter works. It looks at the design as a whole and uses random seeds along with timing-driven guesswork to get the design to fit.

 

As designs get larger, especially if you have high clock rates, long combinational chains, or lots of stuff trying to pack into one small area, they take longer to fit. This is partly due to congestion in resources and partly due to trying to find a solution that can meet timing. As you pack more into a small space, it takes more effort to find a way to pack everything in without running out of resources.

 

One thing you can do is partition the design. By splitting the design into smaller groups of related "stuff", you give the fitter a helping hand by showing it which bits are intended to be closely related to each other. This helps it optimise how it fits the design. Additionally, you can then use LogicLock regions to place sections of the design in particular parts of the FPGA, which can also help it identify what goes where.

 

 

Since you are talking about adding components to a Nios II processor, I assume you are building the system in Qsys. If that is the case and you are adding lots of components to the data (or instruction) bus of the Nios II processor, you increase the amount of behind-the-scenes logic that must be added. Qsys adds a lot of Avalon-MM fabric (glue logic) for address decoding, bus arbitration, and other mapping logic. This mostly ends up as a massive cloud of combinational logic, with lots of stuff trying to pack together as close to the Nios II processor as possible. This is pretty much a perfect storm for increasing fitting times.

 

You can reduce this issue somewhat, if the speed of your system allows, by adding Avalon-MM pipeline bridges to the design to split some of the peripherals off onto smaller buses. These pipeline stages add some access latency, but they also break up the glue logic by adding extra register stages between its parts. By shortening the combinational paths and adding pipelining, you allow the fitter to move the logic further away from the Nios II processor without adversely affecting timing. This in turn reduces compile times by reducing congestion.
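To put rough numbers on the pipelining argument, here is a toy Python model. It is only a sketch: the function names, the 0.5 ns per logic level, the 0.6 ns register overhead, the 12 logic levels and the 100 MHz clock are all made-up illustrative values, not figures from Quartus or from any Arria 10 datasheet. It shows how splitting a long combinational path with register stages leaves more of the clock period free for routing, which is the slack the fitter can spend on placing logic further apart.

```python
import math


def longest_segment(logic_levels, pipeline_stages):
    """Logic levels in the longest register-to-register segment after
    inserting evenly spaced pipeline registers into one path."""
    return math.ceil(logic_levels / (pipeline_stages + 1))


def routing_budget_ns(clock_mhz, logic_levels,
                      cell_delay_ns=0.5, register_overhead_ns=0.6):
    """Time left for routing in one clock period (toy model).

    cell_delay_ns and register_overhead_ns are assumed values chosen
    only for illustration.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns - logic_levels * cell_delay_ns - register_overhead_ns


if __name__ == "__main__":
    levels = 12        # hypothetical depth of the glue logic
    clock_mhz = 100.0  # hypothetical system clock
    for stages in (0, 1, 2):
        worst = longest_segment(levels, stages)
        budget = routing_budget_ns(clock_mhz, worst)
        print(f"{stages} extra stage(s): {worst} levels, "
              f"{budget:.1f} ns left for routing")
```

In this model each extra stage costs one clock cycle of latency, which is the trade-off mentioned above.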

 

Of course, you could also turn off some fitter optimisations to further reduce compile time. However, doing this usually doesn't achieve the desired outcome, and it generally increases the likelihood of timing issues.
Altera_Forum

All good recommendations by TCWORLD. Also, what are the specs of the machine you're compiling on, including how much RAM you have?

Altera_Forum

Thanks! 

Partitioning is a great idea. One other thing I did was look through the timing report to see if anything stuck out. Sure enough, there were a lot of address lines going to a memory that were (apparently) hard to place while still meeting timing. Adding an extra register stage seems to have helped, and I'm going to look for more ways to reduce the combinational logic as TCWORLD suggested.

To answer sstrell's question, I am working on an Intel i7 (~3GHz) with 12GB of RAM. Yes, I know I should get some more RAM too.
Altera_Forum

I found the problem. Should anyone else run into a similar problem, they *might* have done what I did: write VHDL code meant to infer a RAM that did *not* infer a RAM. I used the Netlist viewer to look at the components and found that the RAM was *not* considered a memory. Once I used Qsys to create a RAM and replaced my VHDL code, the synthesis time went back to around ten minutes.
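For anyone wondering why a missed RAM inference hurts so much, a quick back-of-the-envelope calculation helps. The Python sketch below uses a hypothetical 1024 x 32 memory (the actual size in the design above was not stated) and the 20 Kbit capacity of an Arria 10 M20K block. The function names are made up for illustration, and the block count is only a lower bound because it ignores the depth/width shapes an M20K actually supports.

```python
import math

M20K_BITS = 20 * 1024  # capacity of one M20K block, in bits


def registers_if_not_inferred(depth, width):
    """Flip-flops used when the array is built from registers
    (one per stored bit); read multiplexers come on top of this."""
    return depth * width


def min_m20k_blocks(depth, width):
    """Lower bound on M20K blocks for the same storage.

    Ignores the block's allowed depth/width configurations, so the
    real count can be higher.
    """
    return math.ceil(depth * width / M20K_BITS)


if __name__ == "__main__":
    depth, width = 1024, 32  # hypothetical memory size
    print("registers if not inferred:", registers_if_not_inferred(depth, width))
    print("M20K blocks (lower bound):", min_m20k_blocks(depth, width))
```

Tens of thousands of registers plus wide read multiplexers in place of a couple of memory blocks is the kind of load that would plausibly explain both the long fit and the hard-to-place address lines mentioned earlier in the thread.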
