I recently upgraded Quartus II from 7.3 SP2 to 8.1 and am very happy with the compilation speed (about 25% faster in my case). Does anyone know how much faster 9.0 is compared to 8.1? Is there any reason not to upgrade to 9.0 at this point?
Thanks in advance. Hua
To be honest, it's worth installing and trying for yourself. (Luckily, Quartus II plays quite nicely with multiple versions installed, although I've heard complaints about SOPC Builder, but I don't use it.)
The reason I say that is that it's very design dependent. In the timeframe you're talking about (7.2 to 8.1; I don't think there was a 7.3), the physical synthesis algorithms were improved a lot, so that could easily be the source of the gains you see. The Linux versions were also greatly improved (they were actually far behind Windows for a technical reason, and were brought close to par). It also depends on which family you're targeting, as older families generally won't see family-specific algorithmic improvements, just generalized improvements (like the Linux one). I don't think 9.0 is going to improve as much, but it's worth trying to find out.
Thanks for the quick reply, Rysc. I understand that it is design dependent, but I will give it a try and post the result here in case someone (like myself) wants to get an idea of it.
If anyone has done this sort of comparison and would like to share the result, please feel free to post here too. Thanks.
Hi
I have a somewhat disappointing result for our design here: With Quartus 8.1, the fitter needs between 2h15 and 3h15 for different seeds, and synthesis takes 51min. With Quartus 9.0, the fitter needs between 3 and 6 hours, with a mean of around 5h. Synthesis is somewhat shorter (46min). I have to say that the design is made for an ASIC, and even there the clocking strategy is not very nice. For an FPGA it is disastrous. So if anyone has a cool trick with settings to remedy hold time violations due to clock skew, please tell me ;) I can't change the design for now... Thanks, emanuel
Are you saying that Q9.0 is optimizing positive hold requirements inside the device, and that's why it's taking longer? Note that when I look at compile times, I always look at the fitter messages, as they break out placement time, physical synthesis time (which is part of total placement) and route time. That way I can monitor where the fitter is spending time and, when comparing different results, know what changed.
If you have Assignments -> Fitter -> Optimize Hold Timing = All Paths, and Optimize Fast Corner checked, then you already have the cool settings to have the router add delays to meet your hold requirements. But naturally this uses more routing resources and can make compile times longer, or even make the design hard to route. This is expected in an FPGA with large clock skews. (There is also Assignments -> Analysis & Synthesis -> More Settings -> Auto Gated Clock Conversion, which could help a lot if it works on your design. I don't know enough about it, and there are plenty of gated clock implementations that can't be improved, but it's probably worth a try.)
--- Quote Start ---
With Quartus 8.1, the fitter needs between 2h15 and 3h15 for different seeds, synthesis is 51min. With Quartus 9.0, the fitter needs between 3 and 6 hours, with the mean value around 5h. Synthesis is somewhat shorter (46min).
--- Quote End ---
Hi emanuel,
Have you taken incremental compilation into consideration? Usually the second compile takes much less time than the first. If the 9.0 time is for a first-time compile while the 8.1 time is not, the comparison may not be fair. I have a Cyclone III design compiling right now in 9.0 for the second time and will post the result soon.
Hua
So here is my result:
3C120 design, same source, same constraints, two independent directories, compiled twice each for 8.1 and 9.0 (for the second compile the design was changed by one line, which pulls different signals to test pins). Resource usage: 52% in 9.0 and 54% in 8.1.

8.1:
; Module Name               ; Elapsed Time ; Average Processors Used ; Peak Virtual Memory ; Total CPU Time ;
First compile:
; Analysis & Synthesis      ; 00:11:00 ; 1.0 ; 417 MB ; 00:10:52 ;
; Partition Merge           ; 00:00:32 ; 1.0 ; 345 MB ; 00:00:32 ;
; Fitter                    ; 00:30:28 ; 1.3 ; 899 MB ; 00:36:48 ;
; Assembler                 ; 00:00:18 ; 1.0 ; 514 MB ; 00:00:17 ;
; TimeQuest Timing Analyzer ; 00:01:36 ; 1.5 ; 617 MB ; 00:02:21 ;
; Design Assistant          ; 00:01:52 ; 1.0 ; 474 MB ; 00:01:51 ;
; EDA Netlist Writer        ; 00:00:46 ; 1.0 ; 468 MB ; 00:00:46 ;
Second compile:
; Analysis & Synthesis      ; 00:02:47 ; 1.0 ; 415 MB ; 00:02:44 ;
; Partition Merge           ; 00:00:33 ; 1.0 ; 346 MB ; 00:00:31 ;
; Fitter                    ; 00:30:18 ; 1.3 ; 896 MB ; 00:36:37 ;
; Assembler                 ; 00:00:17 ; 1.0 ; 514 MB ; 00:00:17 ;
; TimeQuest Timing Analyzer ; 00:01:35 ; 1.5 ; 616 MB ; 00:02:20 ;
; Design Assistant          ; 00:01:51 ; 1.0 ; 473 MB ; 00:01:51 ;
; EDA Netlist Writer        ; 00:00:46 ; 1.0 ; 467 MB ; 00:00:46 ;
; Total                     ; 01:24:39 ; --  ; --     ; 01:38:33 ;

9.0:
; Module Name               ; Elapsed Time ; Average Processors Used ; Peak Virtual Memory ; Total CPU Time ;
First compile:
; Analysis & Synthesis      ; 00:24:48 ; 1.0 ; 919 MB ; 00:24:31 ;
; Partition Merge           ; 00:00:31 ; 1.0 ; 279 MB ; 00:00:3 ;
; Fitter                    ; 00:25:02 ; 1.5 ; 954 MB ; 00:29:45 ;
; Assembler                 ; 00:00:18 ; 1.0 ; 517 MB ; 00:00:18 ;
; TimeQuest Timing Analyzer ; 00:02:22 ; 1.1 ; 613 MB ; 00:02:27 ;
; Design Assistant          ; 00:01:48 ; 1.0 ; 480 MB ; 00:01:47 ;
; EDA Netlist Writer        ; 00:00:42 ; 1.0 ; 480 MB ; 00:00:41 ;
Second compile:
; Analysis & Synthesis      ; 00:06:46 ; 1.0 ; 950 MB ; 00:06:35 ;
; Partition Merge           ; 00:00:34 ; 1.0 ; 283 MB ; 00:00:31 ;
; Fitter                    ; 00:24:37 ; 1.5 ; 933 MB ; 00:29:27 ;
; Assembler                 ; 00:00:18 ; 1.0 ; 517 MB ; 00:00:18 ;
; TimeQuest Timing Analyzer ; 00:02:24 ; 1.1 ; 611 MB ; 00:02:29 ;
; Design Assistant          ; 00:01:50 ; 1.0 ; 478 MB ; 00:01:49 ;
; EDA Netlist Writer        ; 00:00:42 ; 1.0 ; 479 MB ; 00:00:41 ;
; Total                     ; 01:32:42 ; --  ; --     ; 01:41:50 ;
In short, 8.1 is considerably faster in Analysis & Synthesis but slower in the Fitter. 8.1 also uses much less memory in Analysis & Synthesis.
It would be interesting to know what takes 9.0 so much time in Analysis & Synthesis.
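For anyone who wants to put numbers on this comparison, here is a minimal Python sketch that converts the elapsed times from the table above into seconds and prints the 9.0-to-8.1 ratio per stage. Only the elapsed-time values come from the post; the dictionary layout and the function names `to_seconds` and `ratio` are illustrative assumptions.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Elapsed times copied from the posted reports: (first compile, second compile).
ELAPSED = {
    "8.1": {
        "Analysis & Synthesis": ("00:11:00", "00:02:47"),
        "Fitter": ("00:30:28", "00:30:18"),
    },
    "9.0": {
        "Analysis & Synthesis": ("00:24:48", "00:06:46"),
        "Fitter": ("00:25:02", "00:24:37"),
    },
}

def ratio(stage: str, run: int) -> float:
    """Return elapsed time in 9.0 divided by elapsed time in 8.1 for one stage."""
    new = to_seconds(ELAPSED["9.0"][stage][run])
    old = to_seconds(ELAPSED["8.1"][stage][run])
    return new / old

if __name__ == "__main__":
    for stage in ("Analysis & Synthesis", "Fitter"):
        for run, label in enumerate(("first", "second")):
            print(f"{stage}, {label} compile: 9.0 / 8.1 = {ratio(stage, run):.2f}")
```

A ratio above 1 means 9.0 took longer for that stage. For this one design, Analysis & Synthesis comes out more than twice as long in 9.0, while the Fitter is somewhat under 1.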
Hua,
In Quartus II 9.0, some physical synthesis optimizations are carried out during the Analysis & Synthesis stage, and some are carried out during the Fitter stage. So depending on which physical synthesis options you have turned on, Analysis & Synthesis could take longer. IIRC, when the physical synthesis effort level is set to Fast, physical synthesis optimizations are done only during Analysis & Synthesis. That might explain the increase in memory usage during this stage too.
Hi all,
thanks for the numbers, Hua. I agree with sw181: there are also many more options concerning the synthesis step, so you can probably trim that time down if the design allows it. My comparison is for both full compilation and full hold time optimisation. And the "optimisation" is not about getting better positive values here, but rather about getting something close to zero at all. I guess Q9.0 is trying much more aggressively to get a satisfying result, and thus takes more time. I have to say that this is not working very well here; I get about the same results as with Q8.1 (which was much better than Q8.0). I tried the clock gate conversion, but it doesn't help much: we have all this stuff that breaks the conversion, even though I sometimes don't understand why. Maybe I should play with some constraints... /emanuel
--- Quote Start ---
Hi all, And the "optimisation" is not for getting better positive values here, but rather something close to zero at all. /emanuel
--- Quote End ---
Just curious, how do you optimize the hold time toward zero? BTW, my comparison was conducted on two copies of the same 8.1 project in two different directories, so they actually have the same configuration and constraints.
I guess I'm talking a bit weird again... sorry ;)
I have a negative total hold slack for two out of ten seeds. So I'm trying to get to zero instead of something negative, not optimising positive values. I'm not trying to push the hold time slack to zero from any direction (I guess that is what your question was). I'm just trying to "fix" (optimise is too positive a word here) a very bad timing.
If you have a gated clock, and therefore have clock skew so that the latching clock path is longer than the launching clock path, you end up having to add delay to meet hold timing. So even though the requirement is 0ns, in essence the fitter still has to add delay to meet timing. Naturally, anything that can be done in the code to fix this would be good. (Like moving the gating signals out of logic so that they drive the enable of the altclkctrl block...)
How long is the route time compared to placement? For most designs it's much less, say 5%, but your design may be skewed. I'm also curious what the average and peak routing utilization is (it's in the .fit.rpt). And do you have Optimize Fast Corner checked? (I think you're aware of all this, just double-checking...)
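The hold argument above can be made concrete with a small worked example. The following Python sketch uses the textbook same-edge hold check (data arrival minus data required); all delay values are made-up picosecond numbers chosen for illustration, not measurements from this design, and the function names are hypothetical.

```python
def hold_slack_ps(launch_clk_ps: int, latch_clk_ps: int,
                  tco_ps: int, data_ps: int, th_ps: int) -> int:
    """Same-edge hold check: slack = data arrival time - data required time."""
    arrival = launch_clk_ps + tco_ps + data_ps   # when new data reaches the latch register
    required = latch_clk_ps + th_ps              # earliest time the data may change
    return arrival - required

def delay_to_add_ps(slack_ps: int) -> int:
    """Extra data-path delay needed to bring a negative hold slack up to zero."""
    return max(0, -slack_ps)

if __name__ == "__main__":
    # Hypothetical case: the latching clock passes through gating logic and
    # arrives 2500 ps later than the launching clock.
    slack = hold_slack_ps(launch_clk_ps=1000, latch_clk_ps=3500,
                          tco_ps=200, data_ps=600, th_ps=100)
    print(f"hold slack: {slack} ps, delay to add: {delay_to_add_ps(slack)} ps")
```

With these assumed numbers the slack is negative by more than the whole data-path delay, so the router would have to add routing delay on the data path just to reach zero, which is the situation described in the post.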
Yes, we definitely need a lot of delay introduced. I haven't tried to run without Optimize Hold Timing for some time, but I guess I would end up with a TNS somewhere in the microseconds range.
I can't currently do much about that, as the ASIC design is quite frozen atm. The only thing I'm not sure about is whether I could push the clock gating conversion to a better result with other timing constraints and a smart clock gating cell replacement. It doesn't convert anything right now.
Concerning elapsed times:
Info: Fitter placement operations ending: elapsed time is 00:21:32
Info: Fitter routing operations ending: elapsed time is 02:06:53
So I'm on the far other end then, I need 600% for the routing :-/
Concerning utilisation:
Info: Average interconnect usage is 15% of the available device resources
Info: Peak interconnect usage is 55% of the available device resources
But I have seen much higher values here; this one is for a quite nice fit. Logic utilization is 33%. I have Optimize Fast Corner Timing checked for 8.1, but I recently unchecked Optimize Multi-Corner Timing in 9.0, hoping to reduce the compile time somewhat.
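As a rough way to track the routing-to-placement ratio across runs, the two fitter messages quoted above can be parsed with a short Python sketch. The regular expression assumes the messages appear exactly in the form shown in this post; the function name `stage_seconds` is an illustrative assumption.

```python
import re

# The two fitter messages quoted in the post.
LOG = """\
Info: Fitter placement operations ending: elapsed time is 00:21:32
Info: Fitter routing operations ending: elapsed time is 02:06:53
"""

PATTERN = re.compile(
    r"Fitter (placement|routing) operations ending: "
    r"elapsed time is (\d+):(\d{2}):(\d{2})"
)

def stage_seconds(log_text: str) -> dict:
    """Map each fitter stage found in the log to its elapsed time in seconds."""
    result = {}
    for match in PATTERN.finditer(log_text):
        stage, hours, minutes, seconds = match.groups()
        result[stage] = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return result

if __name__ == "__main__":
    times = stage_seconds(LOG)
    share = 100 * times["routing"] / times["placement"]
    print(f"routing takes {share:.0f}% of the placement time")
```

For the values in the post this gives a little under 600%, which matches the rough figure quoted.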
Yowsers. At the end of the day, I think it is what it is. Routing is probably the biggest use of silicon in an FPGA, so they're really not designed to have lots of extra resources to fix hold violations. You may try to manually add delay to your clock trees and see if there's any way to balance them. (I don't know how many clocks you have, if they're on globals, etc.) One other thing of note is that in Q9.0 they added an "Estimated Delay Added for Hold" table, or something like that. You're already aware of why the delay is being added, but it might be a good chart to gauge what the router is doing.
Getting quite off-topic here...
I've been messing around with different kinds of location assignments, trying to relieve the fitter, but somehow it was always worse (what a wonder...). There is always a point where the skew ends up in a loop. We have some 20 clocks here, about 5 of them asynchronous. A good part of the dividers are ripple counters (I hope there is no student or professor reading this now...). So somehow, I'm not really surprised by the bad result ;) The only disappointing thing is the clock gating conversion introduced in Q8.1. From a colleague I know that Synplify managed to convert the tree. Anyway, we are probably starting a new version this summer, so I'll try to fix this mess somewhat. They were running into problems on the ASIC as well, just much less. So, back to the topic: this will also reduce the compilation time heavily, I guess.