What is currently the recommended PC system for the fastest compilations (for Arria 10 devices)? We are aware of the memory recommendations, but what about the relative importance of other parameters:
- number of CPU cores
- maximum CPU single core frequency (turbo)
- CPU cache size
- DDR memory bus speed
- DDR memory number of controller channels (2/4)
- CPU architecture
Could you please elaborate on this? I think it would be very useful information for a lot of FPGA developers.
Faster cores are better than more cores. The compilation spends a lot of its time on a single core, so once the RAM requirements are met, single-core speed is the biggest factor in build time. More cores do let you run more builds in parallel, provided you have enough RAM.
More cache is better
I also think an SSD can improve times a bit: large databases may get written to disk, so you may save some time there, but most of the traffic goes in and out of RAM.
The supported operating systems are listed at this link: https://www.intel.com/content/www/us/en/programmable/support/support-resources/download/os-support.h...
The knowledge base item is from 2013; do I understand correctly that it is still valid for Quartus Prime Pro? Does it make sense to invest in more CPU cores? Could you please confirm or deny whether the Quartus Prime Pro Fitter uses only a single core (as stated elsewhere on the forum)?
Does it make sense to have a CPU with four memory controllers instead of the usual two? Will this have significant impact on compilation time or does most of the computation happen in chunks of netlist stored in the CPU cache, so the DDR memory bandwidth is not that relevant to compilation time?
Thanks for all the answers :)
The Compiler can detect and use multiple processors to reduce total compilation time. You specify the number of processors the Compiler uses. The Intel Quartus Prime software can use up to 16 processors to run algorithms in parallel.
Parallel compilation reduces the compilation time by up to 10% on systems with two processing cores, and by up to 20% on systems with four cores. When running timing analysis independently, two processors reduce the timing analysis time by an average of 10%. This reduction reaches an average of 15% when using four processors.
The Intel Quartus Prime software does not necessarily use all the processors that you specify during a given compilation. Additionally, the software never uses more than the specified number of processors. This fact enables you to work on other tasks without slowing down your computer. The use of multiple processors does not affect the quality of the fit. For a given Fitter seed, and given Maximum processors allowed setting on a specific design, the fit is exactly the same and deterministic. This remains true, regardless of the target machine, and the number of available processors. Different Maximum processors allowed specifications produce different results of the same quality. The impact is similar to changing the Fitter seed setting.
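As a rough way to read the figures quoted above, the sketch below applies Amdahl's law to them. This is an assumption made for illustration, not a model Intel states: the quoted numbers are "up to" values, and the function names are hypothetical. It derives the parallel fraction implied by each figure and extrapolates to higher core counts.

```python
def parallel_fraction(time_reduction: float, cores: int) -> float:
    """Solve Amdahl's law for the parallel fraction p.

    Normalized runtime on n cores is (1 - p) + p / n, so a fractional
    time reduction r satisfies r = p * (1 - 1 / n).
    """
    return time_reduction / (1.0 - 1.0 / cores)


def time_reduction(p: float, cores: int) -> float:
    """Fractional compile-time reduction predicted for parallel fraction p."""
    return p * (1.0 - 1.0 / cores)


# Figures quoted above: up to 10% with two cores, up to 20% with four.
p2 = parallel_fraction(0.10, 2)  # implied parallel fraction, about 0.20
p4 = parallel_fraction(0.20, 4)  # implied parallel fraction, about 0.27

# ASSUMPTION: the same parallel fraction holds at higher core counts.
for cores in (2, 4, 8, 16):
    low = time_reduction(p2, cores)
    high = time_reduction(p4, cores)
    print(f"{cores:2d} cores: {low:.1%} to {high:.1%} shorter")
```

Under this simple model the reduction can never exceed the parallel fraction itself, which is why the gain flattens quickly as cores are added. Actual behavior depends on the design and the Quartus version.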
My 2 cents:
- In practice you should not expect more than 4 cores to be efficiently utilized during placement and routing of one design. Hence, investing in more than 6 cores (to leave room for other work) is a waste of money unless you want to perform parallel compilations. In reality, you will very likely need to do parallel compilations so more cores will not hurt.
- The operating frequency of the CPU should be the first priority, but if you want a high core count, you are limited to lower operating frequencies.
- The cache size will likely not make much difference, since memory usage is on the order of tens of GB and the cache hit rate is likely not very good.
- Get the fastest DDR memory you can; however, if you go with a server CPU, you will be limited to slower memory modules (usually up to 2666 MT/s).
- More memory channels mean more memory bandwidth, which can improve the speed of the "memory-intensive" process of placement and routing.
- CPU architecture will make little difference since there have hardly been any major architectural improvements in the past 5-6 years; however, going with the latest architectures is likely the best choice since you also get higher operating frequency and better memory controller.
For me, since I use OpenCL to create very large designs, the main bottleneck is memory size, since that is what limits the number of parallel compilations I can run per node. On a node with 256 GB of memory and 2 x 10-core modern Xeons, I can only run 4 parallel compilations targeting Arria 10, or 5 targeting Stratix V. I still have unused cores in both cases, but I cannot add any more parallel compilations because they would start crashing due to running out of memory.
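The memory limit described above can be sketched as a small capacity calculation. The per-job figures below are assumptions chosen for illustration, not measurements: 4 jobs fitting in 256 GB while a fifth fails suggests a peak of very roughly 51 to 64 GB per Arria 10 compilation, and 60 GB is picked from that range. The function name and the 4 cores per job are likewise hypothetical.

```python
def max_parallel_compiles(total_mem_gb: float, peak_mem_per_job_gb: float,
                          total_cores: int, cores_per_job: int) -> int:
    """Number of simultaneous compilations a node can hold.

    The result is the smaller of the memory-limited and core-limited counts.
    """
    by_memory = int(total_mem_gb // peak_mem_per_job_gb)
    by_cores = total_cores // cores_per_job
    return min(by_memory, by_cores)


# Node from the post: 256 GB of memory and 2 x 10 cores.
# ASSUMPTION: 60 GB peak per job and 4 cores per job are illustrative.
jobs = max_parallel_compiles(256, 60, 20, 4)
print(jobs)           # 4: memory allows 4 jobs, cores would allow 5
print(20 - jobs * 4)  # 4 cores left unused
```

With these numbers memory is the limiting factor, so adding RAM would raise throughput on such a node while adding cores would not.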
Thank you for all the answers.
In the end, we went with an Intel Core i9-7980XE with 64 GB of quad-channel memory, and the overall CPU utilization looks quite good (in the middle of the Fitter place stage with Quartus Prime Pro 18.1).
The processor has 18 cores and they do not look bored :) The alternating busy and idle cores might be caused by each physical core being exposed as two logical ones: when Quartus schedules the same kind of thread on all available cores, the two logical cores sharing one physical core compete for the same physical resources(?)