Compiler error, not able to generate hardware

Altera_Forum · 11-28-2017

Hi,

Hardware generation failed when I compiled my kernel with aoc:


aoc: Environment checks are completed successfully.
You are now compiling the full flow!!
aoc: Selected target board a10soc
aoc: Running OpenCL parser....
aoc: OpenCL parser completed successfully.
aoc: Compiling....
aoc: Linking with IP library ...
aoc: Restarting compile without lmem replication because of estimated overutilization!
aoc: Compiling....
aoc: Linking with IP library ...
+--------------------------------------------------------------------+
; Estimated Resource Usage Summary                                   ;
+----------------------------------------+---------------------------+
; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ;   74%                     ;
; ALUTs                                  ;   33%                     ;
; Dedicated logic registers              ;   42%                     ;
; Memory blocks                          ;   62%                     ;
; DSP blocks                             ;   23%                     ;
+----------------------------------------+---------------------------;
aoc: First stage compilation completed successfully.
Error: Compiler Error, not able to generate hardware

quartus_sh_compile.log contains these messages:


 Info (170196): Router estimated peak interconnect usage is 153% of the available device resources in the region that extends from location X62_Y47 to location X73_Y58
Info (188005): Design requires adding a large amount of routing delay for some signals to meet hold time requirements, and there is an excessive demand for the available routing resources. The Fitter is reducing the routing delays of some signals to help the routing algorithm converge, but doing so may cause hold time failures. For more information, refer to the "Estimated Delay Added for Hold Timing" section in the Fitter report.
Warning (16684): The router is trying to resolve an exceedingly large amount of congestion. At the moment, it predicts long routing run time and/or significant setup or hold timing failures. Congestion details can be found in the Chip Planner.
Warning (16618): Fitter routing phase terminated due to routing congestion. Congestion details can be found in Chip Planner.
Critical Warning (188026): The Fitter failed to successfully route the design.  You may be able get this design to route by making design modifications, changing the fitter seed or by enabling the Fitter Aggressive Routability Optimizations logic option.
    Info (188027): The highest placement effort tried by the fitter during this compile was: 1.54
Error (170143): Final fitting attempt was unsuccessful

I don't know whether the problem is related to this message:


aoc: Restarting compile without lmem replication because of estimated overutilization!

And this is my top.fit.summary (it does not show overutilization either):


Fitter Status : Failed - Mon Nov 27 22:55:13 2017
Quartus Prime Version : 16.1.0 Build 196 10/24/2016 SJ Pro Edition
Revision Name : top
Top-level Entity Name : top
Family : Arria 10
Device : 10AS066N3F40E2SG
Timing Models : Final
Logic utilization (in ALMs) : 215,752 / 251,680 ( 86 % )
Total registers : 430793
Total pins : 140 / 812 ( 17 % )
Total virtual pins : 0
Total block memory bits : 3,380,040 / 43,642,880 ( 8 % )
Total RAM Blocks : 1,553 / 2,131 ( 73 % )
Total DSP Blocks : 379 / 1,687 ( 22 % )
Total HSSI RX channels : 0 / 48 ( 0 % )
Total HSSI TX channels : 0 / 48 ( 0 % )
Total PLLs : 4 / 96 ( 4 % )

Any advice would be greatly appreciated!

Altera_Forum · 11-28-2017

With such high logic utilization, and given that OpenCL on Arria 10 uses partial reconfiguration, I am not surprised your kernel fails to route. There is little to nothing you can do to avoid the routing failure other than changing your kernel to significantly reduce its logic utilization. In my experience, on Arria 10, successful routing in conjunction with partial reconfiguration becomes unlikely above roughly 60% logic utilization. I am not sure what you are doing that needs so much logic, though. Unlike on Stratix V, logic utilization is generally not a bottleneck on Arria 10, and I have managed to fill all of the DSPs and BRAMs on Arria 10 with ~50% logic utilization.
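As a quick way to check a compile against the rule of thumb above, here is a minimal Python sketch that reads percentages out of a top.fit.summary and the peak interconnect figure out of quartus_sh_compile.log. The function names and the 60% constant are my own illustration (the threshold is only the experience-based figure mentioned above, not a limit enforced by Quartus), and the line formats are assumed to match the ones pasted in this thread.

```python
import re

# Experience-based figure from this thread, not a tool-enforced limit.
LOGIC_THRESHOLD_PCT = 60

def parse_fit_summary(text):
    """Return {resource name: percent} for lines shaped like
    'Name : used / total ( NN % )'. Other lines are ignored."""
    usage = {}
    pattern = r"\s*(.+?)\s*:\s*[\d,]+\s*/\s*[\d,]+\s*\(\s*(\d+)\s*%\s*\)"
    for line in text.splitlines():
        m = re.match(pattern, line)
        if m:
            usage[m.group(1)] = int(m.group(2))
    return usage

def peak_interconnect_pct(log_text):
    """Return the percentage reported by Info (170196), or None if absent."""
    m = re.search(r"peak interconnect usage is (\d+)%", log_text)
    return int(m.group(1)) if m else None

# Sample input copied from the post above.
SUMMARY = """\
Fitter Status : Failed - Mon Nov 27 22:55:13 2017
Logic utilization (in ALMs) : 215,752 / 251,680 ( 86 % )
Total RAM Blocks : 1,553 / 2,131 ( 73 % )
Total DSP Blocks : 379 / 1,687 ( 22 % )
"""
LOG_LINE = ("Info (170196): Router estimated peak interconnect usage is 153% "
            "of the available device resources")

usage = parse_fit_summary(SUMMARY)
logic = usage["Logic utilization (in ALMs)"]
if logic > LOGIC_THRESHOLD_PCT:
    print(f"Logic utilization {logic}% exceeds the {LOGIC_THRESHOLD_PCT}% rule of thumb")
print("Peak interconnect usage:", peak_interconnect_pct(LOG_LINE))
```

A peak interconnect figure above 100% means the router needs more wires in that region than the device has, which matches the congestion warnings in the log.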

Regarding the lmem replication message, as discussed before, the compiler replicates all the local memory buffers on the FPGA based on the number of accesses to them, so that all pipeline stages can access the buffers in parallel. After full replication, the compiler checks the estimated resource usage. If it estimates that the kernel would fail to fit due to Block RAM overutilization, it restarts the compilation and, instead of providing fully parallel accesses, shares read and write ports to the local memory buffers to reduce Block RAM utilization and allow the kernel to fit on the device. Of course, this comes at the cost of lower performance due to pipeline stalls. To fix this problem, minimize the number of accesses to your buffers and make sure all the accesses are properly coalesced. You can also use the attributes described in "Intel FPGA SDK for OpenCL Best Practices Guide, Section 7.5" for more fine-grained control over the replication factor.
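To spot this fallback in a build script, the aoc console output can be scanned for the restart message and the estimated usage table. The following is only a sketch in Python: the helper names are hypothetical, and it assumes the message text and the semicolon-delimited table look exactly like the output pasted in the question.

```python
import re

# Message text as it appears in the aoc output quoted in this thread.
LMEM_RESTART = "Restarting compile without lmem replication"

def lmem_replication_dropped(aoc_output):
    """True if aoc reported that it restarted without lmem replication."""
    return any(LMEM_RESTART in line for line in aoc_output.splitlines())

def estimated_usage(aoc_output):
    """Return {resource: percent} from rows shaped like '; Name ; NN% ;'."""
    usage = {}
    for line in aoc_output.splitlines():
        m = re.match(r";\s*(.+?)\s*;\s*(\d+)%\s*;", line.strip())
        if m:
            usage[m.group(1)] = int(m.group(2))
    return usage

# Sample input copied from the question.
AOC_OUTPUT = """\
aoc: Restarting compile without lmem replication because of estimated overutilization!
; Resource                               + Usage                     ;
; Logic utilization                      ;   74%                     ;
; Memory blocks                          ;   62%                     ;
; DSP blocks                             ;   23%                     ;
"""

if lmem_replication_dropped(AOC_OUTPUT):
    print("Local memory ports are being shared; expect pipeline stalls.")
print(estimated_usage(AOC_OUTPUT))
```

Note that the estimates printed after the restart already reflect the reduced replication, so a moderate "Memory blocks" figure does not mean the original, fully replicated design would have fit.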