Valued Contributor III
2,692 Views

Why do these other designs not fit?

Dear all, 

 

I have another question about why designs do not fit on our Arria 10 board. This time, the designs do not use an excessive share of the resources. I compiled with --high-effort.

 

The first kernel has the following top.fit.summary: 

 

Fitter Status : Failed - Wed May 3 23:08:26 2017
Quartus Prime Version : 16.0.0 Build 211 04/27/2016 SJ Pro Edition
Revision Name : top
Top-level Entity Name : top
Family : Arria 10
Device : 10AX115N3F40E2SG
Timing Models : Final
Logic utilization (in ALMs) : 122,273 / 427,200 ( 29 % )
Total registers : 263614
Total pins : 288 / 826 ( 35 % )
Total virtual pins : 0
Total block memory bits : 4,057,294 / 55,562,240 ( 7 % )
Total RAM Blocks : 501 / 2,713 ( 18 % )
Total DSP Blocks : 770 / 1,518 ( 51 % )
Total HSSI RX channels : 8 / 48 ( 17 % )
Total HSSI TX channels : 8 / 48 ( 17 % )
Total PLLs : 18 / 112 ( 16 % )

 

The second one uses far fewer DSP and RAM blocks (though somewhat more ALMs), but also does not route:

 

Fitter Status : Failed - Thu May 4 00:26:07 2017
Quartus Prime Version : 16.0.0 Build 211 04/27/2016 SJ Pro Edition
Revision Name : top
Top-level Entity Name : top
Family : Arria 10
Device : 10AX115N3F40E2SG
Timing Models : Final
Logic utilization (in ALMs) : 154,248 / 427,200 ( 36 % )
Total registers : 241333
Total pins : 288 / 826 ( 35 % )
Total virtual pins : 0
Total block memory bits : 2,692,686 / 55,562,240 ( 5 % )
Total RAM Blocks : 296 / 2,713 ( 11 % )
Total DSP Blocks : 128 / 1,518 ( 8 % )
Total HSSI RX channels : 8 / 48 ( 17 % )
Total HSSI TX channels : 8 / 48 ( 17 % )
Total PLLs : 18 / 112 ( 16 % )

 

Why do these designs not fit even though they do not use that many resources?

 

The quartus_sh_compile for both is attached. Can I make these designs route somehow?  

 

Any suggestions welcome! 

Thanks in advance!
10 Replies
Valued Contributor III
95 Views

FYI: I also used --profile

Valued Contributor III
95 Views

Is your top-level file in Verilog or VHDL? I have seen designs fail to fit on the FPGA when the top level has a large number of buses (signals wider than 1 bit) as input or output ports. To work around this, I declared the ports as virtual pins in the QSF file (this was for unit-level compilation, where pin assignments do not matter).

set_instance_assignment -name VIRTUAL_PIN ON -to csi_clock_clk (where csi_clock_clk is a port in the module)

 

All ports are exported the same way.
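
Writing one such assignment per port by hand gets tedious for wide interfaces, so a small generator can help. The following is a minimal sketch, not part of any Altera tool: the helper name is made up, and every port name other than csi_clock_clk is hypothetical.

```python
def virtual_pin_assignments(ports):
    """Return one QSF VIRTUAL_PIN assignment line per port name."""
    return [
        f"set_instance_assignment -name VIRTUAL_PIN ON -to {port}"
        for port in ports
    ]


# csi_clock_clk comes from the post above; the other names are
# hypothetical placeholders for the ports of your own module.
ports = ["csi_clock_clk", "avs_data_in", "avs_data_out"]
lines = virtual_pin_assignments(ports)
print("\n".join(lines))
```

The printed lines can be pasted into the QSF file of a unit-level project. This only applies when you control the top level and the physical pin locations do not matter.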
Valued Contributor III
95 Views

The top level file is generated by the OpenCL compiler, I have no influence over it.

Valued Contributor III
95 Views

I believe the relevant part of the log is this bit: 

 

Info (170239): Router is attempting to preserve 23.61 percent of routes from an earlier compilation, a user specified Routing Constraints File, or internal routing requirements.
Info (170236): Routing optimizations have been running for 1 hour(s)
Info (170242): 0 out of 364996 signals have been routed.
Info (170238): 804177 interconnect resources are used by multiple signals.
Info (170195): Router estimated average interconnect usage is 60% of the available device resources
Info (170196): Router estimated peak interconnect usage is 186% of the available device resources in the region that extends from location X165_Y189 to location X176_Y200
Info (188005): Design requires adding a large amount of routing delay for some signals to meet hold time requirements, and there is an excessive demand for the available routing resources. The Fitter is reducing the routing delays of some signals to help the routing algorithm converge, but doing so may cause hold time failures. For more information, refer to the "Estimated Delay Added for Hold Timing" section in the Fitter report.
Warning (16684): The router is trying to resolve an exceedingly large amount of congestion. At the moment, it predicts long routing run time and/or significant setup or hold timing failures. Congestion details can be found in the Chip Planner.
Warning (16618): Fitter routing phase terminated due to routing congestion. Congestion details can be found in Chip Planner.
Critical Warning (188026): The Fitter failed to successfully route the design. You may be able get this design to route by making design modifications, changing the fitter seed or by enabling the Fitter Aggressive Routability Optimizations logic option.
Info (188027): The highest placement effort tried by the fitter during this compile was: 1.54
Error (170143): Final fitting attempt was unsuccessful
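
The two interconnect-usage messages are the ones that explain the failure: average usage is 60%, but the peak in one small region is 186%, so the router runs out of wires locally even though ALM, RAM and DSP counts look modest. A minimal sketch for pulling these numbers out of a log follows. The function name and the returned dictionary keys are my own choices, and the regular expressions assume the message wording shown above.

```python
import re

# Two messages copied from the log above.
LOG = (
    "Info (170195): Router estimated average interconnect usage is 60% "
    "of the available device resources\n"
    "Info (170196): Router estimated peak interconnect usage is 186% "
    "of the available device resources in the region that extends from "
    "location X165_Y189 to location X176_Y200\n"
)


def congestion_summary(log_text):
    """Extract average/peak interconnect usage and the congested region."""
    avg = re.search(r"average interconnect usage is (\d+)%", log_text)
    peak = re.search(
        r"peak interconnect usage is (\d+)%.*?"
        r"from location (X\d+_Y\d+) to location (X\d+_Y\d+)",
        log_text,
    )
    if avg is None or peak is None:
        return None  # messages not present in this log
    peak_pct = int(peak.group(1))
    return {
        "average_pct": int(avg.group(1)),
        "peak_pct": peak_pct,
        "region": (peak.group(2), peak.group(3)),
        # Demand above 100% means more signals than wires in that region.
        "over_capacity": peak_pct > 100,
    }


summary = congestion_summary(LOG)
print(summary)
```

The reported region can then be inspected in the Chip Planner, as the warnings suggest.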
Valued Contributor III
95 Views

Try this: Modify the file named "top_synth.qsf" from your BSP and add the following line: 

 

set_global_assignment -name AUTO_PARALLEL_SYNTHESIS OFF 

 

This will disable parallel synthesis for Arria 10. I had an issue like yours that a design with very low area utilization failed to route on Arria 10, while it worked fine on Stratix V. After lots of email exchanges with Altera's support, they found a bug in their synthesizer for Arria 10 which is hopefully going to be resolved in v17.0. They recommended the above workaround for now. 
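
If the BSP is regenerated or shared between machines, the edit can be scripted so it is applied exactly once. This is a minimal sketch under the assumption that appending the line at the end of the file is sufficient. The helper name is hypothetical, and the demo writes to a temporary file instead of a real BSP.

```python
import tempfile
from pathlib import Path

ASSIGNMENT = "set_global_assignment -name AUTO_PARALLEL_SYNTHESIS OFF"


def ensure_assignment(qsf_path, assignment=ASSIGNMENT):
    """Append `assignment` unless an identical line already exists.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(qsf_path)
    text = path.read_text()
    if any(line.strip() == assignment for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + assignment + "\n")
    return True


# Demo on a throwaway file; point ensure_assignment at the
# top_synth.qsf inside your own BSP instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "top_synth.qsf"
    demo.write_text("# existing BSP settings would be here\n")
    first = ensure_assignment(demo)   # adds the line
    second = ensure_assignment(demo)  # finds it, changes nothing
    final_text = demo.read_text()

print(first, second)
```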

 

If the fix doesn't work for you, try to compile your design against Altera's reference BSP for Stratix V. If it still doesn't route, there is some issue you have to fix in the design itself.
Valued Contributor III
95 Views

Thanks!!!! I will try and report.

Valued Contributor III
95 Views

Sadly, this does not help. As an example of my problems, consider the Altera matrix multiplication example from Altera's website:

 

https://www.altera.com/support/support-resources/design-examples/design-software/opencl/matrix-multi... 

 

This was designed for the Stratix V, with a block size of 64 and 4 SIMD paths. The Arria 10 has more resources, so to make use of them I set the block size to 104 and created 8 SIMD paths instead. However, this does not route even though the resource usage is not that high. I do not know why, or how to reason about this. Any suggestions?
Valued Contributor III
95 Views

Did you try compiling your original kernel against Altera's reference Stratix V BSP (if it fits)? In fact, you should probably also try compiling it against Altera's reference Arria 10 BSP. 

 

On Arria 10, routability is generally much worse than on Stratix V because Altera has switched to Partial Reconfiguration through PCI-E, which adds numerous extra placement and routing constraints. It is getting better with each new version, but very slowly. Unfortunately, with Partial Reconfiguration the timing quality of the BSP also plays a very big part, and board manufacturers generally aren't very interested in spending much time optimizing their BSP.

 

My recommendation is to try compiling your kernel against both of Altera's reference BSPs for Stratix V and Arria 10 (s5_ref and a10_ref, which are shipped alongside Quartus). If it works on the reference Arria 10 BSP, then your problem is due to the poor timing quality of your board's BSP and you should contact your board manufacturer. If, however, it also fails to route with Altera's reference BSP, then you should report it directly to Altera; they are looking for such cases to help improve their Arria 10 placer and router. You should probably also report your findings with the Matrix Multiplication example.
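
The comparison across the two reference BSPs can be laid out as a pair of compile commands. This is a rough sketch only: the kernel file name is hypothetical, and the `--board` option spelling is an assumption based on the double-dash style of the `--high-effort` and `--profile` flags mentioned in this thread, so check `aoc --help` for your SDK version before relying on it.

```python
import shlex
from pathlib import Path


def aoc_commands(kernel, boards=("s5_ref", "a10_ref")):
    """Build one aoc command line per reference BSP (does not run them)."""
    stem = Path(kernel).stem
    return [
        # Assumed option names; verify against `aoc --help`.
        ["aoc", "--board", board, kernel, "-o", f"{stem}_{board}.aocx"]
        for board in boards
    ]


# "matrix_mult.cl" is a placeholder for your own kernel file.
cmds = aoc_commands("matrix_mult.cl")
for cmd in cmds:
    print(shlex.join(cmd))
```

Each command can then be launched separately, since a full compile may take many hours per board.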
Valued Contributor III
95 Views

Thanks HRZ! You've been super helpful! I'm trying out your suggestions; on the reference Arria 10 BSP, the compilation has now been running for more than 12 hours. Will report later!

Valued Contributor III
95 Views

After trying it out on various kernels, I can report that the reference BSP takes a long time to compile, but succeeds much more often than the Nalla BSP. Both of these kernels compile fine on the reference Arria 10 BSP. Will report to Nalla.
