I am trying to close timing on an Arria 10 design (A057K2F40I2). I would like to know whether some of the problems I am seeing are device limitations, tool issues, or operator error.
The first concerns setup time to I/O registers. Here is an example, from Chip Planner:
The delay from the pad to the register input exceeds 5 ns (see below).
This seems a bit high to me, but perhaps mediocre I/O cell performance is a limitation of mid-priced FPGAs.
The second problem concerns synthesis results and their effect on timing. We have a block of 32-bit registers that can be accessed via a PCIe endpoint and an Avalon shim interface. There are about 100 to 150 registers and a 14-bit address. The readback mux for the register block is failing setup time by a significant margin. I would expect the mux for each read data bit to be implemented separately, so the resulting combinatorial circuit would have fewer than 200 inputs. With 6-input LUTs, it should be possible to implement that logic in three or four levels. But I am seeing on the order of 30 levels:
I have added a couple of extra clock ticks and am using a multicycle constraint, but I am still seeing a bunch of paths that don't make it.
Here is the chip planner view for the above mux:
So is this an Arria 10 limitation? I have tried Prime Pro 18.1 and 19.2, as well as Prime Standard 18.2, and I get similar results in each case.
I am sorry, but I am unable to insert or attach image files (not sure if this is some IT problem on my end or what). Regardless, I think there is enough information in my posting to describe the problem, primarily:
- Internal delay in an I/O cell (pad to register input) can be as high as 5 ns
- A combinatorial circuit with ~150 inputs requires 30 levels of 6-input LUTs (not just 30 LUTs, but a timing chain 30 LUTs deep).
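As a rough sanity check on the "three or four levels" estimate, here is a minimal Python sketch of the arithmetic. It is a back-of-the-envelope model, not what the synthesizer actually does. The function name `tree_levels` is hypothetical, and the two fan-in values are assumptions: 6 treats each LUT as a pure 6-input reduction node (a lower bound), while 4 assumes each 6-input LUT implements a 4:1 mux (four data inputs plus two select inputs). Address decoding and routing are ignored.

```python
def tree_levels(num_inputs: int, fan_in: int) -> int:
    """Minimum depth of a balanced tree that reduces num_inputs signals
    to one output using nodes that each combine up to fan_in signals."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    levels = 0
    remaining = num_inputs
    while remaining > 1:
        # Each level merges groups of up to fan_in signals (ceiling division).
        remaining = -(-remaining // fan_in)
        levels += 1
    return levels


if __name__ == "__main__":
    # Assumption: pure 6-input reduction, ~200 inputs per read data bit.
    print(tree_levels(200, 6))  # 3
    # Assumption: one 4:1 mux per 6-input LUT, ~150 registers.
    print(tree_levels(150, 4))  # 4
```

Under either assumption the ideal depth is 3 or 4, which is why a reported depth of about 30 looks out of proportion.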
Can you at least post Timing Analyzer reports on the failing paths, especially the Data Path tab in a detailed report, to see all the resources the failing paths are passing through?
Here is the list of LABs and MLABs for one path. I would post the entire report, but it would not format nicely. To my mind, this is way too many layers for 6-input LUTs.
These are just the locations of the resources the path is going through. I would want to see the node names that correspond to your design. Screenshots from the Timing Analyzer would be easiest or you can save the report as ASCII text.
Well, the node names are mostly machine-generated gibberish, but here they are:
The combinatorial path in this case is from read_addr to reg_rdbk_data, and the RTL code is a case statement.
I would also need to see the timing numbers. Is there no way you can just post the whole report? And is there a reason why the .sdc can't be posted? That would help a lot.
You will have to explain to me how the timing values, or the .sdc file, will lead to an answer as to why the synthesizer/P&R/fitter tools are implementing this particular bit of code in a way in which it can not possibly make timing, because that is what I need to understand.
There could be an error in your .sdc or things like unnecessary multicycle exceptions that could increase routing delay. Whatever the Fitter does is guided by your timing constraints, so they are the first and most important thing to look at.
For the community to be able to help, you need to attach an archive of your project. That is the easiest way to analyse your issue.
Timing problems are sometimes very complex, and in your case it could be practically impossible to help without additional information from you.
If you cannot attach the whole project for some reason, you need to create a simplified test version of it that still shows the problem.