Beginner

Timing closure on Arria 10

I am trying to close timing on an Arria 10 design (A057K2F40I2). I would like to know whether some of the problems I am seeing are device limitations, tool issues, or operator error.

The first concerns setup time to I/O registers. Here is an example, from Chip Planner:

The delay from the pad to the register input exceeds 5 ns (see below).

This seems a bit high to me, but perhaps mediocre I/O cell performance is a limitation of mid-priced FPGAs.

 

The second problem concerns synthesis results and their effect on timing. We have a block of 32-bit registers that can be accessed via a PCIe endpoint and an Avalon shim interface. There are about 100 to 150 registers and a 14-bit address. The readback mux for the register block is failing setup time by a significant margin. I would expect the mux for each read data bit to be implemented separately, so the resulting combinatorial circuit would have fewer than 200 inputs. With 6-input LUTs, it should be possible to implement that logic in three or four levels. But I am seeing on the order of 30 levels:
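The "three or four levels" estimate above can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, the first function gives a fan-in lower bound (a necessary condition, not a guarantee that synthesis can reach it), and the second assumes the registers are indexed densely by the low address bits so that a 4:1 mux (4 data inputs plus 2 selects) fits in one 6-input LUT.

```python
def min_lut_levels(n_inputs: int, lut_inputs: int = 6) -> int:
    """Lower bound on LUT depth for a function of n_inputs signals.

    A tree of k-input LUTs that is d levels deep can depend on at most
    k**d primary inputs, so d must satisfy k**d >= n_inputs.
    """
    if n_inputs < 1 or lut_inputs < 2:
        raise ValueError("need n_inputs >= 1 and lut_inputs >= 2")
    levels, reach = 0, 1
    while reach < n_inputs:
        reach *= lut_inputs
        levels += 1
    return levels


def mux_tree_levels(n_sources: int, mux_width: int = 4) -> int:
    """Depth of a balanced tree of mux_width:1 muxes over n_sources inputs.

    Assumption: one 4:1 mux stage maps onto one 6-input LUT.
    """
    return min_lut_levels(n_sources, mux_width)


if __name__ == "__main__":
    print(min_lut_levels(200))   # 3, since 6**3 = 216 >= 200
    print(mux_tree_levels(150))  # 4, since 4**4 = 256 >= 150
```

Both figures land in the three-to-four range quoted in the post, which is why a depth near 30 looks suspicious.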

I have added a couple of extra clock ticks and am using a multicycle constraint, but I am still seeing a bunch of paths that don't make it.
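To see why a couple of extra clock ticks may not be enough, here is a simplified setup-slack calculation. Every number in it is hypothetical (the post gives neither the clock period nor the path delay), and the model ignores clock skew and uncertainty unless they are folded into `overhead_ns`.

```python
import math


def setup_slack_ns(period_ns: float, multicycle: int,
                   data_delay_ns: float, overhead_ns: float = 0.0) -> float:
    """Simplified setup slack for a same-clock path.

    With a setup multicycle of N, the latch edge is N periods after the
    launch edge, so the data has N * period to arrive.
    """
    return multicycle * period_ns - data_delay_ns - overhead_ns


def min_multicycle(period_ns: float, data_delay_ns: float,
                   overhead_ns: float = 0.0) -> int:
    """Smallest setup multicycle value that gives non-negative slack."""
    return max(1, math.ceil((data_delay_ns + overhead_ns) / period_ns))


if __name__ == "__main__":
    period = 4.0       # hypothetical 250 MHz clock
    path_delay = 15.0  # hypothetical delay of a ~30-level path
    print(setup_slack_ns(period, 3, path_delay))  # -3.0: three cycles still fail
    print(min_multicycle(period, path_delay))     # 4
```

Under these assumed numbers, a multicycle of 3 still leaves negative slack, so reducing the logic depth matters more than relaxing the constraint further.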

Here is the chip planner view for the above mux:

So is this an Arria 10 limitation? I have tried Prime Pro 18.1 and 19.2, as well as Prime Standard 18.2, and I get similar results in each case.

 

15 Replies

Hi,

 

The screenshot cannot be seen. Could you reattach? Do you have the block diagram?

 

Thanks.

 

Beginner

I am sorry, but I am unable to insert or attach image files (not sure if this is some IT problem on my end or what). Regardless, I think there is enough information in my posting to describe the problem, primarily:

  • Internal delay in an I/O cell (pad to register input) can be as high as 5 ns
  • A combinatorial circuit with ~150 inputs requires 30 levels of 6-input LUTs (not just 30 LUTs, but a timing chain 30 LUTs deep).

Thanks again.

 

 

 

Moderator

Can you post your .sdc file? That will be much more useful to see than Chip Planner screenshots.

 

#iwork4intel

Beginner

No, I can't, sorry.

Moderator

Can you at least post Timing Analyzer reports on the failing paths, especially the Data Path tab in a detailed report, to see all the resources the failing paths are passing through?

 

Beginner

Here is the list of LABs and MLABs for one path. I would post the entire report, but it would not format nicely. To my mind, this is way too many layers for 6-input LUTs.

 

 

FF_X54_Y124_N38

FF_X54_Y124_N38

MLABCELL_X55_Y126_N39

MLABCELL_X55_Y126_N39

MLABCELL_X55_Y126_N39

LABCELL_X54_Y127_N30

LABCELL_X54_Y127_N30

LABCELL_X54_Y127_N30

LABCELL_X45_Y130_N54

LABCELL_X45_Y130_N54

LABCELL_X45_Y130_N54

MLABCELL_X49_Y129_N57

MLABCELL_X49_Y129_N57

MLABCELL_X49_Y129_N57

LABCELL_X50_Y127_N27

LABCELL_X50_Y127_N27

LABCELL_X50_Y127_N27

LABCELL_X51_Y127_N0

LABCELL_X51_Y127_N0

LABCELL_X51_Y127_N0

LABCELL_X51_Y125_N6

LABCELL_X51_Y125_N6

LABCELL_X51_Y125_N6

LABCELL_X50_Y125_N9

LABCELL_X50_Y125_N9

LABCELL_X50_Y125_N9

LABCELL_X51_Y122_N24

LABCELL_X51_Y122_N24

LABCELL_X51_Y122_N24

LABCELL_X50_Y122_N0

LABCELL_X50_Y122_N0

LABCELL_X50_Y122_N0

LABCELL_X51_Y120_N54

LABCELL_X51_Y120_N54

LABCELL_X51_Y120_N54

LABCELL_X51_Y120_N42

LABCELL_X51_Y120_N42

LABCELL_X51_Y120_N42

LABCELL_X50_Y119_N42

LABCELL_X50_Y119_N42

LABCELL_X50_Y119_N42

MLABCELL_X49_Y116_N9

MLABCELL_X49_Y116_N9

MLABCELL_X49_Y116_N9

LABCELL_X48_Y116_N18

LABCELL_X48_Y116_N18

LABCELL_X48_Y116_N18

MLABCELL_X42_Y117_N0

MLABCELL_X42_Y117_N0

MLABCELL_X42_Y117_N0

MLABCELL_X38_Y121_N12

MLABCELL_X38_Y121_N12

MLABCELL_X38_Y121_N12

LABCELL_X37_Y122_N24

LABCELL_X37_Y122_N24

LABCELL_X37_Y122_N24

LABCELL_X35_Y124_N51

LABCELL_X35_Y124_N51

LABCELL_X35_Y124_N51

LABCELL_X34_Y124_N54

LABCELL_X34_Y124_N54

LABCELL_X34_Y124_N54

LABCELL_X30_Y120_N51

LABCELL_X30_Y120_N51

LABCELL_X30_Y120_N51

LABCELL_X31_Y120_N9

LABCELL_X31_Y120_N9

LABCELL_X31_Y120_N9

LABCELL_X30_Y116_N48

LABCELL_X30_Y116_N48

LABCELL_X30_Y116_N48

LABCELL_X31_Y116_N18

LABCELL_X31_Y116_N18

LABCELL_X31_Y116_N18

LABCELL_X34_Y116_N54

LABCELL_X34_Y116_N54

LABCELL_X34_Y116_N54

LABCELL_X33_Y119_N0

LABCELL_X33_Y119_N0

LABCELL_X33_Y119_N0

MLABCELL_X29_Y113_N24

MLABCELL_X29_Y113_N24

MLABCELL_X29_Y113_N24

LABCELL_X35_Y120_N42

LABCELL_X35_Y120_N42

FF_X35_Y120_N44

FF_X35_Y120_N44
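A location list like the one above can be turned into a logic-level count with a few lines of script. This is a minimal sketch, assuming each location repeats on consecutive rows (cell input, cell output, routing) and that every distinct LABCELL or MLABCELL entry is one LUT level; the function name and the short `sample` excerpt are illustrative only.

```python
from itertools import groupby


def count_logic_levels(locations):
    """Collapse consecutive repeats and count LAB/MLAB cells on the path."""
    cleaned = (loc.strip() for loc in locations if loc.strip())
    cells = [name for name, _ in groupby(cleaned)]
    return sum(1 for name in cells
               if name.startswith(("LABCELL_", "MLABCELL_")))


# Short excerpt of the list above: the source register, the first two
# cells, the last cell, and the destination register.
sample = """
FF_X54_Y124_N38
FF_X54_Y124_N38
MLABCELL_X55_Y126_N39
MLABCELL_X55_Y126_N39
MLABCELL_X55_Y126_N39
LABCELL_X54_Y127_N30
LABCELL_X54_Y127_N30
LABCELL_X54_Y127_N30
LABCELL_X35_Y120_N42
LABCELL_X35_Y120_N42
FF_X35_Y120_N44
FF_X35_Y120_N44
""".splitlines()

if __name__ == "__main__":
    print(count_logic_levels(sample))  # 3 cells in this excerpt
```

Run over the full list above, this should report 28 LAB/MLAB cells, which is consistent with the "on the order of 30 levels" estimate.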

Moderator

These are just the locations of the resources the path is going through. I would want to see the node names that correspond to your design. Screenshots from the Timing Analyzer would be easiest or you can save the report as ASCII text.

 

Beginner

Well, the node names are mostly machine-generated gibberish, but here they are:

 

ctl_regs_inst|read_addr[14]|q

ctl_regs_inst|read_addr[14]~la_lab/laboutb[5]

ctl_regs_inst|reduce_nor_339~2|datac

ctl_regs_inst|reduce_nor_339~2|combout

ctl_regs_inst|reduce_nor_339~2~la_mlab/laboutb[7]

ctl_regs_inst|reduce_nor_173|datab

ctl_regs_inst|reduce_nor_173|combout

ctl_regs_inst|reduce_nor_173~la_lab/laboutb[1]

ctl_regs_inst|i6015~0|datad

ctl_regs_inst|i6015~0|combout

ctl_regs_inst|i6015~0~la_lab/laboutb[16]

ctl_regs_inst|i6040|datad

ctl_regs_inst|i6040|combout

ctl_regs_inst|i6040~la_mlab/laboutb[18]

ctl_regs_inst|i6061~0|datac

ctl_regs_inst|i6061~0|combout

ctl_regs_inst|i6061~0~la_lab/laboutt[18]

ctl_regs_inst|i6078~1|dataf

ctl_regs_inst|i6078~1|combout

ctl_regs_inst|i6078~1~la_lab/laboutt[1]

ctl_regs_inst|i6103~0|dataf

ctl_regs_inst|i6103~0|combout

ctl_regs_inst|i6103~0~la_lab/laboutt[4]

ctl_regs_inst|i6111~0|dataf

ctl_regs_inst|i6111~0|combout

ctl_regs_inst|i6111~0~la_lab/laboutt[6]

ctl_regs_inst|i6132~0|datae

ctl_regs_inst|i6132~0|combout

ctl_regs_inst|i6132~0~la_lab/laboutt[17]

ctl_regs_inst|i6166~2|dataf

ctl_regs_inst|i6166~2|combout

ctl_regs_inst|i6166~2~la_lab/laboutt[0]

ctl_regs_inst|Select_34~1|dataf

ctl_regs_inst|Select_34~1|combout

ctl_regs_inst|Select_34~1~la_lab/laboutb[17]

ctl_regs_inst|i6208~0|dataf

ctl_regs_inst|i6208~0|combout

ctl_regs_inst|i6208~0~la_lab/laboutb[9]

ctl_regs_inst|i6229~1|datae

ctl_regs_inst|i6229~1|combout

ctl_regs_inst|i6229~1~la_lab/laboutb[9]

ctl_regs_inst|i6250~2|dataf

ctl_regs_inst|i6250~2|combout

ctl_regs_inst|i6250~2~la_mlab/laboutt[7]

ctl_regs_inst|i6271~2|dataf

ctl_regs_inst|i6271~2|combout

ctl_regs_inst|i6271~2~la_lab/laboutt[13]

ctl_regs_inst|i6288~1|dataf

ctl_regs_inst|i6288~1|combout

ctl_regs_inst|i6288~1~la_mlab/laboutt[0]

ctl_regs_inst|i6309~0|dataf

ctl_regs_inst|i6309~0|combout

ctl_regs_inst|i6309~0~la_mlab/laboutt[9]

ctl_regs_inst|i6343~0|datae

ctl_regs_inst|i6343~0|combout

ctl_regs_inst|i6343~0~la_lab/laboutt[16]

ctl_regs_inst|i6343|datae

ctl_regs_inst|i6343|combout

ctl_regs_inst|i6343~la_lab/laboutb[15]

ctl_regs_inst|i6364~0|datae

ctl_regs_inst|i6364~0|combout

ctl_regs_inst|i6364~0~la_lab/laboutb[16]

ctl_regs_inst|i6385|dataf

ctl_regs_inst|i6385|combout

ctl_regs_inst|i6385~la_lab/laboutb[14]

ctl_regs_inst|i6406~0|datae

ctl_regs_inst|i6406~0|combout

ctl_regs_inst|i6406~0~la_lab/laboutt[7]

ctl_regs_inst|i6427|dataf

ctl_regs_inst|i6427|combout

ctl_regs_inst|i6427~la_lab/laboutb[12]

ctl_regs_inst|i6448~0|dataf

ctl_regs_inst|i6448~0|combout

ctl_regs_inst|i6448~0~la_lab/laboutt[13]

ctl_regs_inst|i6460|dataf

ctl_regs_inst|i6460|combout

ctl_regs_inst|i6460~la_lab/laboutb[17]

ctl_regs_inst|i6510|dataf

ctl_regs_inst|i6510|combout

ctl_regs_inst|i6510~la_lab/laboutt[1]

ctl_regs_inst|Select_26~118|dataa

ctl_regs_inst|Select_26~118|combout

ctl_regs_inst|Select_26~118~la_mlab/laboutt[16]

ctl_regs_inst|Select_26~119|datad

ctl_regs_inst|Select_26~119|combout

ctl_regs_inst|reg_rdbk_data[8]|d

ctl_regs_inst|reg_rdbk_data[8]

 

The combinatorial path in this case is from read_addr[14] to reg_rdbk_data[8], and the RTL code is a case statement.
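For comparison, the sketch below contrasts the depth of a balanced LUT tree with the depth of a serial chain, where each LUT combines the previous stage's output with a few new terms. It is not a diagnosis of this design, and nothing in the thread confirms that the tools built a chain. It only illustrates that a serial structure over roughly 150 terms lands near 30 levels. The function names are hypothetical, and the model ignores any address-decode inputs each stage would also need.

```python
def balanced_tree_levels(n_terms: int, lut_inputs: int = 6) -> int:
    """Depth of a balanced tree of lut_inputs-input LUTs over n_terms inputs."""
    levels, reach = 0, 1
    while reach < n_terms:
        reach *= lut_inputs
        levels += 1
    return levels


def chain_levels(n_terms: int, lut_inputs: int = 6) -> int:
    """Depth of a serial chain of LUTs over n_terms inputs.

    The first LUT absorbs lut_inputs terms; every later LUT takes the
    previous LUT's output plus (lut_inputs - 1) new terms.
    """
    if n_terms < 1:
        raise ValueError("need at least one term")
    if n_terms <= lut_inputs:
        return 1
    remaining = n_terms - lut_inputs
    per_stage = lut_inputs - 1
    return 1 + -(-remaining // per_stage)  # integer ceiling division


if __name__ == "__main__":
    print(balanced_tree_levels(150))  # 3
    print(chain_levels(150))          # 30
```

If the synthesized logic does resemble the chained case, restructuring the RTL or registering intermediate results would be the usual direction to explore.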

Moderator

I would also need to see the timing numbers. Is there no way you can just post the whole report? And is there a reason why the .sdc can't be posted? That would help a lot.

 

Beginner

You will have to explain to me how the timing values, or the .sdc file, will lead to an answer as to why the synthesizer/P&R/fitter tools are implementing this particular bit of code in a way in which it can not possibly make timing, because that is what I need to understand.

Moderator

There could be an error in your .sdc or things like unnecessary multicycle exceptions that could increase routing delay. Whatever the Fitter does is guided by your timing constraints, so they are the first and most important thing to look at.

 


Hi,

 

Could you post the sdc and timing report?

 

Thanks.

Beginner

I can't post this stuff, it is an ITAR design.

New Contributor II

Hi Seadog.

 

For the community to help, you need to attach an archive of your project; that is the easiest way to analyse your issue.

Timing problems are sometimes very complex, and in your case it could be practically impossible to help without additional information from you.

If for some reason you cannot attach the whole project, you need to create a simple test version of it.

 

 

--

Best regards,

Ivan


Hi,

 

May I know if you have any updates?

Thanks.
