I'm trying to finalize a design in an Arria 10, using Quartus Prime Pro 16.1, where a signal is passed from a global clock to a related peripheral clock of the same frequency (they are both derived from the same source, the only phase difference coming from the different delays for the clock buffers).

I'm struggling to close timing though, as TimeQuest is giving an insanely long interconnect time for moving data within the same slice (see below).

    Incr    RF  Type  Fanout  Location
    0.181   FF  uTco  1       FF_X1_Y144_N7   <- comes from the global clock
    0.098   FF  CELL  1       FF_X1_Y144_N7
    11.416  FF  IC    1       FF_X1_Y144_N1
    0.000   FF  CELL  1       FF_X1_Y144_N1   <- latched in peripheral clock

Has anyone else encountered anything similar? Any ideas how to solve this? Is it even possible to take ~11.5 ns to move data from FF_X1_Y144_N7 to FF_X1_Y144_N1?

Thanks all,
Andy
If the clocks for source and destination have the same properties, is there a reason why you're using two different clock domains? Passing from a global to a periphery clock resource is always going to cause a skew issue.
This is part of a design using the 10G transceivers. The registers on the peripheral clocks interact directly with the PHY, the global clock is for everything else. It's all part of an experiment to derive the device clock from the recovered transceiver clock...I'm not convinced that this is just representing skew though - should that not be represented as part of the clock path rather than the data path?
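One way to see where the skew is accounted for is to report the same path for both setup and hold with the clock paths expanded. The following is only a sketch for the TimeQuest console: the register name patterns are hypothetical placeholders (they do not come from this design) and the panel names are arbitrary.

```tcl
# Hypothetical register names: substitute the real launch and latch registers.
set src [get_registers {*|sig_global_clk_reg}]
set dst [get_registers {*|sig_periph_clk_reg}]

# Setup analysis with the full clock and data paths shown.
report_timing -setup -from $src -to $dst -npaths 1 \
    -detail full_path -panel_name {global_to_periph setup}

# Hold analysis of the same path. Clock skew appears in the data arrival
# and data required clock paths, not in the IC increments of the data path.
report_timing -hold -from $src -to $dst -npaths 1 \
    -detail full_path -panel_name {global_to_periph hold}
```

If the hold report shows a large required time driven by the clock paths, a long IC delay on the data path is likely the router adding delay to meet that hold requirement rather than a real routing limitation.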
Yes, I have seen long timing paths due to routing delays in A10 designs. Check for congestion in the Chip Planner. Also locate the failing path and select Highlight Routing. If it takes multi-segment routes, you may be running short of available routing resources. Post a picture of your routing wire utilization. Good luck!
Good call on checking what the routing was trying to do - turns out it actually was routing the signal half way round the device in an attempt to make a hold requirement that is utterly nuts! I'm working on playing with multicycle constraints to get it to actually analyze the path in a more sane way...Thanks all for the help!
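For anyone landing here later, relaxing the hold analysis on a single crossing with a multicycle constraint might look roughly like the sketch below. The register name patterns are hypothetical placeholders, not names from this design, and the constraint is only appropriate if the design can tolerate the data being captured one destination clock cycle earlier than the default analysis assumes.

```tcl
# Hypothetical register names: substitute the real launch and latch registers.
set src [get_registers {*|sig_global_clk_reg}]
set dst [get_registers {*|sig_periph_clk_reg}]

# Keep the default one-cycle setup relationship (stated explicitly).
set_multicycle_path -setup -from $src -to $dst 1

# Move the hold check back by one clock period, so the fitter no longer
# has to add routing delay to cover the skew between the two clock networks.
set_multicycle_path -hold -from $src -to $dst 1
```

Scoping the exception with `-from` and `-to` to just the crossing registers keeps the rest of the two clock domains analyzed with the default relationships.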
Thank you for commenting back. It is also good to know that I am not alone in seeing this behavior in A10 P&R.

Before you go too far with multicycle constraints, try placing regions for your major logic in the vertical direction (S5 had a horizontal direction preference). You can look up the a10_ref OpenCL BSP for reference. Keep each region ~80% full, and try to push unrelated logic away from your regions, like so:

    set_instance_assignment -name PLACE_REGION "8 119 22 137" -to top|your_logic_instance
    set_instance_assignment -name RESERVE_PLACE_REGION ON -to top|your_logic_instance
    set_instance_assignment -name CORE_ONLY_PLACE_REGION OFF -to top|your_logic_instance
    set_instance_assignment -name ROUTE_REGION "8 119 22 137" -to top|your_logic_instance

The X/Y numbers are from my design and should change for your placement. The idea is to spread the logic vertically, so P&R does not make a "mess" in the center of the chip. This should give you a better chance of routes having fewer segments and therefore less delay.
--- Quote Start ---
Good call on checking what the routing was trying to do - turns out it actually was routing the signal half way round the device in an attempt to make a hold requirement that is utterly nuts! I'm working on playing with multicycle constraints to get it to actually analyze the path in a more sane way... Thanks all for the help!
--- Quote End ---

This is identical to a problem I had with a global-fanout async reset. The reset was coming from a synchronous source and, for some reason, the fitter decided that a reg somewhere had some crazy hold requirement. It therefore routed the single output through a global buffer (as expected), then through some logic on the edge of the chip, and then fanned out all over the chip, giving me 1000s of recovery violations of up to 10 ns (in an S4).

The solution (as supplied by Altera) was to relax the hold time on the fanout of this reg, and the problem went away.