
Internal Timing and IC delay and Negative clock skew issues.

Altera_Forum

Internal timing is a big threat in my design. My requirement is 266 MHz. The design has a single clock and uses a lot of arithmetic components; Megacell functions are instantiated in my RTL. When I checked internal timing on the Megacell functions stand-alone, they all ran at around 300 MHz. Once integrated and implemented in the same FPGA, the interconnect (IC) delays from registers to the Megacell function inputs are huge (60% of the clock period), and I am unable to meet timing.

 

 

I tried all the recommendations from the timing optimization advisor. I also set a global max fan-out of 8. Could this global max fan-out setting be causing a problem?

 

Please, can someone suggest how to reduce the IC delay in my design? Most of the IC delay is from registers to Megacell functions, or wherever a mux is present in the design. The other problem I am seeing is negative clock skew. Is there any setting to control negative clock skew in my design?

# =========================================================
; Total ; Incr  ; RF ; Type ; Fanout ; Location           ; Element ;
; 3.608 ; 0.000 ;    ; uTco ; 1      ; DSPOUT_X89_Y18_N2  ; _CAL_R|MSCLC_MUL10X16:U_MSCLC_MUL10X16|lpm_mult:lpm_mult_component|mult_net:auto_generated|result[13] ;
; 4.217 ; 0.609 ; RR ; CELL ; 1      ; DSPOUT_X89_Y18_N2  ; 0X16|lpm_mult_component|auto_generated|mac_out1|dataout[23] ;
; 5.980 ; 1.763 ; RR ; IC   ; 2      ; LCCOMB_X65_Y19_N28 ; ADD_SUB18|lpm_add_sub_component|auto_generated|add_sub_cella[5]|datac ;
; 6.438 ; 0.458 ; RR ; CELL ; 1      ; LCCOMB_X65_Y19_N28 ; C_CAL_R|U_MSCLC_ADD_SUB18|lpm_add_sub_component|auto_generated|add_sub_cella[5]|cout ;
; 6.438 ; 0.000 ; RR ; IC   ; 2      ; LCCOMB_X65_Y19_N30 ; C_ADD_SUB18|lpm_add_sub_component|auto_generated|add_sub_cella[6]|cin ;
# =========================================================
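
For reference, the share of the clock period taken by each delay type in the rows above can be tallied with a short script. This is only a rough sketch: the rows are typed in by hand from the report excerpt (which is a fragment of the full path), and the function names `delay_by_type` and `period_ns` are made up for illustration; they are not part of any Quartus tool.

```python
# Rows copied by hand from the data-path excerpt: (total_ns, incr_ns, type).
# The excerpt is only part of the path, so the sums are partial as well.
ROWS = [
    (3.608, 0.000, "uTco"),
    (4.217, 0.609, "CELL"),
    (5.980, 1.763, "IC"),
    (6.438, 0.458, "CELL"),
    (6.438, 0.000, "IC"),
]

TARGET_MHZ = 266.0  # design requirement stated in the post


def delay_by_type(rows):
    """Sum the incremental delay (ns) for each element type."""
    totals = {}
    for _total, incr, kind in rows:
        totals[kind] = totals.get(kind, 0.0) + incr
    return totals


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    period = period_ns(TARGET_MHZ)
    print(f"period at {TARGET_MHZ} MHz: {period:.3f} ns")
    for kind, ns in delay_by_type(ROWS).items():
        share = 100.0 * ns / period
        print(f"{kind:5s} {ns:.3f} ns  {share:5.1f}% of period")
```

On these rows the single 1.763 ns IC hop from the DSP output to the adder input dominates, which matches where the complaint is aimed.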

 

Regards, 

Sam
Altera_Forum

- What device and speed grade? 

- How full is it? What's the interconnect usage? 

- Having stand-alone components meet timing is very different from having everything meet timing together. Testing blocks by themselves usually gives near-ideal placement, as there are no other restrictions. But as you put more into the device, the resources start squeezing each other out. You get control signals that pull multiple components together. All sorts of things usually pop up that make the problem more difficult than the stand-alone attempt.

- Your critical path is a DSP block to a register. Where is the other DSP block that feeds this add_sub component placed? If it's down low (in the X65 range or below), that's what's pulling the add_sub FF away from this DSP block. Either that, or the destination of the add_sub component is further away.

- I would double-check the routing. If you're using Q8.0, right-click the path, locate it in the Chip Editor, then click the Expand button twice. Make sure it's using a Manhattan route. (Also note that there are different routing structures, so as a device fills up, components may have to take less-than-ideal routes compared with the ones used in your stand-alone test.) I strongly doubt the route is the issue, as DSP designs usually aren't very route-intensive, but it's worth looking at, as 1.763 ns does seem pretty large (I assume you're in the fastest speed grade).

- If you're not too full, you may want to do some LogicLocking. By this I just mean large rectangles that help guide Quartus in the actual floorplan of the device. I've seen this work before. For example, I had a design with 12 DSP functions (filters) that were all pretty stand-alone and then talked to central logic. By putting them in LogicLock regions (LLRs) around the device, I helped Quartus understand the flow of the design and got slightly better results. But note that it's easy to over-LogicLock, or do it wrong, and hurt the performance of the design, so tread carefully.

- The max fan-out setting won't affect DSP blocks. It won't duplicate entire DSP blocks, just registers. Plus, every node on this critical path already has a low fan-out, so that's not the issue.

- Negative skew? I assume you're on a single clock and the negative skew is small. This generally occurs when going between the general logic fabric and memory blocks or DSP blocks, which have their own specialized "branch" off the global tree that sometimes has a slightly different delay. It should be pretty small and have little effect compared to the routing you're attempting to fix, and it's really just something that exists (having a global clock hit every register at exactly the same time, down to the picosecond, isn't really possible).
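
As a rough way to see how far apart two nodes on the path were placed, the Manhattan distance can be read straight off the location names in the timing report. This is only a sketch under an assumption: that the `X` and `Y` fields in names like `LCCOMB_X65_Y19_N28` are grid coordinates on the device floorplan. The helpers `parse_location` and `manhattan` are hypothetical names, not Quartus commands, and grid distance is only a proxy for routing delay.

```python
import re

# Assumed name format: <TYPE>_X<col>_Y<row>_N<sub-location>
_LOC = re.compile(r"_X(\d+)_Y(\d+)_N(\d+)$")


def parse_location(name):
    """Return (x, y) grid coordinates from a name like 'LCCOMB_X65_Y19_N28'."""
    m = _LOC.search(name)
    if m is None:
        raise ValueError(f"unrecognised location: {name!r}")
    return int(m.group(1)), int(m.group(2))


def manhattan(src, dst):
    """Manhattan (|dx| + |dy|) distance between two locations, in grid units."""
    x0, y0 = parse_location(src)
    x1, y1 = parse_location(dst)
    return abs(x1 - x0) + abs(y1 - y0)


if __name__ == "__main__":
    # The DSP output and the adder cell it drives, taken from the report.
    print(manhattan("DSPOUT_X89_Y18_N2", "LCCOMB_X65_Y19_N28"))
```

For the hop in question this gives 25 grid units (24 columns, 1 row), which is consistent with the adder being pulled well away from the DSP block that drives it.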

Good luck.