Closing timing on high speed DDR3 interfaces using Cyclone V, Arria V, or Stratix V

Description

This page is for users who are seeing poor timing closure on Cyclone V, Arria V, and Stratix V DDR3 interfaces related to half-rate core-to-periphery (c2p) registers, DDR address/command, and DDR DQS vs. CK. All of these violations appear when you run Report DDR in TimeQuest or view the Timing Analyzer results in Quartus. They prevent the interface from reaching the highest DDR3 rate specified in the EMIF Spec Estimator and the FPGA family datasheet.

External Memory Interface (EMIF) Spec Estimator - Intel

These violations cannot be fixed from user RTL. Closing timing requires manual placement of the half-rate registers and post-fit ECO changes to the D5 delay values.

This article lists common violations and provides solutions to meet the expected DDR3 EMIF interface timing.

Examples of timing violations

*ureset_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*ureset_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*uras_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*uras_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*ucas_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*ucas_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*ubank_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*ubank_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*ucs_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*ucs_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*uodt_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*uodt_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*ucke_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*ucke_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*uwe_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*uwe_n_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO
*uaddress_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFHIO
*uaddress_qr_to_hr*dataout_r[*][*] to ddio_outa[*]~DFFLO

All of these paths run from the half-rate clock, pll_hr_clk, to the phase-shifted half-rate clock, pll_addr_cmd_clk:

*|pll0|pll_hr_clk to *|pll0|pll_addr_cmd_clk

 

DDR address command violations show negative setup or hold slack similar to the following:

[Image: DDR address command negative slack, example 1]

[Image: DDR address command negative slack, example 2]

DDR DQS vs. CK violations show negative setup or hold slack similar to the following:

[Image: DDR DQS vs. CK negative slack, example 1]
[Image: DDR DQS vs. CK negative slack, example 2]

Quartus interpretation of *dataout_r[*][*] to ddio_outa[*]~DFFHIO register timing relationship

Quartus does not appear to account properly for the phase relationship between the half-rate clock, pll_hr_clk, and the phase-shifted half-rate clock, pll_addr_cmd_clk. Even though the clocks are synchronous, the Fitter does not seem to make enough effort to place the source dataout_r registers close enough to the periphery IO registers.
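To show how tight this relationship is, the sketch below estimates the setup window between two clocks of the same frequency from their phase shifts. It uses the default values described in the solution section (pll_hr_clk at 0 degrees, pll_addr_cmd_clk at 225 degrees) and a 250 MHz half-rate clock. It is a simplified single-cycle edge calculation that assumes the latch edge is the first one after the launch edge and ignores jitter, clock uncertainty, and any constraints the IP applies; the function name is hypothetical.

```python
def setup_window_ps(freq_mhz: float, launch_deg: float, latch_deg: float) -> float:
    """Time from a launch edge to the next latch edge, in picoseconds.

    Both clocks are assumed to run at freq_mhz; phases are in degrees.
    Simplified model: no jitter, uncertainty, or multicycle constraints.
    """
    period_ps = 1e6 / freq_mhz
    # Distance from the launch edge to the next latch edge, wrapped into (0, 360].
    delta_deg = (latch_deg - launch_deg) % 360.0
    if delta_deg == 0.0:
        delta_deg = 360.0
    return period_ps * delta_deg / 360.0


if __name__ == "__main__":
    f_mhz = 250.0  # half-rate clock frequency
    default = setup_window_ps(f_mhz, launch_deg=0.0, latch_deg=225.0)
    shifted = setup_window_ps(f_mhz, launch_deg=315.0, latch_deg=225.0)
    print(f"pll_hr_clk at 0 deg:   {default:.0f} ps")
    print(f"pll_hr_clk at 315 deg: {shifted:.0f} ps")
    print(f"extra setup window:    {shifted - default:.0f} ps")
```

With these inputs the window is 2500 ps by default and 3000 ps when pll_hr_clk is shifted by 315 degrees, a gain of 1/8 of the 4000 ps period.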

Quartus interpretation of Address Command and DQS vs CK timing

Quartus does not appear to choose the optimal D5 delay setting within the IO cell, so setup and hold margins are not balanced. In most cases, a setup violation comes with enough hold slack (or a hold violation with enough setup slack) that a different D5 delay value would meet both, but Quartus does not choose that value.


 

Solution for *dataout_r[*][*] to ddio_outa[*]~DFFHIO, *dataout_r[*][*] to ddio_outa[*]~DFFLO, and core-to-periphery (c2p) timing violations

First, the phase relationship between the half-rate clock, pll_hr_clk, and the phase-shifted half-rate clock, pll_addr_cmd_clk, makes timing very difficult to meet. By default, the DDR3 IP generates pll_addr_cmd_clk with a 225 degree phase shift. Because pll_hr_clk is not phase shifted, the maximum setup window from pll_hr_clk to pll_addr_cmd_clk is roughly 225/360 = 5/8 of the half-rate clock period. For example, with a 250 MHz half-rate clock the maximum setup window is roughly (1/250 MHz) * 5/8 = 2.5 ns.

To gain more setup time for the long interconnect (IC) delay, you can also phase shift pll_hr_clk. In many cases a 315 degree shift is appropriate: 315 degrees is equivalent to launching 45 degrees (1/8 of the period) earlier, which adds another 1/8 of the period to the setup window, or another 500 ps at 250 MHz. However, the DDR3 IP generation options do not allow changing the pll_hr_clk phase shift, so the change must be made manually in two of the IP-generated files.

<DDR3_IP_name>_p0_parameters.tcl
<DDR3_IP_name>_pll0.sv

Open <DDR3_IP_name>_p0_parameters.tcl and update the p0_pll_phase value on the following two lines:

set ::GLOBAL_<DDR3_IP_name>_p0_pll_phase(7) 0.0
set ::GLOBAL_<DDR3_IP_name>_p0_pll_phase(PLL_HR_CLK) 0.0

Change them as follows for a 315 degree phase shift:

set ::GLOBAL_<DDR3_IP_name>_p0_pll_phase(7) 315.0
set ::GLOBAL_<DDR3_IP_name>_p0_pll_phase(PLL_HR_CLK) 315.0
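If you apply this edit often, a small text-rewriting script can make it repeatable. The sketch below is a hypothetical helper, not part of Quartus or the IP: it rewrites the two pll_phase lines shown above in the text of <DDR3_IP_name>_p0_parameters.tcl. It assumes the lines have exactly the form shown, and that the numeric index paired with PLL_HR_CLK is 7 as in this article's example; that index may differ in other IP configurations, so it is a parameter.

```python
import re


def set_hr_clk_phase_tcl(tcl_text: str, ip_name: str, phase_deg: float,
                         counter_index: str = "7") -> str:
    """Rewrite the PLL_HR_CLK phase lines in <ip_name>_p0_parameters.tcl text.

    counter_index is the numeric index shown alongside PLL_HR_CLK; "7" is
    the value from this article's example and is an assumption here.
    """
    prefix = re.escape(f"set ::GLOBAL_{ip_name}_p0_pll_phase(")
    index = re.escape(counter_index)
    # Group 1 keeps everything up to the value; only the value is replaced.
    pattern = re.compile(rf"^({prefix}(?:{index}|PLL_HR_CLK)\)[ \t]+)\S+",
                         re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{phase_deg:.1f}",
                                   tcl_text)
    if count != 2:
        raise ValueError(f"expected 2 pll_phase lines, found {count}")
    return new_text


if __name__ == "__main__":
    # "ddr3" is a hypothetical IP instance name.
    sample = (
        "set ::GLOBAL_ddr3_p0_pll_phase(7) 0.0\n"
        "set ::GLOBAL_ddr3_p0_pll_phase(PLL_HR_CLK) 0.0\n"
    )
    print(set_hr_clk_phase_tcl(sample, "ddr3", 315.0))
```

The helper refuses to write anything unless it finds exactly two matching lines, so an unexpected file layout fails loudly instead of being silently half-edited.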

Open <DDR3_IP_name>_pll0.sv and update the HR_CLK_PHASE parameter on the following line:

parameter HR_CLK_PHASE       = "0 ps";

Update it to the 315 degree phase shift expressed in picoseconds, which is 315/360 = 7/8 of the half-rate clock period. For example, at 250 MHz: (1/250 MHz) * 7/8 = 3500 ps.

parameter HR_CLK_PHASE       = "3500 ps";
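The conversion from degrees to picoseconds, and the parameter edit itself, can be sketched as follows. Both functions are hypothetical helpers for illustration. They assume the frequency passed in is the half-rate clock frequency and that the parameter line has the form shown above; rounding to whole picoseconds is a choice made here, so check the value Quartus reports for the PLL output after recompiling.

```python
import re


def hr_clk_phase_ps(hr_clk_mhz: float, phase_deg: float) -> int:
    """Convert a phase shift in degrees into picoseconds of the half-rate period."""
    period_ps = 1e6 / hr_clk_mhz
    return round(period_ps * phase_deg / 360.0)


# Matches: parameter HR_CLK_PHASE = "<any value>"
HR_CLK_PHASE = re.compile(r'(parameter\s+HR_CLK_PHASE\s*=\s*)"[^"]*"')


def set_hr_clk_phase_sv(sv_text: str, phase_ps: int) -> str:
    """Rewrite the HR_CLK_PHASE parameter in <DDR3_IP_name>_pll0.sv text."""
    new_text, count = HR_CLK_PHASE.subn(
        lambda m: f'{m.group(1)}"{phase_ps} ps"', sv_text)
    if count != 1:
        raise ValueError(f"expected 1 HR_CLK_PHASE line, found {count}")
    return new_text


if __name__ == "__main__":
    ps = hr_clk_phase_ps(250.0, 315.0)  # 7/8 of a 4000 ps period
    line = 'parameter HR_CLK_PHASE       = "0 ps";'
    print(set_hr_clk_phase_sv(line, ps))
```

Run as shown, it prints the parameter line with "3500 ps", matching the hand calculation above.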

If you regenerate the DDR3 IP, both files are overwritten and you must make these changes again manually.

This change may be enough to meet timing from the core *dataout_r* registers to the periphery IO registers.

If Quartus still cannot close timing, you may need to use Logic Lock regions to place the *dataout_r* registers as close as possible to the periphery IO registers.

Use Quartus Logic Lock to lock down the failing (or all) *dataout_r[*][*] registers as close as possible to the EMIF pins within the periphery. The process can be tedious, because you may need to lock down registers individually. In many cases, however, multiple *uaddress_qr_to_hr*dataout_r[*][*] and control registers can share the same Logic Lock region. Here is an example of locking down specific registers close to the periphery at the top of an Arria V:

[Image: dataout_r registers locked down near the periphery at the top of an Arria V]

The routing from the locked-down registers to the periphery IO registers:

[Image: routing from the locked-down registers to the periphery IO registers, top of an Arria V]

Here is an example of similar lockdowns at the bottom of an Arria V:

[Image: dataout_r registers locked down near the periphery at the bottom of an Arria V]

The routing from the locked-down registers to the periphery IO registers:

[Image: routing from the locked-down registers to the periphery IO registers, bottom of an Arria V]

See the attached DDR3_qr_to_hr_dataout_r_Logic_Lock.tcl file for an example of locking down core dataout_r[*][*] registers close to the destination periphery IO registers.
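When many registers fail, grouping them by the DDR3 signal they drive can help decide which ones could share a Logic Lock region. The sketch below is a hypothetical helper, not part of Quartus or the attached Tcl file. It assumes you have copied the failing register names (for example, from the timing report) into a Python list, and that each name contains a u<signal>_qr_to_hr hierarchy level followed by dataout_r[...], as in the failing paths listed at the top of this article. Creating the regions themselves is still done in Quartus.

```python
import re
from collections import defaultdict

# Captures <signal> from a "u<signal>_qr_to_hr" hierarchy level,
# e.g. "address" from "uaddress_qr_to_hr" or "cs_n" from "ucs_n_qr_to_hr".
QR_TO_HR = re.compile(r"\bu(\w+?)_qr_to_hr\b")


def group_by_signal(register_names):
    """Group half-rate dataout_r register names by the signal they drive."""
    groups = defaultdict(list)
    for name in register_names:
        if "dataout_r[" not in name:
            continue  # not one of the half-rate output registers
        match = QR_TO_HR.search(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical register names; real hierarchy paths depend on your design.
    failing = [
        "top|ddr3_p0|uaddress_qr_to_hr|dataout_r[0][3]",
        "top|ddr3_p0|uaddress_qr_to_hr|dataout_r[1][3]",
        "top|ddr3_p0|ucs_n_qr_to_hr|dataout_r[0][0]",
    ]
    for signal, regs in sorted(group_by_signal(failing).items()):
        print(f"{signal}: {len(regs)} register(s)")
```

The lazy match stops at the first "_qr_to_hr", so multi-part signal names such as cs_n and ras_n are kept intact.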

Solution for Quartus Address Command and DQS vs CK timing violations

The fix for Address Command and DQS vs. CK timing violations is a post-compile ECO edit of the D5 delay, followed by an ECO compile.

The post-compile ECO changes the D5 delay value. For the procedure, refer to the forum article:

Changing the D5_Delay value in Quartus post fit to help meet timing on external interfaces  

Use the following rules to decide whether to increment or decrement the D5 delay setting:

If DQS vs. CK for the DDR3 IP has negative setup slack in the slow corner, then decrease D5_DELAY of ddr3_dqs_io[3:0] and ddr3_ndqs_io[3:0] by 1.
If DQS vs. CK for the DDR3 IP has negative hold slack in the fast corner, then increase D5_DELAY of ddr3_dqs_io[3:0] and ddr3_ndqs_io[3:0] by 1.
If DQS vs. CK for the DDR3 IP has negative hold slack in the slow corner, then increase D5_DELAY of ddr3_dqs_io[3:0] and ddr3_ndqs_io[3:0] by 1.
If Address Command for the DDR3 IP has negative setup slack in the slow corner, then increase D5_DELAY of ddr3_clk_o[0] and ddr3_nclk_o[0] by 1.
If Address Command for the DDR3 IP has negative hold slack in the fast corner, then decrease D5_DELAY of ddr3_clk_o[0] and ddr3_nclk_o[0] by 1.
If Address Command for the DDR3 IP has negative hold slack in the slow corner, then decrease D5_DELAY of ddr3_clk_o[0] and ddr3_nclk_o[0] by 1.
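The rules above can be summarized as a small lookup. This is a hypothetical helper for illustration, not a Quartus API, and the pin names are the example names from the rules, which will differ in your design. It follows the principle stated earlier that a setup violation is fixed by trading hold margin and vice versa: the direction depends on whether setup or hold is negative and on which signal carries the D5 delay (DQS/nDQS for DQS vs. CK, CK/nCK for Address Command), not on the timing corner.

```python
from typing import NamedTuple


class D5Change(NamedTuple):
    pins: str  # pins whose D5_DELAY is edited (example names from the rules)
    step: int  # +1 = increase D5_DELAY by 1, -1 = decrease D5_DELAY by 1


def d5_change(check: str, negative_side: str) -> D5Change:
    """Return the D5_DELAY edit for one Report DDR violation.

    check:         "dqs_vs_ck" or "addr_cmd"
    negative_side: "setup" or "hold" (whichever slack is negative)
    """
    if negative_side not in ("setup", "hold"):
        raise ValueError(f"unknown slack side: {negative_side!r}")
    if check == "dqs_vs_ck":
        # D5 is on DQS/nDQS: decrease for setup, increase for hold.
        return D5Change("ddr3_dqs_io[3:0], ddr3_ndqs_io[3:0]",
                        -1 if negative_side == "setup" else +1)
    if check == "addr_cmd":
        # D5 is on the CK/nCK outputs: increase for setup, decrease for hold.
        return D5Change("ddr3_clk_o[0], ddr3_nclk_o[0]",
                        +1 if negative_side == "setup" else -1)
    raise ValueError(f"unknown check: {check!r}")


if __name__ == "__main__":
    print(d5_change("addr_cmd", "setup"))
```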

After each change, check the Report DDR timing report to confirm that the negative slack is being absorbed by the opposite side. For example, if there is a large negative setup slack, the D5_DELAY change should reduce or remove the setup violation while reducing the hold margin. Continue adjusting the D5 delay on individual pins until all paths meet timing in Report DDR.
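The iterative procedure above moves margin from one side of the window to the other. Assuming, hypothetically, that each D5 step shifts a roughly constant amount of margin (step_ps) between setup and hold, the number of steps that balances the two slacks can be estimated as below. The per-step delay is not given in this article and is an assumption; read its actual effect from Report DDR after the first change and treat the result only as a starting estimate. If setup plus hold slack is negative, no D5 value can fix both sides.

```python
def d5_steps_to_balance(setup_ps: float, hold_ps: float, step_ps: float) -> int:
    """Estimate the D5 steps needed to roughly equalize setup and hold slack.

    Positive result: move in the direction that gains setup (loses hold).
    Negative result: move in the direction that gains hold (loses setup).
    Uses a hypothetical linear model in which each step shifts step_ps of
    margin from one side to the other.
    """
    if step_ps <= 0:
        raise ValueError("step_ps must be positive")
    if setup_ps + hold_ps < 0:
        # The total window is negative: shifting delay cannot fix both sides.
        raise ValueError("setup + hold slack is negative")
    shift_ps = (hold_ps - setup_ps) / 2.0  # margin to move toward setup
    return round(shift_ps / step_ps)


if __name__ == "__main__":
    # Hypothetical numbers: -80 ps setup, +300 ps hold, 50 ps per D5 step.
    steps = d5_steps_to_balance(-80.0, 300.0, 50.0)
    print(f"move {steps} step(s) toward setup")
```

With these example numbers the estimate is 4 steps, after which the model predicts about +120 ps setup and +100 ps hold; Report DDR remains the source of truth after each ECO compile.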

 

Last update: 11-10-2022 09:57 AM