
Cyclone V: reaching RGMII Data to Clock output skew of +/- 500ps

Jodok
New Contributor I

What is the best practice for meeting the ±500 ps data-to-clock output skew required by the RGMII v2.0 specification on a Cyclone V?

Synthesizing a TSE MAC with the RX clock looped back as the TX clock fails because of a timing mismatch. With a 2 ns delay applied at the PHY and the constraints in place, the FPGA has a 600 ps setup and 900 ps hold time budget for the TX path. The worst failing path runs from the looped-back RX clock pin to one of the TX data output pins, with a mismatch of 360 ps.

For more timing details please see my other post: Re: Cyclone V TSE MAC timing closure - Intel Community

 

The Cyclone V device datasheet states in Table 48 (RGMII Timing Characteristics): Td (TX_CLK to TXD/TX_CTL output data delay) = -0.85 ns to +0.15 ns. So with the correct clock-to-data delay, it should be possible to reach the ±500 ps output skew.
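Taking that premise at face value, the claim is simple window arithmetic: the datasheet Td range spans 1.0 ns, exactly the width of the ±0.5 ns RGMII skew window, so a fixed clock-to-data shift of +0.35 ns would map one onto the other. A minimal Python sketch of the check, assuming Td can be treated as a plain additive offset on the output path (the helper name `center_window` is illustrative only):

```python
# Cyclone V datasheet Td window (TX_CLK to TXD/TX_CTL output data delay), in ns.
TD_MIN, TD_MAX = -0.85, 0.15
# RGMII v2.0 data-to-clock output skew limit (+/-), in ns.
SKEW_LIMIT = 0.5
EPS = 1e-9  # tolerance for floating-point comparisons


def center_window(lo, hi, limit):
    """Return the shift that centers [lo, hi] on zero, and whether the
    shifted window then fits inside [-limit, +limit]."""
    shift = -(lo + hi) / 2.0
    fits = (lo + shift) >= -limit - EPS and (hi + shift) <= limit + EPS
    return shift, fits


shift, fits = center_window(TD_MIN, TD_MAX, SKEW_LIMIT)
print(f"required shift = {shift:+.2f} ns, fits within +/-{SKEW_LIMIT} ns: {fits}")
```

Because both windows are 1.0 ns wide, the fit has zero margin: any extra skew from board routing or the PHY would push the result outside the limit.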

 

Any help on this would be greatly appreciated!

Jodok

ShengN_Intel
Employee

Hi Jodok,


Check this KDB article: https://www.intel.com/content/www/us/en/support/programmable/articles/000079123.html. It recommends selecting a PHY that can adjust its input timing.


Thanks,

Regards,

Sheng


Jodok
New Contributor I

Hi Sheng,

 

Thank you for your response!

How does one then account for the "asymmetrical" characteristic of the delay introduced by the PHY?

Could you please review the following constraints for a DDR transmission?

 

max_delay = tData(max) + tSU - (tPeriod/2 - tPHYDELAY)

min_delay = tData(min) - tH - (tPeriod/2 - tPHYDELAY)
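To see what these two formulas produce, here is a small Python sketch that evaluates them exactly as written; it does not validate them. The example inputs are assumptions borrowed from elsewhere in the thread: an 8 ns period (125 MHz), a 2 ns PHY delay, tSU = 1.05 ns, tH = 0.8 ns (taken as a positive magnitude, since the formula subtracts it) and ±0.33 ns trace skew. The function name `ddr_output_delays` is hypothetical.

```python
def ddr_output_delays(t_data_max, t_data_min, t_su, t_h, t_period, t_phy_delay):
    """Evaluate the proposed max/min delay formulas (all values in ns).

    t_h is a positive magnitude here, because the min formula subtracts it.
    """
    shift = t_period / 2.0 - t_phy_delay  # the (tPeriod/2 - tPHYDELAY) term
    max_delay = t_data_max + t_su - shift
    min_delay = t_data_min - t_h - shift
    return max_delay, min_delay


# Illustrative inputs only (see the lead-in for where they come from).
max_delay, min_delay = ddr_output_delays(0.33, -0.33, 1.05, 0.8, 8.0, 2.0)
print(f"max_delay = {max_delay:.2f} ns, min_delay = {min_delay:.2f} ns")
```

With these inputs the (tPeriod/2 - tPHYDELAY) term is 2 ns, giving max_delay = -0.62 ns and min_delay = -3.13 ns.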

 

Thank you very much.

 

ShengN_Intel
Employee

Hi,


I think there should be no problem with your DDR transmission constraints, based on this design example: https://www.intel.com/content/www/us/en/support/programmable/support-resources/design-examples/horizontal/exm-tse-rgmii-phy.html


I think the DDR data is center-aligned (tPeriod/2).

(tPeriod/2 - tPHYDELAY) reflects the shifted clock edge, which ensures the setup and hold times are met despite the delay.


Jodok
New Contributor I

Dear Sheng,

 

Thank you for reviewing the constraint.

Indeed, tPeriod/2 represents the 90° phase shift that changes edge-aligned Tx at the MAC into center-aligned Rx at the PHY.

 

Does that mean the Quartus synthesis tool cannot fit the logic with the internal delay needed to close timing without additional PHY delay? Is it really the Td (TX_CLK to TXD/TX_CTL output data delay) of -0.85 ns to +0.15 ns that Quartus can't deal with?

 

I haven't fully figured it out yet, but your answers give me hope that I am close to the solution.

 

If you could spend a moment looking at the calculations I attached to this post and let me know whether the method is reasonable, I would really appreciate it.

ShengN_Intel
Employee

Hi,


It seems the missing PHY delay is what causes the violation.


Does timing pass after adding the PHY delay?


Jodok
New Contributor I

Dear Sheng,

No, unfortunately timing still fails.

Where do you think the PHY delay is missing?

I thought I had accounted for the PHY delay in the constraints with (tPeriod/2 - tPHYDELAY). Please have a look at cells O20 and O21 in the Excel sheet.

 

Greetings

 

ShengN_Intel
Employee

Hi,


I think you can refer to the timing constraints in AN477: Designing RGMII Interface with FPGA and HardCopy Devices (page 12): https://www.intel.com/content/www/us/en/content-details/654563/an-477-designing-rgmii-interface-with-fpga-and-hardcopy-devices.html


The internal PHY delay is accounted for in tco for set_input_delay.


For the equation in this design example, https://www.intel.com/content/www/us/en/support/programmable/support-resources/design-examples/horizontal/exm-tse-rgmii-phy.html, $data_delay_min, $data_delay_max, $clk_delay_min and $clk_delay_max are set to 0, assuming that trace delay, pin capacitance, and rise/fall time differences between data and clock are negligible.

The design .qar can be found here: https://blog.csdn.net/wangyanchao151/article/details/90401027


Let me know if you have any further updates or concerns.


Thanks,

Regards,

Sheng


Jodok
New Contributor I

Dear Sheng,

 

I am not sure if we are on the same page anymore.

My goal is to meet timing on the MII Tx path, so I am not interested in input_delay constraints.

 

The fact is, if I constrain tx_max_delay and tx_min_delay as follows and use a PHY delay of 2 ns, the Timing Analyzer reports negative slack:

tx_max_delay = tDataTrace(max) + tSU = 0.33ns+1.05ns = 1.38ns

tx_min_delay = tDataTrace(min) + tHold = -0.33ns-0.8ns = -1.13ns
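One way to read these two numbers is as the part of each DDR unit interval that the external side consumes. The Python sketch below is a rough budget under stated assumptions, not a model of what the Timing Analyzer does: it assumes a 125 MHz clock (8 ns period, so a 4 ns unit interval) and that the setup and hold relationships together span exactly one unit interval.

```python
T_PERIOD = 8.0          # assumed 125 MHz RGMII clock (gigabit mode), ns
UI = T_PERIOD / 2.0     # DDR: one data transfer per clock edge -> 4 ns

# Values quoted in the post (ns); t_hold is signed, as written there.
t_trace_max, t_trace_min = 0.33, -0.33
t_su, t_hold = 1.05, -0.8

tx_max_delay = t_trace_max + t_su    # 1.38 ns
tx_min_delay = t_trace_min + t_hold  # -1.13 ns

# Portion of each unit interval claimed by PHY setup/hold plus trace skew.
external_window = tx_max_delay - tx_min_delay
# What remains for the FPGA's own data-to-clock skew variation, even with
# ideal placement, before either setup or hold slack goes negative.
fpga_skew_budget = UI - external_window

print(f"tx_max_delay = {tx_max_delay:.2f} ns, tx_min_delay = {tx_min_delay:.2f} ns")
print(f"external window = {external_window:.2f} ns, FPGA skew budget = {fpga_skew_budget:.2f} ns")
```

Under these assumptions the external side takes 2.51 ns, leaving roughly 1.49 ns of total data-to-clock skew variation for the FPGA output path.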

 

With these constraints, I should meet the ±0.5 ns "data to clock output skew" specified in RGMII v2.0.

But this is not the case, so I wonder where the negative slack comes from.

 

So, is Td (TX_CLK to TXD/TX_CTL output data delay) of -0.85 ns to +0.15 ns in Table 48 of the Cyclone V datasheet relevant in this case? What does it mean? Other sources tell me it should be irrelevant, since the synthesis tool should match the internal skew of the FPGA as closely as possible.

 

Thank you for your response.

 

 

 

 

Jodok
New Contributor I

Important to note: the MAC is not on an SoC; it is not an HPS EMAC.

 

As I now understand it, Td (TX_CLK to TXD/TX_CTL output data delay) of -0.85 ns to +0.15 ns in Table 48 of the Cyclone V datasheet is only relevant for the HPS EMACs on SoC devices.

 

So my last question: is a skew of 1.5 ns usual for a synthesized MAC with DDIO buffers on both data and clock?

 

 

 

ShengN_Intel
Employee

Hi,


Can you confirm that you're using an external PHY device with the delay option enabled?


Based on the AN477 page 12 design example, could you try the three combinations below?

Assume trace delay, pin capacitance, and rise/fall time differences between the data and clock are negligible. 

Combination 1:

tx_max_delay = tDataTrace(max) + tSU = 0ns+1.05ns = 1.05ns

tx_min_delay = tDataTrace(min) + tHold = 0ns-0.8ns = -0.8ns

Combination 2:

tx_max_delay = tDataTrace(max) + (-tSU) = 0ns+(-1.05ns) = -1.05ns

tx_min_delay = tDataTrace(min) + tHold = 0ns-0.8ns = -0.8ns


Combination 3:

tx_max_delay = tDataTrace(max) + (-tSU) = 0.33ns+(-1.05ns) = -0.72ns

tx_min_delay = tDataTrace(min) + tHold = -0.33ns-0.8ns = -1.13ns
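The three combinations can be tabulated side by side with a basic sanity check, namely that an output-delay pair only makes sense if the min value does not exceed the max value. A small Python sketch using the values exactly as given in the reply (the helper `is_consistent` is hypothetical):

```python
# (tx_max_delay, tx_min_delay) for each combination, in ns, computed as in the reply.
combinations = {
    1: (0.0 + 1.05, 0.0 - 0.8),
    2: (0.0 + (-1.05), 0.0 - 0.8),
    3: (0.33 + (-1.05), -0.33 - 0.8),
}


def is_consistent(max_delay, min_delay):
    """A max/min output-delay pair is only meaningful if min <= max."""
    return min_delay <= max_delay


results = {n: is_consistent(mx, mn) for n, (mx, mn) in combinations.items()}
for n, (mx, mn) in combinations.items():
    print(f"Combination {n}: max = {mx:+.2f} ns, min = {mn:+.2f} ns, consistent = {results[n]}")
```

Combinations 1 and 3 pass the check, while Combination 2 gives max = -1.05 ns below min = -0.80 ns. Passing this check says nothing about whether the sign convention for tSU is physically right.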



I think Td (TX_CLK to TXD/TX_CTL output data delay) of -0.85 ns to +0.15 ns is not related to RGMII here, as the design example doesn't include it either.


Jodok
New Contributor I

Hi Sheng

Yes, I still use a PHY with the 2 ns delay option enabled.

Combination 1 seems the correct one to me. This is what I was doing originally.

But even with exactly this combination, the Timing Analyzer (TA) fails with a worst-case slack of -0.065 ns over -40 °C to +100 °C, with both the fast and slow models analyzed.


Combination 2 seems to be illegitimate, since tx_min_delay has to be smaller than tx_max_delay.

Combination 3 would theoretically be legal since tx_min_delay is smaller than tx_max_delay, but since the formula is the same as for Combination 2, it is false as well.

 

I will try to find other resources to get on.

Thank you Sheng.

ShengN_Intel
Employee

Hi,


Even without the trace delay, setup still fails. Could you try with the PHY delay option disabled? Does timing pass then?


It seems something is wrong with the parameters used.


Jodok
New Contributor I

Hi Sheng,

Correct, even without the trace delay (Combination 1), setup fails.

 

"Could you try with PHY with delay option disabled, does the timing pass?"

But I do need the PHY delay to shift the data from edge-aligned to center-aligned. Without it, timing fails massively.

...and it did fail massively.


 

Again, what jitter should be expected on the Cyclone V for this combination, where both data and clock come from DDIO buffers?

Could you please tell me what jitter to expect?

 

Does the FPGA perhaps struggle to delay the data signals positively or negatively relative to the clock signal?

Could I help the synthesis tool by shifting the external PHY delay away from symmetric (2 ns) towards an asymmetrical delay, for example ±1.5 ns? But this is basically what I tried to achieve before, and it failed.

Jodok
New Contributor I

So here is what I believe I have found out so far:

Asymmetric delays for the Rx/Tx paths seem to be beneficial. I can only guess that this is because the FPGA cannot synthesize a negative delay on the data path. I believe this is because in my design the Rx clock is looped back as the Tx clock and cannot be delayed freely.

create_clock -period 8.000 -name PHY0_RX_CLK -waveform {1.750 6.250} [get_ports PHY0_RX_CLK]

create_clock -period 8.000 -name PHY0_RX_CLK_VIRT

create_generated_clock     -name PHY0_TX_CLK_VIRT -phase 100.0 -source [get_ports PHY0_RX_CLK]  [get_ports PHY0_TX_CLK]

In this example, both Tx and Rx have an external PHY (clock) delay of 2.25 ns, so the data at Rx will, in principle, arrive too soon, and the FPGA can add delay to the Rx data path. On the Tx path, the PHY delays the clock by 2.25 ns, so the data generally arrives too soon, and again the FPGA can add delay to the Tx data path.
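The -phase arguments in the generated-clock constraints can be cross-checked against the PHY delays quoted, since a phase in degrees converts to time as phase / 360 × period. On an 8 ns clock, -phase 100.0 works out to about 2.22 ns, slightly less than 2.25 ns (which would be 101.25°), while -phase 90.0 is exactly 2.0 ns. A small Python helper for this conversion, an illustration only and not part of the SDC (the function names are hypothetical); it considers the -phase offset alone, not the shifted -waveform of the source clock:

```python
def phase_to_ns(phase_deg, period_ns):
    """Time offset corresponding to a clock phase shift in degrees."""
    return phase_deg / 360.0 * period_ns


def ns_to_phase(delay_ns, period_ns):
    """Phase shift in degrees corresponding to a time offset."""
    return delay_ns / period_ns * 360.0


PERIOD = 8.0  # ns, from create_clock -period 8.000
for phase in (90.0, 100.0):
    print(f"-phase {phase:.1f} -> {phase_to_ns(phase, PERIOD):.3f} ns")
print(f"2.25 ns -> {ns_to_phase(2.25, PERIOD):.2f} deg")
```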

 

With the following constraints, a mildly acceptable result can be achieved:

# **************************************************************
# External components
# **************************************************************
# TVX0106
set tSKQ 0.1
 
# DP83867
set TsetupT  0.55
set TholdT  -0.55
set TsetupR  1.05
set TholdR  -0.8
 
# Calculate min/max delays: PHY setup/hold values widened by tSKQ
set RxMaxDelay [expr {$TsetupT  + $tSKQ}]
set RxMinDelay [expr {$TholdT   - $tSKQ}]
set TxMaxDelay [expr {$TsetupR  + $tSKQ}]
set TxMinDelay [expr {$TholdR   - $tSKQ}]
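Values like these are typically applied with set_input_delay / set_output_delay on both clock edges, since RGMII is DDR, with -add_delay on the falling-edge constraints so they do not replace the rising-edge ones. The Python sketch below recomputes the four values and prints one possible set of commands. The port names PHY0_RXD[*], PHY0_RX_CTL, PHY0_TXD[*] and PHY0_TX_CTL are hypothetical, and pairing the Rx values with PHY0_RX_CLK_VIRT and the Tx values with PHY0_TX_CLK_VIRT is an assumption about how the clocks above are meant to be used.

```python
# Values from the constraints above, in ns.
T_SKQ = 0.1                       # TVX0106 skew
TSETUP_T, THOLD_T = 0.55, -0.55   # DP83867 values used for the FPGA Rx path
TSETUP_R, THOLD_R = 1.05, -0.8    # DP83867 values used for the FPGA Tx path

rx_max, rx_min = TSETUP_T + T_SKQ, THOLD_T - T_SKQ
tx_max, tx_min = TSETUP_R + T_SKQ, THOLD_R - T_SKQ


def ddr_delay_cmds(cmd, clock, ports, max_d, min_d):
    """Return max/min delay commands for the rising and falling clock edges."""
    lines = []
    for edge in ("", "-clock_fall "):
        # Falling-edge constraints are added to, not substituted for, the rising-edge ones.
        add = "-add_delay " if edge else ""
        for kind, value in (("-max", max_d), ("-min", min_d)):
            lines.append(f"{cmd} -clock {clock} {edge}{kind} {value:.3f} {add}[get_ports {{{ports}}}]")
    return lines


# Hypothetical port names; replace with the real top-level ports.
rx_cmds = ddr_delay_cmds("set_input_delay", "PHY0_RX_CLK_VIRT", "PHY0_RXD[*] PHY0_RX_CTL", rx_max, rx_min)
tx_cmds = ddr_delay_cmds("set_output_delay", "PHY0_TX_CLK_VIRT", "PHY0_TXD[*] PHY0_TX_CTL", tx_max, tx_min)
print("\n".join(rx_cmds + tx_cmds))
```

This yields RxMaxDelay = 0.65 ns, RxMinDelay = -0.65 ns, TxMaxDelay = 1.15 ns and TxMinDelay = -0.90 ns.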

 

TA slack [ns]:
Input to Register Setup: 0.244
Input to Register Hold: 0.165
Input to Output Setup: -0.045
Input to Output Hold: -0.084

 

If a symmetric delay of 2 ns is used for both the Tx and Rx paths, the slack is much worse:

create_clock -period 8.000 -name PHY0_RX_CLK -waveform {2.000 6.000} [get_ports PHY0_RX_CLK]

create_clock -period 8.000 -name PHY0_RX_CLK_VIRT

create_generated_clock     -name PHY0_TX_CLK_VIRT -phase 90.0 -source [get_ports PHY0_RX_CLK]  [get_ports PHY0_TX_CLK]

TA slack [ns]:
Input to Register Setup: 521
Input to Register Hold: 0
Input to Output Setup: -0.15
Input to Output Hold: -0.175

 

To improve timing further, I am now trying to manipulate the D5 delay, which is very poorly documented.

 

Could somebody please verify my findings.

 

What else can be done on that?

ShengN_Intel
Employee

Hi,


It seems there's no problem with the timing equation; most probably some parameter is wrong or missing. I think you may need to open an IPS thread (https://www.intel.com/content/www/us/en/support/articles/000057045/ethernet-products.html) to get some insights from an RGMII expert.

