Receive Transceiver clock recovery problem without loopback

Altera_Forum
Honored Contributor II
2,147 Views

Hi, 

 

I am working on a high speed optical fiber link between two FPGAs. Now the design seems to work perfectly in a loopback setting. 

 

But when I connect two boards together, download the exact same design to both boards, and analyze the received data with SignalTap, I get something that looks like the right stream of bits but with what seems to be a messed-up clock (for example, every now and then there are two 1's in a row where there should be only one). 

 

I am stunned that clock recovery works fine in loopback but not in a real setting. Has anyone seen this kind of issue before? Any pointers to what could be the cause?
Altera_Forum

 


 

 

Are you running the receiver in Lock-to-Reference (LTR) mode, or Lock-to-Data (LTD) mode, or Auto? 

 

If you have the receiver in LTR mode, then it will work in loopback, since the reference clock is identical. For a board-to-board link, you need to use LTD. I find you need to use LTD even if the two boards are phase-locked to the same reference. I suspect it's so that the CDR PLL can lock to the 'center' of the data eye pattern. 

 

Cheers, 

Dave
Altera_Forum

Thanks for the reply! 

 

I have read some documentation on what you're mentioning. In my design I am not using the rx_locktorefclk nor the rx_locktodata ports, so I think it is in auto mode: First LTR, then it switches to LTD. 

 

I am using rx_freqlocked later to check for proper locking, so I think the LTR to LTD transition is properly done. 

 

Maybe I could try lowering the PPM threshold (right now it's at +/-1000), or I could start in LTD right away. Also, apparently there's a different reset sequence depending on which mode is used. Maybe I only implement the one for LTR, so that even in auto mode it doesn't do the LTD properly, except when the reference clocks are the same (as in loopback). 

 

I'll try this as soon as I get to work later today, and let you know how it goes. 

 

The other track I was pursuing was rate matching, but I am not sure how to set this up in Altera's transceiver functions. I only see some rate_match parameters with constants that I don't understand in the generated MegaWizard functions... 

 

Edit: 

I have launched a new compilation with 62.5 ppm for the threshold that makes the transition between LTR and LTD. I have also added rx_phase_comp_fifo_error and tx_phase_comp_fifo_error ports to see if an error occurs there. 

I am a little confused about how LTD works, though. It seems like I always end up switching to LTD with rx_freqlocked asserted, even when no cable is plugged in (so only garbage data is received). Isn't it supposed to lock only when good data is received?
Altera_Forum

>> I have launched a new compilation with 62.5 ppm for the threshold that makes the transition between LTR and LTD. I have also added rx_phase_comp_fifo_error and tx_phase_comp_fifo_error ports to see if an error occurs there. 

 

Those two signals never get asserted, so I assume there is no error. The 62.5 ppm does not help either. 

 

Could it be that the frequency of the clock on the second board is not EXACTLY the one on the first board, so that it expects arriving data with a frequency derived from a multiple of a 49.5MHz clock, while the arriving data was generated with a clock with a frequency that is a multiple of a 50.3MHz clock? 

 

This would explain why the CDR works only in a loopback setting. But if this is the case, isn't there a building block in the receiver data path that compensates for this?
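
To put numbers on that question, here is a minimal Python sketch that expresses the mismatch in ppm and compares it with the +/-1000 ppm setting mentioned earlier. The two frequencies are the ones quoted above; the function name is only for illustration.

```python
def ppm_difference(f_tx_hz: float, f_rx_hz: float) -> float:
    """Offset of the transmitter's reference from the receiver's reference, in ppm."""
    return (f_tx_hz - f_rx_hz) / f_rx_hz * 1e6


# Frequencies quoted in the post above: data generated from a 50.3 MHz
# reference, receiver expecting a 49.5 MHz reference.
offset_ppm = ppm_difference(50.3e6, 49.5e6)
print(f"{offset_ppm:.0f} ppm")  # prints "16162 ppm"

# +/-1000 ppm is the LTR/LTD threshold setting mentioned earlier in the thread.
print(abs(offset_ppm) <= 1000.0)  # prints "False"
```

Crystal oscillators are usually specified in tens of ppm, so a mismatch this large would point to something other than ordinary oscillator tolerance.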
Altera_Forum

 

--- Quote Start ---  

 

Those two signals never get asserted, so I assume there is no error. The 62.5 ppm does not help either. 

 

--- Quote End ---  

I have seen discussions on this group regarding the lock indication signals not really being that meaningful ... 

 

 

--- Quote Start ---  

 

Could it be that the frequency of the clock on the second board is not EXACTLY the one on the first board, so that it expects arriving data with a frequency derived from a multiple of a 49.5MHz clock, while the arriving data was generated with a clock with a frequency that is a multiple of a 50.3MHz clock? 

 

--- Quote End ---  

Can you test your theory? Can you get the first board to generate a clock that you then use on the second board as the reference clock? 

 

For my testing, I am using two Stratix IV GX development kits. The first sends its 156.25MHz reference clock to the second, so both operate coherently. I have not tested whether lock-to-data works between two boards with independent oscillators, since that does not reflect what I will be implementing: an ADC-to-FPGA interface. 

 

 

--- Quote Start ---  

 

This would explain why the CDR works only in a loopback setting. But if this is the case, isn't there a building block in the receiver data path that compensates for this? 

--- Quote End ---  

"Lock-to-data" should lock the receive CDR to the clock embedded in the incoming data. 

 

Here's what I do to check the frequencies of all the clocks: 

1) I have an SOPC system with control register slaves. That system is clocked at 100MHz. 

2) I have a block of slave registers that are clock counters. 

The clock counters block is enabled by my Avalon-MM master writing to a control register to enable the counters, and then writing again to disable them. I use Tcl and enable the counters for about 1 second. 

The counters count the system clock (e.g., ideally a count of 100M clocks) and all of my external clocks: the external GXB refclk (156.25MHz nominal) and the receiver channel CDR recovered clocks. I then assume my 100MHz is exactly 100MHz, and use Tcl math to calculate the frequencies. In another system, I use an external GPS 1pps tick to get more accurate estimates. 

 

You should create something similar. When you are in LTR mode, your CDR clock count should match that of the reference on the receiver board. When you are in LTD mode, your CDR clock count should match that of the transmitter board. 

 

If your clock counts do not match, then LTD mode is not working properly ... possibly due to too great a mismatch between reference frequencies. 
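
The count-to-frequency arithmetic described above could look roughly like this. It is a Python sketch rather than the Tcl I actually use; the function names and the example counter readings are hypothetical, and it assumes all counters are gated by the same enable and that the 100MHz system clock is exact.

```python
def estimate_frequency_hz(count: int, sys_count: int, sys_clk_hz: float = 100e6) -> float:
    """Estimate a clock's frequency from a count taken over the same gate as the system clock."""
    if sys_count <= 0:
        raise ValueError("system clock count must be positive")
    # Gate time is sys_count / sys_clk_hz, so frequency = count / gate time.
    return count * sys_clk_hz / sys_count


def ppm_offset(freq_hz: float, reference_hz: float) -> float:
    """Relative offset of freq_hz from reference_hz, in ppm."""
    return (freq_hz - reference_hz) / reference_hz * 1e6


# Hypothetical counter readings after roughly one second of gating.
sys_count = 100_000_000      # 100 MHz system clock (assumed exact)
refclk_count = 156_250_000   # local GXB refclk
cdr_count = 156_265_625      # CDR recovered clock while in LTD

refclk_hz = estimate_frequency_hz(refclk_count, sys_count)
cdr_hz = estimate_frequency_hz(cdr_count, sys_count)
print(f"refclk {refclk_hz / 1e6:.6f} MHz, CDR {cdr_hz / 1e6:.6f} MHz")
print(f"CDR offset from local refclk: {ppm_offset(cdr_hz, refclk_hz):.1f} ppm")
```

With these made-up readings the recovered clock sits about 100 ppm above the local refclk, which is the kind of offset you would expect to see in LTD when the far-end oscillator runs slightly fast.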

 

Cheers, 

Dave
Altera_Forum

>> Can you test your theory? Can you get the first board to generate a clock that you then use on the second board as the reference clock? 

 

This is going to be my next step. I was going to try this this morning but then I lost my motivation when I thought that there would be noise in the SMA connectors etc... so it was probably not gonna work... 

 

Your message is encouraging though! 

 

I am using the "SFP HSMC loopback demo" as a basis for the design. My board is a DE4 with Stratix IV, connected to an SFP HSMC daughter card through HSMC. 

 

The reference clock to do clock and data recovery is derived from hsmb_clk_in2, which comes from the HSMC interface and seems to be generated directly by the daughter card. 

 

From what I read, I think I can force the two daughter cards to use the same clock via the SMA connectors. Now I just need to find two cables that are long enough and hopefully not broken...
Altera_Forum

 

--- Quote Start ---  

 

This is going to be my next step. I was going to try this this morning but then I lost my motivation when I thought that there would be noise in the SMA connectors etc... so it was probably not gonna work... 

 

--- Quote End ---  

 

 

SMA connectors work up to much higher frequencies than the reference clocks. It would be the board layout that could potentially cause issues. However, the Terasic boards are well designed, so you should be ok. 

 

 

--- Quote Start ---  

 

I am using the "SFP HSMC loopback demo" as a basis for the design. My board is a DE4 with Stratix IV, connected to an SFP HSMC daughter card through HSMC. 

 

--- Quote End ---  

 

 

This one? 

 

http://www.terasic.com.tw/cgi-bin/page/archive.pl?language=english&categoryno=71&no=342 

 

There are clock input and output SMAs. You should be able to use those. 

 

However, none of the REFCLK signals from the transceivers route via the HSMC connector, so you'll have to see if Quartus will let you use an LVDS clock pin. It should; worst case, you might have to use a general-purpose PLL (ALTPLL) and then route the PLL output to the transceiver block as the reference clock. 

 

Cheers, 

Dave
Altera_Forum

Yes, this is the board I am working with. 

 

>> However, none of the REFCLK signals from the transceivers route via the HSMC connector 

 

What do you mean? 

I think that clk2_p/n (figure 2.3 in the manual) is the clock that the design sees as the pair hsmb_clk_in_p2 and hsmb_clk_in_n2, which is then used as a refclk further in the code by the CDR (the code is available on the website too). 

 

What I did now is that I connected clk2_p/n (figure 2.3) from one board into the SMA_clk_p/n of the second board, such that the clk2_p/n of the second board is the same as the one from the first board. 

 

I am not sure if the mux that is pictured would create problems though (like adding a delay to the other clock). 

 

It doesn't work though... with this configuration there is no data sent/received by the second board. Either my connections are not what they should be, or the cables I am using aren't doing the job.
Altera_Forum

 

--- Quote Start ---  

 

>> However, none of the REFCLK signals from the transceivers route via the HSMC connector 

 

What do you mean? 

 

--- Quote End ---  

See page 46 of the HSMC specification: 

 

http://www.altera.com/literature/ds/hsmc_spec.pdf 

 

Connector pins 1 to 32 are defined as transceiver lanes, but there is no assignment for a transceiver REFCLK clock. 

 

 

--- Quote Start ---  

 

I think that clk2_p/n (figure 2.3 in the manual) is the clock that the design sees as the pair hsmb_clk_in_p2 and hsmb_clk_in_n2, which is then used as a refclk further in the code by the CDR (the code is available on the website too). 

 

--- Quote End ---  

That is possible; however, it would depend on the DE4 board's clock distribution. You'd want to check that documentation to see which clocks on the HSMC route to FPGA pins that can be used as REFCLKs. 

 

 

--- Quote Start ---  

 

What I did now is that I connected clk2_p/n (figure 2.3) from one board into the SMA_clk_p/n of the second board, such that the clk2_p/n of the second board is the same as the one from the first board. 

 

--- Quote End ---  

That sounds reasonable. 

 

 

--- Quote Start ---  

 

I am not sure if the mux that is pictured would create problems though (like adding a delay to the other clock). 

 

--- Quote End ---  

The delay should not matter. The logic levels of the clocks will. You'll need to see whether things are AC-coupled, whether they require termination, etc. 

 

 

--- Quote Start ---  

 

It doesn't work though... with this configuration there is no data sent/received by the second board. Either my connections are not what they should be, or the cables I am using aren't doing the job. 

--- Quote End ---  

Plug the cable with the clock into a scope and look at it. Use an oscilloscope to probe the clocks at various points on the boards. Implement the clock counter logic I described, and then count clocks. You can send a lower frequency clock around if you are concerned the cables aren't great; once you get things working, increase the clock rate and see if there are issues. 

 

Cheers, 

Dave
Altera_Forum

Thanks for your help, I really appreciate it! 

 

Today I played around with the altgx and altgx_reconfig megafunctions, trying to reconfigure them with different parameters etc... nothing worked. 

 

I only have sketchy cables around, I will have to buy new ones. When looking at the clk2_p clock with the scope I don't see any meaningful signal (only something that looks like noise), so the cables probably don't do the job... 

 

I will also implement a "clock measuring logic" as you suggested, maybe it will show some obvious problem, like very different clock frequencies. The way I did this in previous designs is simply by incrementing a counter at every clock cycle of the clock I want to measure, and then look at a SignalTap waveform sampled at the 50MHz clock, and do some math... 
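
The "some math" step can be sketched like this in Python, assuming two counter values read off the SignalTap capture a known number of 50MHz samples apart. The counter width and the sample values are made up for illustration, and the modulo only handles a single wraparound.

```python
def freq_from_samples(count_start: int, count_end: int, n_sample_clocks: int,
                      counter_bits: int = 32, sample_clk_hz: float = 50e6) -> float:
    """Estimate the measured clock's frequency from two counter readings.

    count_start and count_end are values of a free-running counter clocked by
    the clock under test, captured n_sample_clocks sample-clock cycles apart.
    """
    if n_sample_clocks <= 0:
        raise ValueError("n_sample_clocks must be positive")
    # Modular subtraction tolerates one wraparound of the counter.
    delta = (count_end - count_start) % (1 << counter_bits)
    return delta * sample_clk_hz / n_sample_clocks


# Hypothetical readings: the 32-bit counter wraps between the two samples,
# which are 4096 cycles of the 50 MHz SignalTap clock apart.
f_hz = freq_from_samples(2**32 - 800, 12_000, 4096)
print(f"{f_hz / 1e6:.2f} MHz")  # prints "156.25 MHz"
```

One caveat: a binary counter running in one clock domain and sampled in another can be caught mid-transition, so an individual reading may be wrong. Gray-coding the counter is a common remedy.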

 

I hope I can get it to work soon... I'll let you know how these things go.
Altera_Forum

 

--- Quote Start ---  

 

Thanks for your help, I really appreciate it! 

 

--- Quote End ---  

You're welcome. 

 

 

--- Quote Start ---  

 

Today I played around with the altgx and altgx_reconfig megafunctions, trying to reconfigure them with different parameters etc... nothing worked. 

 

--- Quote End ---  

They do work. I've used them to test the EyeQ registers in a receiver, and to test the VOD control in a transmitter. Start with the simulation models and ModelSim, and once you can read registers there, move on to hardware. 

 

 

--- Quote Start ---  

 

I only have sketchy cables around, I will have to buy new ones. When looking at the clk2_p clock with the scope I don't see any meaningful signal (only something that looks like noise), so the cables probably don't do the job... 

 

--- Quote End ---  

Did you check at the source of clk2_p and make sure there is a signal there first? It might not be your cables, though you should buy some good quality ones. 

 

 

--- Quote Start ---  

 

I will also implement a "clock measuring logic" as you suggested, maybe it will show some obvious problem, like very different clock frequencies. The way I did this in previous designs is simply by incrementing a counter at every clock cycle of the clock I want to measure, and then look at a SignalTap waveform sampled at the 50MHz clock, and do some math... 

 

I hope I can get it to work soon... I'll let you know how these things go. 

--- Quote End ---  

Great. Feel free to ask more questions. 

 

Cheers, 

Dave
Altera_Forum

Problem solved! 

 

Since the example design provided by Terasic was for the development kit, I had to change the pin assignments and clocks to make it work with the DE4 (I did this a long time ago). In the process, the pin for the clk2_p clock was left at the default 2.5V I/O standard, whereas the design intended it to be used in LVDS mode. 

I changed it back to LVDS and now the LTD seems to behave more normally: it actually doesn't lock when the cables aren't connected properly (the LED indicating successful locking flickers). 

 

I imagine it worked in the loopback case because the transmitter and receiver had the exact same clock, so it wasn't a problem. But in the real case I guess that some ppm differences between the clocks made the CDR unit fail. 

 

I have another problem now, though. In a simple design all 4 channels work well. But when I use a bigger design with some other transceivers, 3 of the channels work fine, while channel 0 properly sends data but does not receive anything (all 0's). I know it sends data properly because if I interface the big design with the small one, the small one actually receives data. Maybe it's a conflict between transceivers in the big one... ?!?! 

 

But this is not related to this topic anymore, hopefully I will find a solution soon... 

 

Thanks for your help Dave!
Altera_Forum

 

--- Quote Start ---  

Problem solved! 

 

Thanks for your help Dave! 

--- Quote End ---  

 

 

Great, and you're welcome.  

 

Cheers, 

Dave