Arria 10 Native Transceiver PHY - 10GBASE-R

Altera_Forum

Hi, 

I have a question concerning XCVR clock domains. I will be referencing the Arria 10 Transceiver PHY User Guide (UG-01143 2017.04.20). 

 

chapter 2.6.2 

Figure 55 

 

Here we can see the clocking for the 10GBASE-R mode of the XCVR. I am especially interested in the Enhanced PCS TX and RX FIFOs and their WR/RD clocks. 

From this figure, we can see that the WR clock of the TX FIFO is tx_coreclkin and that it's supposed to be 156.25 MHz from the FPGA fabric (specifically my 10GbE MAC). The internal "Parallel Clock" depends on the width of the PCS/PMA interface (default is 40 bits and 257.8125 MHz). So far, so good. 

 

Next, in Figure 58, we see a case with FEC support (which I don't use, but it's interesting anyway) and a 64-bit PCS/PMA interface (so the parallel clock is 161.13 MHz). 

The FPGA side of the FIFOs is still 156.25 MHz, and the diagram shows that we can either supply our own clock to rx_coreclkin or create one with a PLL using the same reference clock. 
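The parallel clock figures quoted from the two diagrams follow from dividing the 10GBASE-R serial line rate (10.3125 Gbps) by the PCS/PMA interface width. A minimal Python sketch of that arithmetic; the function name is only illustrative:

```python
# Parallel clock = serial line rate / PCS-PMA interface width.
LINE_RATE_BPS = 10_312_500_000  # 10GBASE-R serial line rate (10.3125 Gbps)

def parallel_clock_mhz(width_bits: int) -> float:
    """Return the parallel clock in MHz for a given interface width in bits."""
    return LINE_RATE_BPS / width_bits / 1e6

# 40 and 64 are the widths mentioned in the post; 66 is added only to show
# which width would give exactly 156.25 MHz.
for width in (40, 64, 66):
    print(f"{width}-bit interface: {parallel_clock_mhz(width):.4f} MHz")
```

Note that 156.25 MHz corresponds to a width of 66 bits, not to either of the PCS/PMA widths discussed above, which is why the fabric-side clock never matches the parallel clock in these diagrams.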

 

2.6.2.1.1 TX FIFO and RX FIFO 

In 10GBASE-R configuration, the TX FIFO behaves as a phase compensation FIFO and the RX FIFO behaves as a clock compensation FIFO. 

 

chapter 2.6.2.1 

Note: 

For 10GBASE-R, you must achieve 0 ppm of the frequency between the read clock of TX phase compensation FIFO (PCS data) and the write clock of TX phase compensation FIFO (XGMII data in the FPGA fabric). This can be achieved by using the same reference clock as the transceiver dedicated reference clock input as well as the reference clock input for a core PLL (fPLL, for example) to produce the XGMII clock. 

 

Question: Doesn't this mean that the WR and RD clocks of the TX FIFO must be the same frequency? How can this be, when one is ~257 MHz and the other is ~156 MHz? 
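One possible reading of the 0 ppm note (an interpretation, not something the quoted user guide text states outright) is that the two clocks need not be equal, only locked to each other in an exact ratio, so that the bandwidth written into the FIFO equals the bandwidth read out. The sketch below checks that arithmetic with exact fractions, assuming a 64-bit XGMII data path, 64b/66b encoding, and the 40-bit PCS/PMA width from Figure 55:

```python
from fractions import Fraction

XGMII_CLK_MHZ = Fraction("156.25")    # fabric-side clock from the post
PCS_CLK_MHZ = Fraction("257.8125")    # 40-bit parallel clock from the post

# Assumption: 64 data bits per XGMII clock, then 64b/66b encoding.
payload_mbps = XGMII_CLK_MHZ * 64                 # bandwidth entering the PCS
line_rate_mbps = payload_mbps * Fraction(66, 64)  # after 64b/66b overhead
pma_side_mbps = PCS_CLK_MHZ * 40                  # bandwidth leaving on 40 bits

clock_ratio = PCS_CLK_MHZ / XGMII_CLK_MHZ         # exact ratio of the clocks

print("payload  :", float(payload_mbps), "Mbps")
print("line rate:", float(line_rate_mbps), "Mbps")
print("PMA side :", float(pma_side_mbps), "Mbps")
print("ratio    :", clock_ratio)
```

Under these assumptions the two sides carry the same bandwidth only while the clock ratio stays exactly 33/20, which is what deriving both clocks from one reference would guarantee.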

 

More on this in chapter 5.2.1.1.1: 

In phase compensation mode, the TX Core FIFO decouples phase variations between tx_coreclkin and pcs_clkout_x2(tx). In this mode, read and write of the TX Core FIFO can be driven by clocks from asynchronous clock sources but must be same frequency. You can use tx_coreclkin (FPGA fabric clock) or tx_clkout1 (TX parallel clock) to clock the write side of the TX Core FIFO. 

 

Again, how? This seems like a major bug in the documentation! 

 

As for RX FIFO, previous note continues: 

The same core PLL can be used to drive the RX XGMII data. This is because the RX clock compensation FIFO is able to handle the frequency ppm difference of ±100 ppm between RX PCS data driven by the RX recovered clock and RX XGMII data. 

 

100 ppm is very little (~15 kHz difference at these speeds), so basically I must use the same frequency on the RD and WR clocks (which goes against the diagram and the 156.25 MHz claim). 
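For reference, the ppm-to-hertz conversion behind the "~15 kHz" estimate can be sketched as below. It assumes the ±100 ppm tolerance is taken around the nominal frequency of whichever clock is being considered; the helper names are illustrative only.

```python
def ppm_to_hz(freq_hz: float, ppm: float) -> float:
    """Absolute frequency deviation in Hz for a given ppm tolerance."""
    return freq_hz * ppm / 1e6

def offset_ppm(actual_hz: float, nominal_hz: float) -> float:
    """Deviation of an actual frequency from its nominal value, in ppm."""
    return (actual_hz - nominal_hz) / nominal_hz * 1e6

# 100 ppm around the 156.25 MHz XGMII clock and the 257.8125 MHz parallel clock
print(ppm_to_hz(156.25e6, 100))      # deviation in Hz at 156.25 MHz
print(ppm_to_hz(257.8125e6, 100))    # deviation in Hz at 257.8125 MHz

# A clock that runs 15.625 kHz fast relative to 156.25 MHz
print(offset_ppm(156.25e6 + 15_625, 156.25e6))
```

The first value works out to 15.625 kHz, in line with the estimate above.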

 

More in chapter 5.2.2.10.4

In 10GBASE-R mode, the RX FIFO operates as a clock compensation FIFO. When the block synchronizer achieves block lock, data is sent through the FIFO. Idle ordered sets (OS) are deleted and Idles are inserted to compensate for the clock difference between the RX low speed parallel clock and the FPGA fabric clock (±100 ppm for a maximum packet length of 64,000 bytes). 
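A rough back-of-envelope for why the tolerance is quoted together with a maximum packet length: idles exist only between packets, so the FIFO has to absorb whatever drift builds up while one packet passes through. The sketch below estimates that drift in 64-bit XGMII words; counting in 64-bit words and the 200 ppm worst case are my assumptions, not figures from the user guide.

```python
MAX_PACKET_BYTES = 64_000  # maximum packet length quoted in the user guide

def drift_words(packet_bytes: int, ppm: float, word_bytes: int = 8) -> float:
    """Words gained or lost between write and read side over one packet."""
    words_in_packet = packet_bytes / word_bytes
    return words_in_packet * ppm / 1e6

# One side off by 100 ppm, and both sides off by 100 ppm in opposite directions
for ppm in (100, 200):
    print(f"{ppm} ppm over {MAX_PACKET_BYTES} bytes: "
          f"{drift_words(MAX_PACKET_BYTES, ppm):.2f} words")
```

Under these assumptions the drift over a single maximum-length packet stays in the range of one or two words, which a small FIFO can absorb before the next inter-packet gap allows an idle to be inserted or deleted.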

 

Question: Should I insert an async FIFO on the XGMII between the XCVR and the MAC? 

 

My problem is that packets are getting lost (almost half of them never arrive at the MAC, and RX LOCAL FAULT is received from the XCVR instead). 

 

Question: Where does RX LOCAL FAULT come from and what does it really tell me? 

 

I have tried several configurations and, so far, a 40-bit PCS/PMA interface with tx_pma_div_clkout and division factor 33 (produces 156.25 MHz) is the "least broken" (but it's still unacceptable, of course). 
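The division factor of 33 can be sanity-checked against the line rate. Working backwards from the 156.25 MHz output, the divider input would be 5156.25 MHz, which equals half the 10.3125 Gbps line rate. That the divider actually runs from a half-rate clock is my inference from the arithmetic, not something stated in the table quoted in this post.

```python
LINE_RATE_MHZ = 10_312.5        # 10GBASE-R line rate expressed in MHz
DIV_FACTOR = 33                 # tx_pma_div_clkout division factor from the post
DIV_CLKOUT_MHZ = 156.25         # resulting clock reported in the post

# Clock frequency the divider would need at its input to give 156.25 MHz
implied_input_mhz = DIV_CLKOUT_MHZ * DIV_FACTOR

# Assumption: the serializer clock runs at half the line rate
half_rate_mhz = LINE_RATE_MHZ / 2

print("implied divider input:", implied_input_mhz, "MHz")
print("half the line rate   :", half_rate_mhz, "MHz")
print("match:", implied_input_mhz == half_rate_mhz)
```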

 

Example of statistics from Spirent-Arria test: 

TX: 1 000 000 

RX: 579 105 

FCS ERR: 706 

BER: 0 

 

MAC RX: 579 349 

MAC DISCARD: 245 

MAC TX: 579 105 

 

(TX seems to be working well enough, packets are generated with PRBS) 

 

More problems emerge when using Intel 10GbE PCI-E card and real packets: 

TX: 23400 

RX: 3494 

RX ERROR: 52 

MAC RX: 10 590 

MAC DISCARD: 3 

MAC TX: 10 587 

 

Here more than half of the packets are lost on MAC RX, and only 1/3 of those sent actually arrive. 
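To put percentages on both runs, here is a small sketch using only the counters listed above. The dictionary keys are my own labels, and I am assuming that "RX" is what the tester or NIC received back and "MAC TX" is what the FPGA MAC sent out.

```python
def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

spirent = {"tx": 1_000_000, "rx": 579_105,
           "mac_rx": 579_349, "mac_discard": 245, "mac_tx": 579_105}
intel = {"tx": 23_400, "rx": 3_494,
         "mac_rx": 10_590, "mac_discard": 3, "mac_tx": 10_587}

for name, s in (("Spirent", spirent), ("Intel NIC", intel)):
    print(name)
    print(f"  reached MAC RX : {pct(s['mac_rx'], s['tx']):.1f}% of packets sent")
    print(f"  returned to peer: {pct(s['rx'], s['mac_tx']):.1f}% of MAC TX")
    print(f"  MAC RX - discard - MAC TX = "
          f"{s['mac_rx'] - s['mac_discard'] - s['mac_tx']}")
```

In both runs MAC RX minus MAC DISCARD matches MAC TX to within one packet, so the MAC itself forwards what it receives; the losses sit between the link partner and MAC RX, and in the Intel test also on the way back.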

 

I've also tried using a 64-bit PCS/PMA interface (so the parallel clock is ~161 MHz) with tx_clkout as the MAC clock, but surprisingly, nothing really changed and the problems were still there. 

 

A development board with SFP+ is used (the actual transceiver modules were Finisar and Intel; I tried several of them, and all are tested and working properly). 

 

chapter 2.4.3 

Table 13. 

 

Enable tx_pma_div_clkout port 

On/Off 

Enables the optional tx_pma_div_clkout output clock. This clock is generated by the serializer. You can use this to drive core logic, to drive the FPGA - transceivers interface. 

 

Figure 252 shows possible WR CLKs: 

You can control the write port using tx_clkout or tx_coreclkin. Use the tx_clkout signal for a single channel and tx_coreclkin when using multiple channels. 

 

This is quite logical: one parallel clock is shared by multiple channels. But the other mentions of 156.25 MHz as the WR CLK are quite confusing. 

 

Can anyone help me please?
1 Reply
Altera_Forum

Sorry I can't answer any of your questions specifically, but know that I opened a service request with Altera a few weeks ago regarding 5.2.2.10.5 and the gentleman from Applications Engineering agreed that it was a typo and he submitted a request to have it corrected. From what I've read, I doubt that is the only typo. The Applications Engineer said they would publish a KDB solution--whatever that is. 

Also, the Applications Engineer recommended I refer to Figure 127 for information on when to assert the read signal (Figure 126 is for the write side), but I find both of those diagrams questionable. They both recommend asserting the read/write signal *asynchronously*, driven by the pempty/pfull signal, which seems a little odd. Furthermore, the data signals in those two diagrams don't seem to be lined up correctly.