Transceiver passthrough in Cyclone V

Altera_Forum · ‎02-16-2017

Hello!

I have the following situation and can't figure out how to handle it. After reading the documentation and trying some things, I decided to come here for help.

I need to connect one transceiver (Rx) to another (Tx). Basically creating a transceiver passthrough. I know each transceiver channel has this capability, but I have to do it between different channels. This is actually just a small part of a much bigger design in which I would need to be able to connect one Rx transceiver to any of the other Tx transceivers, dynamically. There would be multiple Rx receivers.

I can't figure out how to clock the Tx transceiver. I need to provide a reference, but this can't be the same reference as Rx because it doesn't match the incoming data rate. It should be the recovered Rx clock (from the CDR). However, you can't just connect the CDR output clock to another transceiver's CMU/PLL; the tools complain about an illegal connection.

Maybe connecting through a PLL could work, but I'm not sure this would be an option given that I need to connect any Rx to any Tx. I would need a PLL for each Tx transceiver, with multiple clock inputs to be able to select the Rx source. Is that correct?

What if I wanted to connect any of the 5 Rx transceivers to one Tx? The clock control blocks only allow selection from 4 sources.

Does someone know if what I'm trying to achieve here is possible? If so, I'd appreciate any help, even if it's just a nudge in the right direction! Or flat out telling me it's not possible would be a great help too.

Thank you!

Altera_Forum · ‎02-17-2017

--- Quote Start ---

I would need to be able to connect one Rx transceiver to any of the other Tx transceivers, dynamically. There would be multiple Rx receivers.

--- Quote End ---

Treat every transceiver interface as its own clock domain. Treat the FPGA as the multiplexing clock domain. Each transceiver has an RX and TX channel. The transceiver to FPGA core interface would use a dual-clock FIFO. The FIFO interfaces can be converted to use the AXI4-Stream or Avalon-ST stream protocols.

Your multiplexing logic operates in the FPGA core clock domain and simply routes AXI4-Stream/Avalon-ST traffic from each source to each destination.
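
As a rough illustration of that routing idea, here is a minimal Python model (not HDL, and not taken from any Altera example). The class name `StreamCrossbar`, its methods, and the one-word-per-tick behaviour are all assumptions made for the sketch; the deques stand in for the dual-clock FIFOs on either side of the core clock domain.

```python
from collections import deque


class StreamCrossbar:
    """Toy model of core-clock routing between per-transceiver FIFOs."""

    def __init__(self, n_rx, n_tx):
        # One FIFO per receiver (filled by the Rx side) and one per
        # transmitter (drained by the Tx side).
        self.rx_fifos = [deque() for _ in range(n_rx)]
        self.tx_fifos = [deque() for _ in range(n_tx)]
        self.route = {}  # tx index -> rx index

    def connect(self, rx, tx):
        """Select which receiver feeds a given transmitter."""
        self.route[tx] = rx

    def tick(self):
        """One core-clock cycle: move at most one word per Rx FIFO.

        The word is copied to every Tx FIFO routed to that receiver.
        A word from a receiver with no routed transmitter is discarded.
        """
        for rx, fifo in enumerate(self.rx_fifos):
            if not fifo:
                continue
            word = fifo.popleft()
            for tx, src in self.route.items():
                if src == rx:
                    self.tx_fifos[tx].append(word)
```

Changing the route map at run time is what would make the passthrough dynamic. Real Avalon-ST or AXI4-Stream logic would also need valid/ready handshaking, which this model leaves out.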

Each transceiver block needs a reference clock. The FPGA core can use the same frequency, or a faster clock, or a slower clock (depending on the data volume). The clocks can all be identical frequencies if possible, but that is not a requirement. The FIFOs take care of clock-domain crossing, and the transceiver protocol needs to take care of idle character insertion and removal.

Simulate this and it will work just fine.

There are some basic transceiver and BFM examples here ...

https://www.ovro.caltech.edu/~dwh/correlator/cobra_docs.html

Cheers,

Dave

Altera_Forum · ‎02-17-2017

Hello Dave, thank you for your fast response!

Unfortunately, I don't think this is possible in my case. I'm using the transceivers to receive and transmit SDI video, this means there's no room for idle character insertion which is, if I understand correctly, what the rate-match FIFO does, right?

I've also looked at the configuration examples in Altera's documentation, and they disable the rate-match FIFO for SDI.

Is there something I'm missing?

Regards,

Sebastian.

Altera_Forum · ‎02-19-2017

--- Quote Start ---

I'm using the transceivers to receive and transmit SDI video, this means there's no room for idle character insertion which is, if I understand correctly, what the rate-match FIFO does, right?

--- Quote End ---

The FIFOs are there primarily for clock domain crossing. You could design a system with a common clock source and many transceivers, and although they are all phase-locked to the same source, the relative clock phases used by the transmitters, the receivers (clock-and-data recovery unit PLL), and the FPGA clock domain will all be different. The FIFOs allow you to bring data in from one receiver, cross into the FPGA fabric, and then cross over to a transmitter.

If you are receiving and transmitting SDI video, then your transceiver configuration will not involve inserting/removing idles. If you are transferring a video stream from one source to another, then you can use FIFOs (in the FPGA fabric) for storing frames, and that will simply delay the video stream.

Create a simulation containing two transceiver interfaces (TX+RX in two different blocks), connect the Avalon-ST video streams, and assuming that setup works fine, you should be able to see how to scale things so that the FPGA is a video multiplexing switch.

Cheers,

Dave

Altera_Forum · ‎02-19-2017

--- Quote Start ---

If you are receiving and transmitting SDI video, then your transceiver configuration will not involve inserting/removing idles. If you are transferring a video stream from one source to another, then you can use FIFOs (in the FPGA fabric) for storing frames, and that will simply delay the video stream.

--- Quote End ---

I'm afraid that will not work. The problem with SDI video is exactly that you cannot insert/remove idles (just in case it wasn't clear, the signal comes from another device so it does not share a common clock source with the FPGA). If I understand correctly, what you're telling me to try is this:

SDI_in --> XCVR_0 --> ( FIFO ) --> XCVR_1 --> SDI_out

The parentheses indicate that the FIFO is in the FPGA fabric; the rate-match FIFOs are included in the XCVR.

Let me throw out some numbers (neither accurate nor real, just for demonstration purposes). Let's say both XCVRs are clocked with the same reference of 100M. Now, if the incoming video signal is running slightly off at 99M, then the CDR in XCVR_0 will recover a 99M clock. You feed that data into the FIFO in the FPGA fabric and then extract it with the output transceiver's clock of 100M.

At this point you have a 99M source pushing data into the FIFO and a 100M sink pulling data out. Since the sink is running faster, it will have to make up samples, 1M of them per second in this simple example. The same happens the other way around if the video signal is running at 101M: you will have an excess of 1M samples per second, which need to be discarded.
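
To put numbers on this, here is a small Python sketch of the arithmetic. The function names, the 1024-word depth and the half-full starting point are assumptions for illustration only; the 99M and 100M rates are the made-up figures from above.

```python
def fifo_drift_per_second(write_rate_hz, read_rate_hz):
    """Net words gained (+) or lost (-) by the FIFO each second."""
    return write_rate_hz - read_rate_hz


def seconds_until_limit(depth_words, start_fill_words, write_rate_hz, read_rate_hz):
    """Time until the FIFO overflows (writer faster) or underflows (reader faster)."""
    drift = fifo_drift_per_second(write_rate_hz, read_rate_hz)
    if drift == 0:
        return float("inf")  # matched rates: the fill level never moves
    if drift > 0:
        return (depth_words - start_fill_words) / drift  # fills up
    return start_fill_words / -drift  # drains


# 99M writer, 100M reader, 1024-word FIFO that starts half full (assumed sizes)
drift = fifo_drift_per_second(99_000_000, 100_000_000)
time_left = seconds_until_limit(1024, 512, 99_000_000, 100_000_000)
print(f"drift = {drift} words/s, FIFO empties after {time_left * 1e6:.0f} us")
```

With these assumed numbers the FIFO runs dry in about half a millisecond, so a small FIFO alone only postpones the problem.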

Am I misunderstanding the way you say the transceivers work? Can a Tx transceiver clocked at 100M actually output data at 99M/101M without inserting or removing idles? The only way I've been able to do this is by connecting the recovered clock from XCVR_0 to an fPLL and then feeding that as a reference to XCVR_1. This is not a solution, though, since you can only really use one fPLL per bank, so there would be a bunch of Tx XCVRs you wouldn't be able to use.

Once again, thank you for your help!! =)

Regards,

Sebastian.

Altera_Forum · ‎02-22-2017

Hi Sebastian,

--- Quote Start ---

I'm afraid that will not work.

--- Quote End ---

You're right, but it also might work just fine.

If the input source is SDI video and the output destination is SDI video, or the output destinations are all SDI video, e.g., multiple displays with the same images, then the SDI reference clock frequencies will all be similar within some parts-per-million tolerance.

I'm not familiar with SDI, but I suspect there are likely points in time where you can add or drop an entire frame without anyone really noticing. If that is the case, then your FIFO logic would simply need to be designed to store one or more frames. The FIFO flags can be used to determine when a full frame is present in the buffer, and when that is true, you start transmitting it over SDI. If once the frame is transmitted the FIFO does not have a full frame, a blank frame could be sent, or perhaps the duplicate of the last frame (in which case you'd need a save-the-last-frame FIFO).
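
The frame-level buffering described here could be modelled along these lines. This is only a behavioural Python sketch; the class `FrameSynchronizer`, its two-frame capacity, and the policy of dropping the oldest frame are assumptions, and it ignores the non-video data that SDI also carries.

```python
from collections import deque


class FrameSynchronizer:
    """Behavioural model of a drop/repeat frame buffer between two clock domains."""

    def __init__(self, capacity_frames=2):
        self.frames = deque()
        self.capacity = capacity_frames
        self.last_sent = None  # stays None until a first frame has been sent
        self.repeated = 0
        self.dropped = 0

    def push(self, frame):
        """Input side: call when a complete frame has been received."""
        if len(self.frames) == self.capacity:
            self.frames.popleft()  # source is faster: drop the oldest frame
            self.dropped += 1
        self.frames.append(frame)

    def pop(self):
        """Output side: call once per output frame period."""
        if self.frames:
            self.last_sent = self.frames.popleft()
        else:
            self.repeated += 1  # source is slower: repeat the last frame
        return self.last_sent
```

Returning `None` before any frame has arrived stands in for sending a blank frame. In hardware the same decisions would be driven by the FIFO flags at frame boundaries.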

Ultimately, because you need to transfer complete frames between sources and destinations that have different clocks, you need to consider your options. In practice, if you do the math for the ppm differences in clocks, you may find that the accumulated error takes many hours to reach a one-frame difference when the clocks are closely matched. If that is the case, duplicating or dropping a frame would likely never be noticed. If your logic could detect the black frames between transitions, then you could simply send an extra black frame, or drop one of those frames.
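
The math in question is short enough to sketch in Python. The function name and the 30 frames/s rate are assumptions for illustration; the real frame rate and clock tolerances depend on the video standard and the oscillators involved.

```python
def seconds_per_frame_slip(frame_rate_hz, offset_ppm):
    """Time for a clock offset of `offset_ppm` to accumulate one frame period of drift."""
    if offset_ppm == 0:
        return float("inf")  # perfectly matched clocks never slip
    frame_period_s = 1.0 / frame_rate_hz
    return frame_period_s / (abs(offset_ppm) * 1e-6)


for ppm in (1, 10, 100):
    t = seconds_per_frame_slip(30, ppm)
    print(f"{ppm:>3} ppm offset: one frame slip every {t / 60:.1f} minutes")
```

At the assumed 30 frames/s, a 1 ppm offset gives roughly 9.3 hours between slipped frames, while a 100 ppm offset gives roughly 5.6 minutes, so the answer depends strongly on how close the two clocks really are.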

I've not played with video for a long time. I'm sure there are common techniques similar to what I am suggesting.

The bottom line is that unless you can control all the reference clocks to all the transceiver blocks and displays, you cannot create a truly synchronous system.

Of course, if you are building your own mega-wall-display and you have the option to define the clocking, then go ahead and make the clocks synchronous! I think solving the asynchronous clocking scheme would make the system much more scalable.

Cheers,

Dave

Altera_Forum · ‎02-23-2017

--- Quote Start ---

Of course, if you are building your own mega-wall-display and you have the option to define the clocking, then go ahead and make the clocks synchronous! I think solving the asynchronous clocking scheme would make the system much more scalable.

--- Quote End ---

Actually, this is the most common use case: there's usually one global (global as in across all of the devices chained together) clock synchronizing everything. However, I also need to account for the few cases in which this isn't so and each input signal has its own independent reference. So my question was geared towards this less common use case.

--- Quote Start ---

I'm not familiar with SDI, but I suspect there are likely points in time where you can add or drop an entire frame without anyone really noticing. If that is the case, then your FIFO logic would simply need to be designed to store one or more frames. The FIFO flags can be used to determine when a full frame is present in the buffer, and when that is true, you start transmitting it over SDI. If once the frame is transmitted the FIFO does not have a full frame, a blank frame could be sent, or perhaps the duplicate of the last frame (in which case you'd need a save-the-last-frame FIFO).

--- Quote End ---

This is sort of true. SDI carries a lot of other information besides video, so while it can be done (and I'll have to do it if there's no other way), it's not the ideal case. After researching it, I couldn't really conclude anything, and that's how I came to ask here in the forums. I want to know if it's possible to use the recovered clock to feed the transmitter's reference. I know it can be done through an fPLL, but that's very limiting because you can only use one fPLL per bank, so you'd only be able to route 2 inputs into 2 outputs (I'm not sure if the outputs have to be on different banks), and I don't know how much the jitter is affected by this.

I appreciate your help, this is actually a really good alternative if what I'm asking about ends up not being possible at all. Thank you! =)

Regards,

Sebastian.

Altera_Forum · ‎02-24-2017

--- Quote Start ---

I want to know if it's possible to use the recovered clock to feed the transmitter's reference

--- Quote End ---

It might be possible. You would need to use Quartus to check.

Transceivers have an external reference pin for the transceiver block reference clock (REFCLK). The REFCLK is the transmit clock source, and the initial reference clock for the receiver. When the receiver is initializing, it uses REFCLK to lock the receiver clock-and-data recovery (CDR) unit PLL; this is called lock-to-reference (LTR) mode. Then the CDR tries to lock to the incoming data stream; this is called lock-to-data (LTD) mode. In the case where all the clocks are synchronous, the recovered LTD clock has the same frequency as REFCLK, but it has different phase noise.

What you are asking (if I interpret your question correctly) is whether you can take a clock from the CDR, i.e., the LTD clock, and then use it for the REFCLK input of another transceiver block. I'm not sure that this is possible; it will depend heavily on the FPGA clocking resources. For example, the clock you want needs to route from the CDR onto a global or local clock network, and then back to the REFCLK multiplexing logic on the transceiver block inputs. If you are at the board-design stage, then you could take the CDR clock output off the FPGA to a jitter cleaner PLL, and then route the output of that PLL back to a REFCLK pin. If you look on the SiLabs web site, there is a paper on synchronous Ethernet that discusses this technique.

--- Quote Start ---

I know it can be done through an fPLL, but that's very limiting because you can only use one fPLL per bank, so you'd only be able to route 2 inputs into 2 outputs (I'm not sure if the outputs have to be on different banks), and I don't know how much the jitter is affected by this.

--- Quote End ---

I've tried something similar with a global clock input. The global clock pin could not be routed to the transceiver REFCLK, but Quartus would allow routing the clock to a PLL and then a PLL output to the REFCLK. Quartus generates a warning about "excessive jitter" for this setup. This was acceptable in the test I was doing. Since you mention an fPLL, you're using a newer device, so there might be a few more clocking options. If Quartus generates warnings about clock jitter, then you probably don't want to accept that method as your final implementation.

In your application, if the input video stream clock is asynchronous relative to the video wall clock, then just buffer a small amount of data before displaying the images over all the destination displays. The only rate difference you have to worry about is the source relative to the destination. If you do the math for the time taken for a one-frame difference to accumulate given the ppm errors between clocks, you might find it is quite a long time. If you buffered 1 second of frames, and had enough space for 1 second more in your buffer (in case the source clock is faster than the wall clock), I'm sure it would take a while to overflow or underflow the buffer.

Cheers,

Dave