Solved: How long should rx_is_lockedtodata be asserted before a valid connection can ACTUALLY be assumed?

ABoge · ‎10-21-2019

The documentation for the high-speed transceiver IP discusses the details of controlling the reset signals while a connection is being established. The Altera reset controller is being used here, but the custom communication state machine also needs to know this timing information so that it can start processing valid received data.

Documentation link:

https://www.intel.com/content/www/us/en/programmable/documentation/hki1486507600636.html#wua1486507356870

In the section "Lock-to-Data Mode" there is a note that reads:

"The rx_is_lockedtodata signal toggles until the CDR sees valid data; therefore, you should hold receiver PCS logic in reset (rx_digitalreset) for a minimum of 4 µs after rx_is_lockedtodata remains continuously asserted."

Despite taking this into account, the custom state machine was getting confused. Monitoring the rx_is_lockedtodata signal with an oscilloscope when the transmitter is turned off (another FPGA that is not configured) shows the issue. rx_is_lockedtodata is toggling roughly every 293us. This timing is extremely consistent, so it's unlikely to be confusion from some kind of random noise that might be on the line. It does not line up with any other signals on the board.

With the assertion delay set to 500us in the custom state machine, it works as expected. So there is a workaround, but it cannot be trusted going forward when the observed behavior contradicts the documentation. My custom code is only monitoring the state of the transceiver and reset controller at this stage, so I don't think there is something wrong there.
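To illustrate why a 4us qualification window can be fooled while a 500us one is not, here is a minimal behavioral sketch in Python (not HDL, and not the actual state machine). It assumes the no-data toggling is a square wave that stays high for about 293us and low for about 293us, which is only one reading of the scope observation; the function names and the 1us sample period are hypothetical.

```python
def qualify_lock(samples, sample_period_us, hold_us):
    """Return the index of the first sample at which the lock signal has been
    continuously high for at least hold_us, or None if that never happens."""
    needed = int(round(hold_us / sample_period_us))
    run = 0
    for i, level in enumerate(samples):
        # Count consecutive high samples; any low sample restarts the count.
        run = run + 1 if level else 0
        if run >= needed:
            return i
    return None


def toggling_lock(high_us, low_us, total_us, sample_period_us=1):
    """Model rx_is_lockedtodata toggling with no valid data present.
    ASSUMPTION: a simple square wave; the real waveform may differ."""
    period = high_us + low_us
    return [(t % period) < high_us for t in range(0, total_us, sample_period_us)]


# Hypothetical no-data waveform: high 293us, low 293us, observed for 5 ms.
no_data = toggling_lock(293, 293, 5000)

# A 4us hold time is satisfied almost immediately (false positive).
false_lock_index = qualify_lock(no_data, 1, 4)

# A 500us hold time is never satisfied, because no high phase lasts that long.
long_hold_result = qualify_lock(no_data, 1, 500)

# A genuine lock (stays high) still qualifies with the 500us hold time.
real_lock = [False] * 100 + [True] * 1000
real_lock_index = qualify_lock(real_lock, 1, 500)
```

Under these assumptions, any hold time longer than the longest spurious high phase rejects the toggling, which matches the 500us workaround but says nothing about whether that margin holds across devices or conditions.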

IPs being used:

altera_xcvr_native_av

alt_xcvr_reconfig

altera_xcvr_reset_control

Hardware:

Cyclone V

CheePin_C_Intel · ‎10-22-2019

Hi,

As I understand it, you have some inquiries related to the CDR lock-to-data behavior. According to the CV device handbook, once rx_is_lockedtodata is asserted and stays asserted for 4us, the CDR should have successfully locked to the incoming data.

When there is no incoming data, the CDR will periodically attempt to lock to data. While it is attempting this, rx_is_lockedtodata will go high. If there is no valid incoming data, rx_is_lockedtodata will then go low. This repeats continuously, which leads to the rx_is_lockedtodata toggling that you observed.

I would like to check with you: when there is no valid data, is the high duration of rx_is_lockedtodata longer than 4us?

Please let me know if there is any concern. Thank you.

Best regards,

Chee Pin

CheePin_C_Intel · ‎10-22-2019

Hi,

If, during the toggling, the rx_is_lockedtodata assertion duration is longer than 4us, you can try to work around it by monitoring the RX signal detect to ensure a valid signal is present before you check the CDR status. Note that you would need to enable the 8B/10B block to use RX signal detect. You will need to set the signal detect thresholds according to your specific setup. You may refer to the SATA/SAS recommended QSF assignments in the V series XCVR PHY IP user guide -> Cyclone V Transceiver Native PHY IP Core -> "Enable rx_std_signaldetect port" section for further details.

Please let me know if there is any concern. Thank you.

Best regards,

Chee Pin

ABoge · ‎10-22-2019

Had to tweak XCVR_RX_SD_THRESHOLD and XCVR_RX_COMMON_MODE_VOLTAGE values to 3 and VTT_0P70V respectively to get things working for PCML-1.5, but that did the trick.

In fact, activating the signal detect circuit gates rx_is_lockedtodata as well, so the only change needed in my state machine was to remove the delay timer.

Is signaldetect just looking for the common mode voltage to be above the specified value or is it actually watching for transitions?

CheePin_C_Intel · ‎10-23-2019

Hi,

Thanks for your update. Glad to hear that you have managed to make it work.

Regarding your latest inquiry on signaldetect: it looks for a voltage above the set threshold to assert or de-assert the status signal. It is not watching for transitions.

Please let me know if there is any concern. Thank you.

Best regards,

Chee Pin

ABoge · ‎10-28-2019

After some more testing, the reliability of establishing a connection is still flaky.

That is, once a connection is established it works great, but getting a connection is unreliable.

For example when in the bad state, every time the following packet is sent:

003C, 0200, 1234, 5678

The following is received:

003C, 023C, 1234, 567C, 5678

This is looking directly at rx_parallel_data with signal tap after both rx_is_lockedtodata and rx_std_signaldetect are high.

The state machine does not have any way to detect this because when empty packets are being sent they look fine:

003C, 0000, 0000, ....

Received:

003C, 0000, 0000, ....

Is it possible for rx_is_lockedtodata to have false positives (besides the 4us timeout)?

It is as if some component within the Native Phy is partially out of alignment.

Right now tx_std_clkout is connected to both tx_std_coreclkin and rx_std_coreclkin so as to keep everything in the same clock domain and simplify the design on my side. The documentation states that there is alignment logic that will take care of some discrepancy between rx_std_coreclkin and rx_std_clkout as long as they don't drift relative to each other. Since both FPGAs are running from the same reference, that requirement should be satisfied. Maybe the alignment logic is not as strong as the documentation suggests?

Maybe the state machine is going to have to do some additional handshaking by sending test data to make sure everything is aligned, then toggle rx_std_wa_patternalign again if that fails.
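The handshake idea could be sketched roughly as follows. This is a minimal Python model of the comparison only, not the actual state machine: the function names are hypothetical, and the word lists are the ones quoted earlier in this post. It also shows why the idle pattern cannot expose the problem, since it compares clean in both the good and the bad state.

```python
# Words quoted in the post: what was sent and what arrived in the bad state.
SENT_TEST = [0x003C, 0x0200, 0x1234, 0x5678]
BAD_RX = [0x003C, 0x023C, 0x1234, 0x567C, 0x5678]

# Idle packets look identical on both sides, even in the bad state.
SENT_IDLE = [0x003C, 0x0000, 0x0000]
RX_IDLE = [0x003C, 0x0000, 0x0000]


def first_mismatch(expected, received):
    """Return the index of the first differing word, or None if the two
    sequences are identical (including their length)."""
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i
    if len(expected) != len(received):
        # One side has extra words after an otherwise matching prefix.
        return min(len(expected), len(received))
    return None


def needs_realign(expected, received):
    """Hypothetical decision: request another rx_std_wa_patternalign pulse
    whenever the known test pattern does not come back intact."""
    return first_mismatch(expected, received) is not None


bad_index = first_mismatch(SENT_TEST, BAD_RX)      # second word differs
idle_hidden = needs_realign(SENT_IDLE, RX_IDLE)    # idle data hides the fault
test_detects = needs_realign(SENT_TEST, BAD_RX)    # test pattern exposes it
```

A test pattern with non-zero, non-repeating words is what makes the check useful; whether re-toggling rx_std_wa_patternalign actually clears the bad state is not established in this thread.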

But that is annoying since sending a comma symbol (3C and datak=1) every packet really should be enough to get a good alignment.

Version info:

Quartus Prime Version 18.1.1 Build 646.

CheePin_C_Intel · ‎11-02-2019

Hi,

Sorry for the delay. As I understand it, you are observing bit errors when sending a fixed data pattern from TX to RX, and I believe the TX and RX are on different FPGAs. Based on this observation, it looks more like a signal integrity issue, or possibly a clocking issue, than the CDR losing lock. Generally, when the CDR loses lock, you will see all of the data corrupted rather than a few bits toggling with the correct word boundary.

Would you mind trying the following:

1. Connect rx_clkout to rx_coreclkin, which is the recommended connection.

2. Perform a serial loopback from TX to RX within the same FPGA with the same data pattern to see if the RX still exhibits a similar error. This would help to narrow down a potential signal integrity issue.

3. Please share your Native PHY .ip file with me so that I can better understand your configuration and see if I can spot any anomaly.

Please let me know if there is any concern. Thank you.

Best regards,

Chee Pin

ABoge · ‎11-06-2019

I'll try your points 1 and 2. In the meantime I've also included what I hope are the files that you are asking for.

As a side point:

You mention that connecting rx_clkout to rx_coreclkin is the recommended connection. I don't see this recommendation in any documentation.

On page 11-8 (pdf page 293) of this document https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/xcvr_user_guide.pdf

It says that it can be connected to any clock that the user wants. Specifically: "FPGA fabric clock, the FPGA fabric RX interface clock, or the input reference clock"

Obviously the clock used cannot drift relative to the transmitter or the fifo will overflow or underflow as the case may be, but that is guaranteed since I'm using the same reference for both FPGAs.
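As a rough illustration of why any frequency offset eventually breaks a FIFO of this kind, the arithmetic can be sketched in Python. Everything numeric here is a hypothetical assumption for the example (125 MHz parallel clock, 100 ppm offset, 4 words of margin); none of it comes from the Cyclone V documentation.

```python
def seconds_until_fifo_slip(clock_hz, ppm_offset, margin_words):
    """Estimate how long a FIFO with margin_words of headroom survives when
    its write and read clocks differ by ppm_offset parts per million.
    With a 0 ppm offset the fill level never drifts, so the result is infinite."""
    if ppm_offset == 0:
        return float("inf")
    # Words gained or lost per second because of the frequency mismatch.
    slip_rate = clock_hz * abs(ppm_offset) * 1e-6
    return margin_words / slip_rate


# HYPOTHETICAL numbers, chosen only to show the scale of the effect.
t_offset = seconds_until_fifo_slip(125e6, 100, 4)   # about 320 microseconds
t_locked = seconds_until_fifo_slip(125e6, 0, 4)     # never slips at 0 ppm
```

With both FPGAs on the same reference the offset is 0 ppm, so only a fixed phase difference remains and this drift mechanism does not apply.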

Admittedly that page is detailing information for the Low Latency PHY IP core not the Native PHY IP core. But that is not the only bit of information that is only mentioned for one variant of the IP core that is reasonably applicable to all the variants. For example, the fifo overflow possibility is only mentioned in the Interlaken section.

Is there better documentation somewhere else?

CheePin_C_Intel · ‎11-08-2019

Hi,

Please allow me some time to look into this. I will provide you an update on the progress by early next week. Please ping me if you do not hear back from me. Sorry for keeping you waiting.

CheePin_C_Intel · ‎11-12-2019

Hi,

Sorry for the delay. Regarding your inquiry on rx_clkout to rx_coreclkin: yes, you are right, you can drive rx_coreclkin with another parallel clock that has the same frequency as rx_clkout and 0 ppm difference from it. In my experience, this is generally used in bonded mode, where the master channel's rx_clkout drives the rx_coreclkin of all the bonded channels to reduce skew. For a non-bonded channel, we would normally use the channel's own rx_clkout to drive its rx_coreclkin. If you take a look at the figure "PCS Block Diagram of a Transceiver Channel in a Cyclone V Device" in the CV device handbook, you will see a dedicated path connecting rx_clkout to the read side of the FIFO. This path is used if you connect rx_clkout to rx_coreclkin in RTL.

Please let me know if there is any concern. Thank you.