Re: Quartus 18.1 Soft LVDS transmitter w/ ext PLL core missing tx_outclock ddio generator logic?

WKett · 11-01-2019

If I generate a Soft LVDS Intel FPGA IP core (v18.1) transmitter in Verilog without selecting the external PLL option, with a serialization factor of 7 (odd) and 4 output channels, the generated Verilog contains DDIO shifter logic instances that create the tx_outclock output of the megafunction instance. This configuration matches the Channel Link / Flat Panel Display Link (FPD-Link I) format originally defined by National Semiconductor and still used by Camera Link interfaces. According to the simulation Verilog, the core does not generate the 57% (4/7) duty cycle clock with a PLL. Instead, it uses shifter logic and DDIO register instances to build the output clock from both the rising and falling edges of the serial data clock reference.
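To make the 57% figure and the role of the DDIO registers concrete, here is a small behavioral model in Python. It is not a transcription of the generated Verilog. It assumes the clock lane carries the bit pattern 1100011 per 7-bit word in transmit order (high for 4 bit times, low for 3), which is the usual Channel Link clock shape, but the exact bit ordering is an assumption. The DDIO stage is modeled simply as "two serial bits per fast-clock cycle".

```python
# Behavioral sketch only (not the Intel-generated RTL): how a 7:1 "channel link"
# output clock can be produced by DDR (DDIO) output registers.

# Assumed clock-lane pattern, one entry per serial bit time, in transmit order:
# high for 4 bit times and low for 3, i.e. a 4/7 (~57%) duty cycle.
CLOCK_PATTERN = [1, 1, 0, 0, 0, 1, 1]


def duty_cycle(pattern):
    """Fraction of serial bit times for which the clock lane is high."""
    return sum(pattern) / len(pattern)


def ddio_pairs(bits):
    """Group a serial bit stream into (first, second) pairs.

    A DDIO output register launches one bit on each edge of the fast clock,
    so each pair corresponds to one fast-clock cycle.
    """
    if len(bits) % 2:
        raise ValueError("DDR serialization needs an even number of bits")
    return [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]


print(f"duty cycle = {duty_cycle(CLOCK_PATTERN):.3f}")  # duty cycle = 0.571

# One 7-bit clock period is 3.5 fast-clock cycles, so the DDIO pattern only
# repeats every two output-clock periods (14 bits = 7 fast-clock cycles).
print(ddio_pairs(CLOCK_PATTERN * 2))
```

The second print shows why shifter logic is needed at all: with an odd factor, a single 7-bit clock word cannot be split evenly into DDIO pairs, so the generator has to work over two words at a time.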

Now if I build the same LVDS core with the external PLL option selected, so that I can access the remaining PLL output clocks, the generated transmitter instead expects a high-speed serial clock (refclk * serialization factor / 2) plus a reference clock, presumably from a PLL that I instantiate myself. However, the megafunction instance completely strips out the DDIO logic that generates the serialized tx_outclock. Why? There is no scenario in which I want to generate the 57% duty cycle output clock in user code, because that risks phase misalignment with the data path controlled by the LVDS Tx core. The top-level instance of the core has no tx_outclock port, yet the main body of the IP core (the file suffixed _0002.v), which the top-level instance instantiates, still shows an unconnected "tx_outclock" port.
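The fast-clock requirement quoted above is simple arithmetic, sketched below in Python. The 85 MHz reference clock is a hypothetical example value, and the 400 MHz limit is the MAX 10 clock-tree figure mentioned later in this thread; neither is taken from the generated core.

```python
# Clock-rate arithmetic for a DDR (DDIO-based) soft LVDS link.
# Assumption: the external PLL must supply a fast clock of
# refclk * serialization_factor / 2, as the generated core expects.

CLOCK_TREE_LIMIT_MHZ = 400.0  # MAX 10 clock-tree limit quoted later in this thread


def lvds_clock_plan(refclk_mhz, ser_factor, ddr=True):
    """Return (serial bit rate in Mbps, required fast clock in MHz)."""
    bit_rate_mbps = refclk_mhz * ser_factor
    # With DDR output registers, one fast-clock cycle carries two bits.
    fast_clock_mhz = bit_rate_mbps / 2 if ddr else bit_rate_mbps
    return bit_rate_mbps, fast_clock_mhz


# Hypothetical example: an 85 MHz reference clock with 7:1 serialization.
for ddr in (False, True):
    rate, fast = lvds_clock_plan(85.0, 7, ddr=ddr)
    fits = fast <= CLOCK_TREE_LIMIT_MHZ
    print(f"ddr={ddr}: {rate} Mbps per lane, fast clock {fast} MHz, fits={fits}")
```

At this example rate, a single-data-rate serializer would need a 595 MHz clock, while the DDR version needs only 297.5 MHz, which is why the external PLL output is specified at half the bit rate.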

Someone at Intel, please provide feedback. I would like a future Quartus release to offer an option for the external-PLL version of the Soft LVDS transmitter core to include the tx_outclock logic.

Rahul_S_Intel1 · 11-07-2019

May I know which device and which Quartus version you are using?

WKett · 11-07-2019

MAX 10 family, Quartus 18.1, as stated in my subject line. Specifically, I am targeting the 10M04DCU324C8G, with the 10M08DC or 10M16DC as the migration path.

WKett · 11-07-2019

Incidentally, I've also uncovered a nuisance with the LVDS receiver IP (external PLL mode). My application requires a doubled clock, so I have been forced to use the external PLL option in order to use the four remaining PLL outputs not consumed by my two LVDS Rx cores. The nuisance is that I cannot be sure the output data latency will match between two identical receiver instances driven by separate input clocks. Call the clocks ch0 and ch1; they are defined identically in the .sdc file. The pre-synthesis RTL simulates fine, and matched ch0/ch1 test bench input data is still synchronized after deserialization. But when I simulate the .vo netlist of the project after synthesis and fitting, the ch1 data output is one output clock cycle behind ch0. So frustrating. I can only guess that synthesis is adding a pipeline delay to ch1 for routing purposes, but the megafunction doesn't expose enough knobs to ensure that the pipeline of receiver0 matches the pipeline of receiver1. The mismatch with pre-synthesis RTL is troubling, but I could live with it if I could simply guarantee synchronous output data.

Why two separate receivers instead of one 8-lane link? Because I have no guarantee that the serial clock of ch0 is matched closely enough to ch1 to use the ch0 clock to decode ch1's serial data. I do know that after deserialization they are matched well enough that I can process them synchronously at the output bus of the LVDS receivers.

I also tried using the internal LVDS receiver/transmitter PLLs with the "Use common PLL(s) for receivers and transmitters" box checked. I ran rx_inclock0 from ch0 to both the receiver and a downstream transmitter, and instantiated the ch1 receiver with rx_inclock1. RTL simulation works fine, but it effectively simulates three independent PLLs, whereas synthesis has to work out how to share PLL0 so that only two PLLs are used. I couldn't figure out how to make it work correctly: it always synthesized away my transmitter with the warning that its outputs were stuck at ground, so clearly it didn't connect a clock source in synthesis.

I now have a mostly working post-fit simulation with two 4:28 SERDES receivers and one 28:4 SERDES transmitter instantiated with external PLLs, plus some core logic that requires twice the receiver clock frequency (hence the need for access to the unused PLL outputs). But I still have to wait until I have hardware to find out whether the receiver pipeline latency really is mismatched.

Rahul_S_Intel1 · 11-08-2019

This message worries me:

transmitter with the warning that its outputs were stuck at ground, so clearly it didn't connect a clock source in synthesis.

WKett · 11-08-2019

Yes, that concerns me too. I assume you have access to my profile and can send a private email. Can you please tell me the criteria for getting an approved premium support account? Paid premium software? A high-budget sales account? What? My request was denied last week with no explanation, and I can't find the criteria posted anywhere. I have worked with Altera FPGAs since before the Cyclone I device was released and have submitted numerous bugs through our FAE over the years. Now that I've changed jobs and Intel has acquired Altera, I can't even submit a decent bug report. Does Intel want folks to switch to other brands or something?

WKett · 11-11-2019

Figure 2 of this document: https://www.intel.com/content/www/us/en/programmable/documentation/sam1394433606063.html#sam1394435487348 seems pretty clear that using a "shared" PLL to drive both an LVDS receiver and an LVDS transmitter, when the reference input clock is rx_inclock, is supposed to work. So I ask again: how do I get decent support to resolve why this exact configuration does not work post-synthesis but works fine in pre-synthesis RTL? Note that the simulation model is not exactly the same as the synthesized model (things like PLL lock time have to be assumed for simulation), but it is still supposed to represent correct logic and pipeline behavior. Both of my attempts, shared Rx/Tx PLLs and parallel Rx instances with separate external PLLs, have synthesized into behavior that differs from pre-synthesis RTL.

The final part of this puzzle: if using separate PLLs adds a pipeline delay to one of the two 4:28 receivers, why does the Rx output stay synchronized post-synthesis when a single externally instantiated PLL output drives both receivers? Unfortunately, that is exactly the configuration for which I cannot guarantee input timing relative to the DDR-based serial clock reference; otherwise I would just use it as the final solution.

WKett · 12-13-2019

For anyone who comes across this post: the issue of two identical LVDS receivers synthesizing with different data pipeline delays has not yet been resolved. We will be getting boards back next month, and I hope I can manually add a pipeline delay to the lower-latency Rx pipeline to resynchronize the data, as I was able to do in gate-level simulation. It is also worth noting that the exact same code ported to a Cyclone 10 LP does not exhibit the issue in gate-level simulation.
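The manual workaround described above (measure how many words one receiver lags the other, then delay the earlier one by that many register stages) can be modeled behaviorally as follows. This is a hedged Python sketch, not the poster's design. It assumes both channels carry the same known, non-repeating test pattern (a counter here) so the lag can be measured, and `relative_latency` and `DelayLine` are hypothetical helper names. In hardware the equivalent of `DelayLine(n)` would be n register stages on the lower-latency receiver's parallel output bus.

```python
from collections import deque


def relative_latency(ch0, ch1, max_lag=4):
    """Find lag L (in output words) with ch1[i + L] == ch0[i] over the overlap.

    Positive L means ch1 arrives L words later than ch0. Assumes both channels
    carry the same non-repeating test pattern (e.g. a counter).
    """
    for lag in range(-max_lag, max_lag + 1):
        overlap = [(ch0[i], ch1[i + lag])
                   for i in range(len(ch0)) if 0 <= i + lag < len(ch1)]
        if len(overlap) >= len(ch0) // 2 and all(a == b for a, b in overlap):
            return lag
    raise ValueError("no consistent lag within +/- max_lag")


class DelayLine:
    """Fixed pipeline delay of `depth` words, like `depth` register stages."""

    def __init__(self, depth, reset_value=None):
        self.regs = deque([reset_value] * depth)

    def step(self, word):
        if not self.regs:          # depth 0: pass straight through
            return word
        self.regs.append(word)
        return self.regs.popleft()


# Simulated capture: ch1 lags ch0 by one output word (None = not yet valid).
ch0 = list(range(10))
ch1 = [None] + ch0[:-1]

lag = relative_latency(ch0, ch1)
# Delay whichever channel is earlier so both buses line up again.
early, late = (ch0, ch1) if lag >= 0 else (ch1, ch0)
delay = DelayLine(abs(lag))
aligned = [delay.step(w) for w in early]
print(lag, aligned == late)        # 1 True
```

Note that a fixed delay chosen this way only helps if the latency difference is stable from build to build and power-up to power-up, which is exactly what the thread could not guarantee.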

The problem of the LVDS transmitter (sharing a PLL with the upstream receiver) being synthesized away despite functioning RTL was resolved. The Tx input clock had inadvertently been taken from a tri-statable output node instead of the net on the input side of that same output buffer. The synthesizer/fitter gave no useful warning about why the outputs were stuck at ground; it only warned that the clock node in question was being converted to an OR gate.

There has been no feedback on why the newer-style LVDS transmitter IP core, when the external PLL option is selected, does not include a valid Tx clock generated by the same DDR logic that runs the SERDES. This is especially problematic for odd serialization factors (7, for example), which make generating the non-50% duty cycle Tx clock difficult without knowing the data's alignment offset in half-cycles of the serial clock.
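A short Python model shows why the alignment offset matters for odd factors. It assumes the clock-lane pattern 1100011 in transmit order and a DDR mapping in which even serial bit indices launch on the fast clock's rising edge and odd indices on its falling edge; the real mapping depends on how the DDIO high/low inputs are wired, so treat both as assumptions. `offset_bits` is a hypothetical parameter standing in for the unknown alignment inside the core.

```python
# Why odd serialization factors complicate a user-generated Tx clock.
# Assumptions: clock-lane pattern 1100011 in transmit order, and DDR
# serialization where even bit indices leave on the fast clock's rising
# edge and odd indices on its falling edge.

CLOCK_PATTERN = [1, 1, 0, 0, 0, 1, 1]


def clock_rising_edges(pattern, n_words, offset_bits=0):
    """Serial bit indices at which the output clock goes 0 -> 1.

    offset_bits shifts the whole word stream by that many bit times,
    modelling an unknown alignment of the data relative to the fast clock.
    """
    stream = pattern * n_words
    return [i + offset_bits for i in range(1, len(stream))
            if stream[i - 1] == 0 and stream[i] == 1]


def launch_edge(bit_index):
    """Fast-clock edge on which a given serial bit is launched (DDR)."""
    return "rise" if bit_index % 2 == 0 else "fall"


for offset in (0, 1):
    edges = clock_rising_edges(CLOCK_PATTERN, 4, offset)
    print(offset, [(i, launch_edge(i)) for i in edges])
```

With the 7:1 pattern, the output clock's rising edge alternates between a falling and a rising fast-clock edge from one word to the next, and a one-bit offset flips which comes first. With an even factor (for example an 8-bit pattern 11110000), every word's rising edge lands on the same fast-clock edge, so the problem does not arise.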

WKett · 12-02-2020

A year later, it is worth mentioning that I never did get the Altera LVDS SERDES cores to work. On MAX 10 devices they are DDR-based because the clock tree only supports 400 MHz. I don't think they were fully polished when Intel acquired Altera, and the bugs were never cleaned up afterward. The valid bit-slip settings would compile differently even after small code changes, and one of my two receivers almost always failed to match the RTL simulations. Intel FAE support mentioned that one must send a training sequence every time the cores start in order to set the bit-slip values. That doesn't work if you don't control the transmitted serial stream. Eventually I abandoned the Intel LVDS cores (both Rx and Tx) and wrote my own, which automatically align themselves to the desired word boundary in a way that is completely deterministic relative to the 57% duty cycle clock of a standard Channel Link with 7:1 serialization per channel. Initially I had some timing issues, but after setting the Rx PLLs to source-synchronous mode instead of the default, I suddenly had plenty of timing margin for input data capture.
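The poster's own cores are not shown, but the general idea of aligning to the clock lane instead of relying on bit-slip training can be sketched behaviorally in Python. Assumptions, all hypothetical: the deserialized clock lane reads 1100011 per word when aligned, all lanes share the same bit offset (lane-to-lane skew well under one bit time), and data words are MSB-first. `find_word_offset`, `deframe`, and `to_bits` are illustrative names, not part of any Intel IP.

```python
# Sketch of deterministic word alignment for a 7:1 Channel Link receiver:
# instead of bit-slip training, find the word boundary from the clock lane.
# Assumption: the deserialized clock lane reads 1100011 per word when aligned.

CLOCK_PATTERN = (1, 1, 0, 0, 0, 1, 1)


def find_word_offset(clock_bits, pattern=CLOCK_PATTERN):
    """Return offset k where every n-bit window of the clock lane equals pattern."""
    n = len(pattern)
    for k in range(n):
        windows = [tuple(clock_bits[j:j + n])
                   for j in range(k, len(clock_bits) - n + 1, n)]
        if windows and all(w == pattern for w in windows):
            return k
    raise ValueError("clock-lane pattern not found")


def deframe(data_bits, offset, n=7):
    """Cut a data lane into n-bit words using the offset found on the clock lane."""
    return [tuple(data_bits[j:j + n])
            for j in range(offset, len(data_bits) - n + 1, n)]


def to_bits(word, n=7):
    """MSB-first bit tuple of an n-bit word (bit order is an assumption)."""
    return tuple((word >> (n - 1 - b)) & 1 for b in range(n))


# Simulated capture that starts 3 bit times into a word on every lane.
words = [to_bits(w) for w in (0x11, 0x22, 0x33, 0x44, 0x55)]
clock_raw = (list(CLOCK_PATTERN) * len(words))[3:]
data_raw = [bit for w in words for bit in w][3:]

k = find_word_offset(clock_raw)
print(k, deframe(data_raw, k) == words[1:])   # 4 True
```

Because all seven rotations of 1100011 are distinct, exactly one offset matches, so the boundary is found from the received clock alone without any cooperation from the transmitter.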