
Stratix III EP3SL340C3 Differential IO data rate

Altera_Forum
Honored Contributor II

Hi 

I have been working on the DE3 board with the above-mentioned FPGA for some months now. I am trying to deserialize 8 LVDS channels from an ADC. Each channel is 12 bits and the ADC can run at up to 50 mega-samples per second. I am designing my own deserializing circuit because there are not enough PLLs on the DE3 FPGA (I will be interfacing 12 ADCs, whereas there are only 8 PLLs on the FPGA) and each SERDES megafunction requires 1 PLL.

 

I am trying to operate the IO pins at (max) 600 Mbits per second, with my shift registers working at half the bit rate, i.e. 300 mega-shifts per second (each channel has 2 shift registers, one for the even stream of bits and one for the odd; in the end I interleave the two streams).

 

I have tried many different design approaches, along with playing around with the Chip Planner, defining regions, assigning specific resources to different instances and so on... but each of my designs fails at around 25 mega-samples. Simulation shows a picture-perfect scenario even at 50 MHz. I was wondering if the IO pins are creating the bottleneck here, or whether it is the shift registers in my design (I am using the LPM_SHIFTREG megafunction for the shift registers).

 

If anyone can provide me some idea of where the problem could be, I would be grateful.
Altera_Forum
Honored Contributor II

I don't think that you need a PLL for each ADC. You may use a PLL output for each ADC if you want to adjust the receiving phase for each ADC individually, which shouldn't normally be necessary. At the least, you should provide a common phase adjustment of the fast SERDES clock (300 MHz) for all ADCs. Alternatively, you can use the bit clock output from the ADC as the SERDES clock and no PLL at all. Personally, I prefer the PLL variant.

 

The SERDES should use an altddio_in block as a 1:2 demux. Then a single shift register, shifting in two bits on each fast-clock edge, can be used. It has no problem achieving a 600 Mbps rate with CIII.

 

A user-designed SERDES can provide deserialization factors that aren't directly supported by the altlvds MegaFunction with CIII, e.g. 12 and 14. It also allows a clearer configuration of the timing parameters.

 

P.S.: Utilizing the test pattern option of the LVDS ADC and the PLL dynamic phase shift, an automatic SERDES phase calibration can be realized.
Altera_Forum
Honored Contributor II

Hi FvM, 

Unfortunately I can't use 1 PLL with multiple ADCs, as each ADC is on a separate board with its own connector. I have designed a backplane PCB which splits each HSTC connector's IO pins only (no dedicated clock pins) into 3 small connectors (for 3 ADCs). Thus 4 HSTCs provide me with 12 connectors for ADCs, none of which share pins on the FPGA. Also, the HSTC on the DE3 has only 4 dedicated clock pins (that's why I didn't use them), and Quartus won't let me connect a PLL to a normal IO pin.

 

"Alternatively, you can use the bitclock output from ADC as SERDES clock and no PLL at all." 

 

Well, this might work. The only worry I have is whether the bit clock will have enough fanout to support the SERDES, as it is using a normal IO pin instead of a dedicated clock pin.

 

EDIT 

"Hmmm. Just tried ALTLVDS megafunction. The slow clock has a dutycycle recuirement of 8.33%. One from my ADC is 50%. So i dont know if this approach will work..." 

 

Thanks
Altera_Forum
Honored Contributor II

What speed grade device is on the DE3 board? 600 Mbps on an input port is pretty fast, as you've only got a 1.666 ns bit period.

My second concern is that, being limited by the board you're using, a number of the clocks will be coming in on regular I/O. That means you're going to use local routing to get to all the I/O in the data bus, which means the delay to every I/O will be different. If your timing constraints are correct, then the Quartus II fitter will modify the delay chains on the input registers to center the data onto the clock edge, but it can only do so much.

My third concern is that, without a PLL, your raw clock delay variance will be too large. For example, let's say it takes 3 ns for the clock to route to the I/O, which is pretty fast. That's 3 ns in the slow-corner timing model. In the fast-corner model it could easily be half that, especially in the slowest speed grade, so your clock delay could vary between 1.5 ns and 3 ns over PVT, which is pretty much the whole data eye. I'm making these numbers up, but you get the point. A PLL's feedback loop is designed to eliminate PVT variance on the clock tree, which makes I/O timing on fast interfaces much more feasible.
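One way to see this yourself (a sketch of TimeQuest console commands only; the clock name is just a placeholder) is to report the same input paths under both timing models and compare the clock arrival times:

create_timing_netlist -model slow
read_sdc
update_timing_netlist
report_timing -setup -to_clock adc_clk -npaths 5

Then repeat with "create_timing_netlist -model fast" and look at how much the clock path delay changes between the two corners.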

So I don't think it's a logic issue but a timing issue that will be the problem. I haven't built anything up and tested it out, but I have serious concerns.

Let me know if any of that doesn't make sense.
Altera_Forum
Honored Contributor II

Hey Rysc 

Well, it is a C3 speed grade device, so it is not the fastest available. I agree completely with what you have said about routing and the fitter. Unfortunately, I can't use the luxury of the PLLs in the device, otherwise things would have been easier.

Quoting:

"If your timing constraints are correct, then the Quartus II fitter will modify the delay chains on the input registers to center the data onto the clock edge, but it can only do so much."

 

Well, this is where I am not sure if my timing constraints are right. Initially I was using the Classic Timing Analyzer because of its simplicity. Simulation showed things working even at an IO rate of 600 Mbps. When I tried the TimeQuest Timing Analyzer (and, to be honest, guessed some of the constraints), the output got messed up. To get valid constraints I tried going through the device datasheets, but a lot of the stuff there was ambiguous and went over my head. Is getting the constraints right my only chance, or can I try something different? It might sound crazy, but someone suggested that I should try to get down to LUT level and implement stuff (equations!!!) there. I have no idea how to do that, but then again, if all the problems are because of IO/clock misalignments due to different non-constant delays, things at the LUT level will get messed up anyway...

 

Regards
Altera_Forum
Honored Contributor II

I didn't realize that the DE3 is using a Stratix III rather than a Cyclone III. The points I mentioned about a user-designed soft SERDES are basically valid for Stratix III, which also has DDIO capabilities in the IO cells. But there are also interesting options provided by the Stratix III hardware SERDES with its DPA function.

 

As you pointed out, you can't use individual PLLs with 12 ADCs. As far as I can see, there are some dedicated clock inputs at the HSTC connectors. So the SERDES can use either the ADC input clock inside the FPGA or the FCO from one ADC. The phase should be adjusted individually for each ADC. DPA would allow an adjustment for each individual ADC output; in this case, delay skew caused by the DE3 board and the external wiring could be cancelled.
Altera_Forum
Honored Contributor II

Hey FvM, 

The problem with the HSTC is that these four connectors have a combined 8 dedicated clock pins (on the receiver side), whereas I need 12 x 2 = 24 (12 ADCs, each having 2 clocks). Let's assume I get past this issue using IO pins; I would then need a 16.66% duty-cycle, 25 MHz signal for "tx_enable", whereas the one from the ADC is 50% duty cycle and 50 MHz. I am still not convinced that I can use a SERDES/ALTLVDS megafunction.
Altera_Forum
Honored Contributor II

Timing constraints are a must. Without them you're really designing blind, as RTL is dependent on the fact that timing is met. 

Are the clock and data coming in edge-aligned or center-aligned? If it's center-aligned, then do the following: 

create_clock -period 1.666 -name adc_clk [get_ports adc_clk]
create_clock -period 1.666 -name adc_clk_ext -waveform {0.833 1.666} ;# This is a virtual clock, not assigned to anything physical
set_input_delay -clock adc_clk_ext -max 0.0 [get_ports adc_data*]
set_input_delay -clock adc_clk_ext -min 0.0 [get_ports adc_data*]

 

Naturally, change the port names to match your design. This creates a 1.666 ns clock coming into the FPGA. It also creates a virtual clock that represents the clock at the ADC; I've used -waveform to say it's shifted by 180 degrees (i.e. it's center-aligned). The set_input_delay assignments say there is an external register, clocked by this virtual clock, that sends data to the FPGA in 0 ns. Ignoring the 0 ns for now, what you'll see is that you have a 0.833 ns setup requirement and a -0.833 ns hold requirement, i.e. if the data is skewed in relation to the clock by more than 0.833 ns, it will fail timing. This gives the whole data eye to the FPGA to play with. If you know how much skew the ADC and the board add, then put it in there, i.e. if they add +/- 200 ps of skew, then make the max value 0.2 and the min value -0.2.
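For example (a minimal sketch, reusing the placeholder port names from above), with that +/- 200 ps of external skew the two delay assignments would become:

set_input_delay -clock adc_clk_ext -max 0.2 [get_ports adc_data*]
set_input_delay -clock adc_clk_ext -min -0.2 [get_ports adc_data*]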

 

If they're edge-aligned, then don't use the -waveform option (it will default to a rising edge at 0 and a falling edge at 50% duty cycle).
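For reference, an edge-aligned version of the same constraints could look like this (again only a sketch with the same placeholder names; the virtual clock now has the default waveform, i.e. the same phase as adc_clk):

create_clock -period 1.666 -name adc_clk [get_ports adc_clk]
create_clock -period 1.666 -name adc_clk_ext ;# virtual clock at the ADC, no physical target
set_input_delay -clock adc_clk_ext -max 0.0 [get_ports adc_data*]
set_input_delay -clock adc_clk_ext -min 0.0 [get_ports adc_data*]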

 

As for the altlvds, don't use it, since you can't use the dedicated hardware. Just use an altddio_in to use the DDR input registers, which gets you down to 300 Mbps. Then write some code to do another 1:2 demux with registers, getting it down to 150 MHz, which is a feasible speed. Although that brings up another question: if you have 12 300 MHz clocks coming in, and they're not all on PLLs, you can't just create 150 MHz domains. You'll probably want to create a toggling clock enable (search the forum for discussions on this) to enable the logic on the 300 MHz clock every other cycle. Then you'll need FIFOs to get everything together. But that's all for later, as you first need to make sure you can meet timing on the 600 Mbps data coming into the IO registers.
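As a rough sketch of the constraint side of such a toggling clock enable (the register wildcards here are purely hypothetical placeholders): for paths where both the source and the destination register only update on the enabled 300 MHz cycles, the usual approach is a setup multicycle of 2 together with a hold multicycle of 1, e.g.:

set_multicycle_path -setup -from [get_registers {*half_rate_ce*}] -to [get_registers {*half_rate_ce*}] 2
set_multicycle_path -hold -from [get_registers {*half_rate_ce*}] -to [get_registers {*half_rate_ce*}] 1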
Altera_Forum
Honored Contributor II

Thanks a lot, Rysc, for the constraints guidance. It helped a lot. I have now designed the deserializer with an altddio_in first and shift registers afterwards. Simulation shows the right results for the first time using the TimeQuest Timing Analyzer. I will try this out on the DE3 at the university tomorrow, as I am at home now. Hopefully it will work out, or I'll be troubling you guys again :P

 

Thanks again to both Rysc and FvM for your help.
Altera_Forum
Honored Contributor II

Why did I think this was Cyclone III when the title clearly says Stratix III? I guess that shows the competence of the support you're getting. : )

There are some threads on it, but simulations, including timing sims, are a far cry from static timing analysis. They always use exact values, while static timing analysis is basically making sure the delays are within a range. Real hardware works that way, i.e. delays vary over process, voltage and temperature. Just wanted to point that out. (Of course, static timing analysis just says the delays meet your requirements; if your logic is incorrect it won't matter, which is what simulations are for...)
Altera_Forum
Honored Contributor II

I finally got it working at 40 mega-samples per second, i.e. 480 Mbps. Using DDIO_IN along with proper timing constraints was the key. The DDIO input gave me two streams of data, each of which was 240 Mbps. I used the bit clock (240 MHz) for the DDIO and then used bit clock/2 to further split the two 240 Mbps streams into 4 streams of 120 Mbps. After that I used four shift registers (self-designed, not the LPM_SHIFTREG megafunction) to convert the serial streams to parallel. The self-designed shift registers saved resources and gave me more control, along with better results.
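For anyone repeating this, a minimal constraint sketch for that clocking scheme (the port and register names below are placeholders, not taken from my actual project) might be:

create_clock -period 4.167 -name adc_bitclk [get_ports adc_bitclk] ;# 240 MHz bit clock from the ADC
create_generated_clock -name bitclk_div2 -source [get_ports adc_bitclk] -divide_by 2 [get_pins {clk_div2_reg|q}] ;# 120 MHz divide-by-two clock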

 

Another thing was not to use the frame clock (40 MHz) for the final latching of the data. The frame clock was not working for me at all, as it was latching corrupt data (no idea why; theoretically it should have been a 50% duty-cycle wave, but my guess is that its duty cycle was not consistent). Instead, I used a self-generated 40 MHz signal with a duty cycle of 33%. Its edges were synchronous with the bit clock, and its low-to-high transition occurred after the positive edge of the frame clock so that I could align my data (the frame clock was used as a reference point, not to drive the latches itself). This provided me with the right timing to latch the final 12 bits.

 

My advice to others is that if you are trying to do something like this, try using the dedicated clock pins and make use of the ALTLVDS megafunction. This will make your life much easier.

 

Thanks to those who helped me out.
Altera_Forum
Honored Contributor II

Your solution sounds somewhat complicated; you should be able to run the shift register at 240 MHz directly. If you are not able to operate the design faster than 480 Mbps, the issue is most likely related to timing, and possibly to the signal quality of the LVDS input. Of course, the data must be transferred to the slow (frame) clock, but that's not very critical, I think.

 

I also suggested using the hardware SERDES (= ALTLVDS megafunction) with an FPGA-internal reference clock and DPA. Did you try it?
Altera_Forum
Honored Contributor II

As I mentioned above, to use ALTLVDS I needed either an internal PLL or an external clock source. The PCB I used to interface the ADCs with the HSTC was only connected to IO pins, and Quartus won't let me place a PLL on those IO pins. To use an external clock source, I needed an 80 MHz frame clock with a 16% duty cycle, as required by ALTLVDS (80 MHz instead of 40 because ALTLVDS does not support 12 bits, so I needed to go for a 6-bit SERDES, thus doubling the frame rate). This sort of clock was not available from my ADC, and using a PLL was not an option open to me. I tried this as you suggested, and as I understood it, but couldn't get far, as I had no idea how to synchronize an internal clock with the bit stream coming from the ADC without using a PLL.

 

Also, I can get my design to work at up to 500 Mbps, which is the IO limit for my device (I think I read that in the Stratix III handbook). I am happy with 40 MHz (480 Mbps), as it does the job for me with the oscillator I have available on the ADC, although my ADC can go up to 50 MHz.
Altera_Forum
Honored Contributor II

Assuming the ADC clock is sourced from the FPGA, you should be able to use an internal reference clock as well. But if your present solution serves its purpose, there's no need to change anything.
