Intel® Quartus® Prime Software

Improving the performance of LVDS DDR at 200MHz

Altera_Forum
Honored Contributor II
1,802 Views

I am trying to improve the performance of an LVDS DDR interface.  

I currently use DDIO registers to create a DDR interface and then send it out via LVDS. I have 33 DDR channels with one latching clock. 

 

At 200MHz, the ideal DDR data window is 2.5ns. When I put the interface through TimeQuest (with no board skew) I find that my data window has reduced to 1.983ns, so I am losing 0.517ns. After some analysis I have tried to capture where my losses are:

 

data bus routing (inside FPGA) = 118ps 

latching clock skew (due pos and neg edge delay differences) = 64ps 

Clock uncertainty = 50ps 

emulated driver switching = 160ps 

on-die variation propagation delay = 125ps 
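
The five contributions above add up to the reported loss. A minimal Python sketch of that budget, using only the figures quoted in this post (the dictionary labels and function names are chosen here for illustration and are not TimeQuest output):

```python
# Timing-loss budget from the post, all values in picoseconds.
LOSSES_PS = {
    "data bus routing": 118,
    "latching clock skew": 64,
    "clock uncertainty": 50,
    "emulated driver switching": 160,
    "on-die variation": 125,
}


def ideal_window_ps(clock_mhz: float) -> float:
    """Ideal DDR data window: half of the clock period, in ps."""
    period_ps = 1e6 / clock_mhz
    return period_ps / 2


def remaining_window_ps(clock_mhz: float, losses_ps: dict) -> float:
    """Ideal window minus the sum of all listed losses, in ps."""
    return ideal_window_ps(clock_mhz) - sum(losses_ps.values())


print(sum(LOSSES_PS.values()))              # 517
print(remaining_window_ps(200, LOSSES_PS))  # 1983.0
```

The result matches the 1.983ns window reported for the 200MHz case.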

 

I would like to get this interface up to 400MHz but I need to reduce some of these losses.  

 

I suppose I could correct the data routing delay inside the FPGA by adding delay on my PCB tracks? I could use a true LVDS driver instead of an emulated driver. Anybody got any ideas on how to reduce the on die variation propagation delay or anything else? 

 

Thanks 

 

C
10 Replies
Altera_Forum

I'm surprised the data-bus routing is so high. 

That being said, and without having looked at your design, I have done emulated differential drivers (HSTL) and usually see about 600ps of skew relative to the clock. The majority of this is on-die variation. It hurts because it cuts both ways: the clock-to-data relationship is pessimized in one direction for setup and in the opposite direction for hold. 

There is nothing that will reduce ODV, except going to a faster speed grade. Also, one thing to note is that ODV does not take locality into account. Two outputs right next to each other have the same ODV as two I/Os on opposite sides of the die, so there might be some pessimism with well laid out I/O, but there's no way to be sure. 

If you go to True LVDS, the channel-to-channel skew drops to 100ps, which is fantastic. This does not use a global clock and everything is dedicated. It's really around 800Mbps-1Gbps where you should go to that, depending on the situation.
Altera_Forum

You're using a global, which is about as low-skew as you get for general logic. Note that ODV accounts for the fact that, when one path might be pegged at the slow corner, other paths just aren't that bad. This can be due to process (two paths next to each other), but also to effects in the FPGA: cross-coupling with a completely different signal may slow one signal down, or there may be a slight power variance. There's really no way to avoid this and you're using the best path. ODV annoys the heck out of me, but I also think it's pretty real. (I always wonder if other vendors model that? It can make Altera devices look slower, where there's obviously big risk for an interface like this if it were ignored.) 

The True LVDS does not use a global clock tree, but a dedicated clock tree along the left and right edges that are probably laid out for minimal variance. That's why it can get such a low number. 

How much skew can be tolerated? I'm actually surprised you can't get 800Mbps, as that generally seems do-able in what I've seen. But it's getting faster where things start to fall apart. Also, is the board laid out or are you using a device that doesn't have True LVDS?
Altera_Forum

Oh, and report_skew is sometimes easier to look at, but setup and hold is the best methodology. Skew measures all combinations of outputs, i.e. worst case rise/fall of clock and data. Setup and hold knows that data being captured on the rising edge externally only needs to be analyzed against the rising edge of the clock being sent off chip, and vice-versa. It doesn't analyse combinations that aren't real. I haven't fully analyzed it, but thought setup/hold analysis might buy you a few picoseconds. (Skew also doesn't relate to the clock, i.e. it's skew across the whole bus, where you don't care about skew between two data ports, just skew between the data and the clock. That definitely should buy you something.)

Altera_Forum

Rysc thanks again for taking the time to reply. I appreciate your help. 

 

To answer your questions 

 

1) The board is not laid out yet because I am analysing the timing to see if it's going to work. The reason I am not using True LVDS is that I am trying to get into a 780-pin Stratix III. With 33 LVDS channels I can't fit all the True LVDS drivers on one side (the other sides are going to be used for other interfaces, memory, etc.). I can't use a mixture either because the skew between a True LVDS driver and an emulated driver makes timing difficult. 

 

2) I understand your point about 'report skew'. For actual timing I always use setup and hold analysis. I used the 'report skew' command in this case because it makes it easy to split out the ODV from everything else. 

 

 

I have produced another version of my test design using True LVDS drivers inside an EP3SE110F1152C3 device. When I put it through TimeQuest, my setup and hold slack was worse by around 20ps each. The ODV was 23ps more (now 241ps) than in the emulated case (218ps). Note also that the picture I posted a few messages back showed the TimeQuest result of the True LVDS case instead of the emulated case. I have reposted both results to this message, sorry. 

 

When I comb through the fitter resources used, I see that in the True LVDS case, no regional clocks are being used, only globals. The emulated case is the same. From what Rysc was saying, the True LVDS case should be using a low-skew regional clock.  

 

Any ideas why the true LVDS case isn't doing that? 

 

All my design contains is the DDIO blocks. My input reference clock for the PLL is on the same side as the LVDS signals and it is going in through a dedicated clock input. :eek:
Altera_Forum

The True LVDS ports can also be emulated LVDS, which is what I think you're doing. True LVDS has to be created with the altlvds_tx block, and it is analyzed with TQ's "Report TCCS" command. You'll find that command gives almost no information (the TQ User Guide I put together explains this), just a TCCS of 100ps. The device handbook discusses TCCS in more detail too. But what I think you're looking at is emulated LVDS on a True LVDS channel. (Yes, I don't think it could have been made more confusing if they tried.)

Altera_Forum

You are right Rysc, I am just using DDIO blocks without the altlvds_tx megafunction. So access to the True LVDS circuitry can only be obtained via this megafunction. 

 

I had tried to use the altlvds_tx block before but I had two issues with it. 

 

1) I have a serialising factor of 2: 66 bits down to 33 LVDS channels. This creates my DDR data. However, I require the clock to be centre-aligned. There seems to be no option to centre-align it; by default it is edge-aligned. I could start rooting about in the megafunction Verilog but I am not ready for that yet. 

 

2) When I installed it into my design (I did it again today) to check its performance, the result was exactly the same as before: 241ps ODV, with setup and hold just the same. The fitter resources report says that it is not using any dedicated SERDES transmitters. The altlvds_tx is definitely in the code but I think it's getting broken down into exactly the same circuitry as I have without the SERDES. I read someplace else that at a serialisation factor of 2, the altlvds_tx block just uses the DDIO primitives anyway, which would explain what I am seeing. 

 

I seem to be stuck where I can't get access to the true LVDS routing unless I use the altlvds_tx but at a serialisation factor 2 the altlvds_tx doesn't use the dedicated routing :cry:  

 

Yes Rysc it is confusing, very confusing. 

 

Thanks 

 

C
Altera_Forum

C.  

I wonder what is at the receiving side? Another FPGA under your control? In that case you could try to constrain those inputs with the obtained results. At 400 MHz you still have an eye of 733 ps (the 1.25 ns ideal window minus the 517 ps of losses) and, assuming that the PCB layout doesn't add any skew, operation should be possible by fine-tuning the phase of the 'centre-aligned' clock you are sending along. (You could actually lengthen or shorten traces to correct any imbalances due to either FPGA.)
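
The 733 ps figure can be reproduced with a short Python sketch. It assumes, as this reply implicitly does, that the 517 ps of losses measured at 200 MHz stay the same at 400 MHz; that is an assumption, not a TimeQuest result. The function name is hypothetical.

```python
def ddr_eye_ps(clock_mhz: float, total_loss_ps: float) -> float:
    """Remaining DDR eye: half the clock period minus the total loss, in ps."""
    unit_interval_ps = 1e6 / clock_mhz / 2
    return unit_interval_ps - total_loss_ps


# Total loss reported in the first post; ASSUMED unchanged at 400 MHz.
TOTAL_LOSS_PS = 517

print(ddr_eye_ps(200, TOTAL_LOSS_PS))  # 1983.0
print(ddr_eye_ps(400, TOTAL_LOSS_PS))  # 733.0
```

Doubling the clock halves the ideal window while the losses stay fixed, so they consume a much larger share of it.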
Altera_Forum

Josyb, 

 

Thanks for your reply. Unfortunately it's not another FPGA; it's a custom ASIC. Good idea though. Varying the board traces could allow me to reduce some of the data skew. 

 

Thanks for your suggestions 

 

C
Altera_Forum

I assume your data rate isn't 400MHz, so can you go to a serialization factor of /4? Also, I'm pretty certain there's a way to shift the clock out. Maybe it wasn't allowed in /2 mode?

Altera_Forum

Although my initial clock rate is 200MHz, I would like to increase it to a 400MHz clock, and thus 800Mbps of data per channel. My serialiser therefore needs to be /2. You are right that when it goes to /4 the phase option appears.  
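
For reference, the relationship between serialisation factor, core-side bus width and core-side clock is simple arithmetic. A small Python sketch, with a hypothetical helper name and using the 33 channels and 800Mbps per channel quoted in this thread:

```python
def core_side(channels: int, lane_mbps: float, factor: int):
    """Return (parallel bus width in bits, parallel clock in MHz)
    for a serialiser with the given factor on every channel."""
    return channels * factor, lane_mbps / factor


print(core_side(33, 800, 2))  # (66, 400.0)
print(core_side(33, 800, 4))  # (132, 200.0)
```

This only shows what each factor implies for the logic feeding the serialiser; it says nothing about whether a /4 configuration fits the rest of the design.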

 

One other thing I did was to cut the data bus down to 1 bit. I gained 50ps (at 400MHz clock) of data window but the ODV did not move. I didn't really expect it to move though :(. 

 

Thanks for your continued help Rysc 

 

C