About edge-aligned source-synchronous interfaces

Altera_Forum · 08-19-2011

I am watching one of Altera's online tutorials about DDR source-synchronous interfaces.

It covers the edge-aligned approach, where (as shown in the attachment) the data and clock are sent out together without shifting the clock.

It says we should set set_multicycle_path to 0 in this case, because the data is latched by the same edge that launches it.

How is that possible? If they (the clock edge and the data) both arrive at the off-chip device at the same time, how can the clock sample the right data?

Altera_Forum · 08-19-2011

It depends on what the other device is doing. Let's say it's feeding another FPGA that phase-shifts the clock by 90 degrees. In that case, everything works out fine. You're right that somebody needs to shift the clock into the middle of the data eye; it just doesn't have to be the device you're constraining. (Rather than describing it as edge-aligned, I usually state that the other device is doing the phase shift by putting a "-phase 90" on the create_generated_clock applied to the output port that sends the clock off chip. This does nothing to the actual clock; it just says the external receiver is shifting it.)

Altera_Forum · 08-19-2011

The question raised by ertss is very reasonable, and I don't think Rysc answered it. The question has two aspects: one is the concept of edge-aligned same-edge capture, and the other is how timing is then controlled. Rysc answered by passing the timing responsibility to the external device, yet the very subject is how the FPGA should be told to control timing (e.g. the multicycle issue). I personally find this question very difficult to answer.

In fact, the most important clue is the difference between these two cases (as documented by Altera): edge-aligned same-edge capture and edge-aligned opposite-edge capture. What decides which is which? It could be explained as follows:

The default setup relationship is between the current launch edge and the next latch edge, and the delay equations then apply directly. If, however, it is same-edge capture, what exactly is same-edge capture? It could occur if the clock is delayed such that, in the external device, the current edge rather than the next edge physically captures the data. In this case I wonder what we get if we apply the delay equation:

delay = tSU + data delay - clock delay

So if, say, tSU = 1ns, data delay = 0 and clock delay = 4ns, then

delay = 1 + 0 - 4 = -3ns
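A minimal Python sketch of the arithmetic above, assuming the poster's simplified model where board data delay and board clock delay are single numbers in ns. The function name `output_max_delay` is hypothetical and not a Quartus or TimeQuest API; it only evaluates the equation as written.

```python
def output_max_delay(t_su: float, data_delay: float, clock_delay: float) -> float:
    """Max output delay in ns, per the equation tSU + data delay - clock delay."""
    return t_su + data_delay - clock_delay


# Values from the example: tSU = 1 ns, data delay = 0 ns, clock delay = 4 ns.
delay = output_max_delay(t_su=1.0, data_delay=0.0, clock_delay=4.0)
print(f"max output delay = {delay:+.1f} ns")
```

A large clock delay on the board drives the result negative, which is exactly the situation the poster is asking about.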

I wonder whether Altera is saying: don't use a negative value, but add a multicycle of 0 or one clock period instead!

In particular, I note that in this context Altera does not use the terms launch/latch clocks but data clock/output clock, which implies the signals are edge-aligned at the FPGA but not at the latch in the external device.

We certainly need Altera to help us understand their concepts.

Altera_Forum · 08-20-2011

For discussion purposes, I am only going to talk about setup analysis, but hold analysis is pretty similar. Note that I/O timing generally takes only 2 values from your .sdc files: the setup relationship and the external max delay. The setup relationship is based on your clock relationships (including any multicycle assignments), but the end result is a single number for the setup relationship. The external delay is taken directly from your set_input_delay -max or set_output_delay -max constraint.

So let's take an 8ns clock running a source-synchronous DDR interface. The data window is 4ns. There are 3 basic ways to describe the clocks. The first is with a phase shift of 90 degrees, so that there is a 2ns setup and -2ns hold relationship. If the FPGA is phase-shifting the clock, then this should always be the relationship. If the FPGA is not phase-shifting the clock, then the external device is. (The way source-synchronous interfaces work is that someone shifts the clock edge into the middle of the data window. If nobody does, Quartus will try to add delays to meet timing; those delays vary over PVT, and the interface performance will suffer.)
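To see where the 2ns setup and -2ns hold numbers come from, here is a minimal Python sketch under a simplified model (an assumption, not TimeQuest's full edge search): data launches at t = 0, latch edges repeat every 4ns data window starting at the phase shift, setup checks against the first latch edge strictly after launch, and hold checks against the latch edge one window earlier. The function `ddr_relationships` is hypothetical.

```python
def ddr_relationships(period_ns: float, phase_deg: float) -> tuple[float, float]:
    """Return (setup, hold) relationships in ns for a DDR transfer (simplified)."""
    window = period_ns / 2.0  # DDR: one data window per half clock period
    # Phase shift converted to ns and folded into one data window.
    shift = (period_ns * phase_deg / 360.0) % window
    # A latch edge coinciding with launch (shift 0) is not used for setup by
    # default, so the next edge, one full window later, is the setup edge.
    setup = shift if shift > 0.0 else window
    hold = setup - window  # previous latch edge
    return setup, hold


print(ddr_relationships(8.0, 90.0))  # 90-degree shift
print(ddr_relationships(8.0, 0.0))   # edge-aligned, default next-edge transfer
```

With a 90-degree shift this gives (2.0, -2.0); with no shift it gives (4.0, 0.0), the edge-aligned next-edge relationship discussed below.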

The second method is to say the interfaces are edge-aligned and that we're transferring data to the next edge. That gives a 4ns setup relationship. The third method is to keep the clocks edge-aligned (no phase shift) but add a multicycle to specify a same-edge transfer, i.e. a 0ns setup relationship. Now let's look at our external delays. For the one with a clock shift, let's say the external delays are 0.8ns. So the setup relationship minus the external delay is 1.2ns.

Now the one with a next-edge transfer, where the setup relationship is 4ns. In this case the external delay would be something like 2.8ns. The setup relationship minus the external delay is 4 - 2.8, which still leaves 1.2ns for the FPGA to work with. The third option might have an external delay of -1.2ns, so 0ns - (-1.2ns) = 1.2ns. Once again, the same relationship.

This is what can happen when two variables are in play: you can shift both of them to different values and end up with the same result.
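The three descriptions can be checked side by side. This minimal Python sketch uses only the illustrative numbers from the post (not a real datasheet) and the hypothetical helper `fpga_margin`, which computes setup relationship minus external max delay, the time left for the FPGA's own delays.

```python
def fpga_margin(setup_rel: float, ext_max: float) -> float:
    """Setup relationship minus external max delay, in ns."""
    return setup_rel - ext_max


# (description, setup relationship in ns, external max delay in ns)
descriptions = [
    ("90-degree clock shift", 2.0, 0.8),
    ("edge-aligned, next-edge transfer", 4.0, 2.8),
    ("edge-aligned, same-edge transfer (multicycle 0)", 0.0, -1.2),
]

for name, setup_rel, ext_max in descriptions:
    print(f"{name}: {setup_rel} - ({ext_max}) = {fpga_margin(setup_rel, ext_max):.1f} ns")
```

All three lines print a 1.2 ns margin: moving the setup relationship and the external delay by the same amount leaves the FPGA's budget unchanged.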

Why would anyone do that? Because external devices specify themselves in different ways. For example, if the FPGA is transmitting, one receiver might say that it will center-align the clock and can accept up to 1.2ns of skew (i.e. it is adding its own 0.8ns on top of that). Another receiver might never say it's phase-shifting the clock (hence an edge-aligned relationship), but it would have a Tsu of 2.8ns. A third device might not say it's phase-shifting the clock, but its Tsu is -1.2ns (I've seen this before).

But let's look at another example. Let's say the external device is not phase-shifting the clock, but it says its Tsu is 0.8ns. If you look at the waveforms and understand how source-synchronous interfaces work, then it's up to the transmitting FPGA to phase-shift the clock and make the relationship go from 4ns to 2ns. (I admit it is hard to picture the "window" when I only talk about setup; I'm certainly being brief.) I'm working on a document on these interfaces that I will post next week, and I will put a link here. I wanted to add a lot of examples of real devices (TI, Maxim, ADI, etc.), but that will probably have to wait for round 2.

(Note that there is another case where nobody is phase-shifting the clock and the external delays are not "shifted". This is basically the case where someone should shift the clock but just can't. The only cases where I've seen this are when the FPGA is the receiver, getting edge-aligned clock/data, but the FPGA is either out of PLLs, or the clock is a strobe and hence a PLL won't lock. In these cases the same-edge transfer is generally best: the design will need to close timing through delays, but the clock tree is generally longer than the data path, so same-edge transfer has a natural advantage.)

Finally, if you have a specific circumstance, please describe it. As you can see, it's really hard to talk in generalities on a forum. Training courses are also difficult because they need to cover all cases, while most users are only thinking of one and wondering why it's so difficult.

Altera_Forum · 08-20-2011

Thanks Rysc for the detailed reply. I agree with everything you say, but let me rephrase my question:

The equation for max output delay is this:

max delay = tSU + data delay - clk delay

and this equation applies when the current launch edge is captured by the next latch edge.

What I am asking is: why not use the same equation for the edge-aligned same-edge capture case, since that scenario means somebody has to delay the clock?

In numbers, assuming your example of delaying the clock by 90 degrees (2ns): max delay = tSU + data delay - 2ns

e.g. 1 + 0 - 2 = -1ns

TimeQuest does not want us to use the equation as such; instead it says to add one UI (or set the multicycle to 0). If we follow that, we get:

max delay = 1 + 0 - 2 + 4 = +3ns

Thus -1 and +3 mean the same thing to me, in the sense that one wraps around to the other (they differ by exactly one UI). Why have they gone for two complicated types of exception rather than using the one basic equation, or adding it as a third method? After all, a clock delay is a clock delay, whether it comes from the board, a PLL or anything else.

Edit: one final mystery. If clock and data are edge-aligned at the FPGA, then what is set_output_delay doing? Aren't we lost in vague documentation?
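The "wrap-around" intuition can be checked numerically. In this minimal Python sketch (assuming the thread's 8ns clock and 4ns UI; the names `UI_NS` and `margin` are illustrative), -1ns is paired with the 0ns same-edge relationship and +3ns with the 4ns next-edge relationship, since each pairing shifts both values by the same UI.

```python
UI_NS = 4.0  # one data window (unit interval) of the 8 ns DDR example


def margin(setup_rel: float, ext_max: float) -> float:
    """Setup relationship minus external max delay, in ns."""
    return setup_rel - ext_max


# Same-edge capture: 0 ns setup relationship, max delay = 1 + 0 - 2 = -1 ns.
same_edge = margin(0.0, 1.0 + 0.0 - 2.0)
# Next-edge capture: 4 ns setup relationship, max delay shifted by one UI to +3 ns.
next_edge = margin(UI_NS, 1.0 + 0.0 - 2.0 + UI_NS)
print(same_edge, next_edge)
```

Both evaluate to 1.0 ns, so for setup analysis alone the two descriptions leave the same margin.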

Altera_Forum · 08-20-2011

I won't disagree with your last statement.

One important point is that clock shifts and clock delays are very different. Clock delays are just physical delays that we all understand. Clock shifts define what the "ideal clock" looks like and affect the setup and hold relationships.

Let's say the FPGA is transmitting edge-aligned and the external device is not phase-shifting, so the setup relationship is 4ns and the hold relationship is 0ns. Let's also say the external delays are 0 for now, to keep it simple. The Quartus II Fitter tries to "solve" the setup and hold relationships by adding 2ns of delay to the output data path relative to the clock path. If it does this exactly, there will be 2ns of setup slack and 2ns of hold slack. (Of course it can't add exactly 2ns, due to PVT variation.)

Now, if the user adds the multicycle of 0 to get a same-edge transfer, the setup relationship becomes 0ns and the hold relationship becomes -4ns. Quartus would solve this by adding -2ns to the output data path relative to the output clock path. So they are very different things in this case.

I agree that when we add in the external delays, it's possible to make one look like the other, but when looking at external datasheets, often one of these two scenarios fits perfectly, while the other becomes a kluge of shuffling numbers around.
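The +2ns and -2ns targets follow from balancing the two slacks. This is an idealized Python sketch, not how the Fitter actually works: with external delays of 0, setup slack is setup_rel - skew and hold slack is skew - hold_rel, where skew is the data path delay minus the clock path delay. The function `centered_skew` is hypothetical.

```python
def centered_skew(setup_rel: float, hold_rel: float) -> float:
    """Data-minus-clock delay (ns) that leaves equal setup and hold slack."""
    return (setup_rel + hold_rel) / 2.0


for label, setup_rel, hold_rel in [("next-edge (default)", 4.0, 0.0),
                                   ("same-edge (multicycle 0)", 0.0, -4.0)]:
    skew = centered_skew(setup_rel, hold_rel)
    print(f"{label}: target skew {skew:+.1f} ns, "
          f"setup slack {setup_rel - skew:.1f} ns, hold slack {skew - hold_rel:.1f} ns")
```

The default transfer wants the data 2ns later than the clock, the multicycle-0 transfer wants it 2ns earlier, and both leave 2ns of setup and hold slack, which is why the two constraints produce physically different results.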

Altera_Forum · 08-22-2011

Rysc, if I understand correctly, you are saying that we should only use the set_multicycle_path 0 command when the off-chip device has a negative Tsu?

Or can we still use the set_multicycle_path 0 command in other circumstances, as long as it yields the same analysis result?

Altera_Forum · 08-22-2011

Sort of. When you are driving a device and have set up your clock constraints as edge-aligned, you need to decide which transfer best fits your external delays. If the external delays have a "positive shift", such as a Tsu of 2.5 and a Th of -1.5, which become external delays of -max 2.5 and -min 1.5, then they are shifted by 2ns and it makes sense to use the default next-edge transfer (8ns clock, 4ns data window). If the Tsu is -1.5 and the Th is 2.5, which become external delays of -max -1.5 and -min -2.5, that is like a data shift of -2ns (with +/-0.5ns of skew), and hence a same-edge transfer makes sense.

I believe there are other cases, such as when the FPGA is the receiver and the transmitter says its data comes out with a skew of -2.5 to -1.5ns relative to the clock. Again, there is no phase shift on the clock per se, but the data appears to be shifted.
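A minimal Python sketch of this rule of thumb, assuming zero board delays so that set_output_delay -max = tSU and set_output_delay -min = -tH. It is run on a receiver with Tsu 2.5 / Th -1.5 and on one with Tsu -1.5 / Th 2.5 (a mirror image giving a -2ns shift with +/-0.5ns skew). The function `classify_receiver` and its positive/negative threshold are illustrative assumptions, not a Quartus feature.

```python
def classify_receiver(t_su: float, t_h: float) -> tuple[float, float, float, float, str]:
    """Return (-max, -min, shift, skew, suggested transfer) for an edge-aligned output."""
    ext_max = t_su   # set_output_delay -max, assuming zero board delay
    ext_min = -t_h   # set_output_delay -min, assuming zero board delay
    if ext_max < ext_min:
        # tSU + tH < 0 would describe a required data window of negative width.
        raise ValueError("tSU + tH must be non-negative")
    shift = (ext_max + ext_min) / 2.0  # center of the required data window
    skew = (ext_max - ext_min) / 2.0   # +/- spread around that center
    # Rough heuristic: positive shift -> default next edge, otherwise same edge.
    transfer = "next-edge" if shift > 0 else "same-edge"
    return ext_max, ext_min, shift, skew, transfer


for t_su, t_h in [(2.5, -1.5), (-1.5, 2.5)]:
    ext_max, ext_min, shift, skew, transfer = classify_receiver(t_su, t_h)
    print(f"tSU={t_su}, tH={t_h}: -max {ext_max} -min {ext_min}, "
          f"shift {shift:+.1f} ns, skew +/-{skew:.1f} ns -> {transfer}")
```

The first receiver gives a +2.0ns shift and suggests the default next-edge transfer; the second gives -2.0ns and suggests same-edge (multicycle 0), both with +/-0.5ns of skew.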