Re: Timing constraints/assignments for async. static RAM

Altera_Forum · ‎10-01-2007

Hello,

I am porting an already working design from Cyclone I to Cyclone II. This design is required to work at least at 80MHz. The design includes an interface module to external asynchronous static RAM (ASRAM).

The Cyclone I design runs at 60MHz without any problems, but on Cyclone II the timing of the ASRAM seems to be more critical and it runs at only 50MHz or less.

The Cyclone I design includes no timing assignments for interfacing with the ASRAM. Maybe the Cyclone II design needs such timing assignments or constraints. Assuming that I am not the first one who needs to make ASRAM timing assignments, could someone provide me an example of such assignments or constraints?

Thanks in advance

Christian

Altera_Forum · ‎10-01-2007

--- Quote Start ---

The Cyclone I design includes no timing assignments for interfacing with the ASRAM. Maybe the Cyclone II design needs such timing assignments or constraints.

--- Quote End ---

I'll let someone else give you help specific to your design. To be proper, every design should have every internal path and every I/O path constrained even if Quartus happens to give you the timing you need without constraints.

The Fitter uses the timing constraints. Depending on whether you have the design partially constrained or not constrained at all for timing, the Fitter might not be giving any effort to performance. You probably got a compilation message about that.

Even if the real design requirements are very easy for the Fitter to meet without optimizing for performance, a proper design is fully constrained so that anyone looking at the design knows the design requirements and knows whether those requirements are still being met in the future when something changes that could keep those requirements from being met by chance.

Altera_Forum · ‎10-01-2007

If you don't have I/O timing constraints, how do you know the ASRAM only runs at 50MHz? Or do you just put it into the system, change the clock rate, and see where it fails? In general, Cyclone II should be running faster than Cyclone I (and Cyclone III should be even faster).

So, how do you do the constraints? Are you using TimeQuest or the Classic Timing Analyzer? For a simplistic approach, you want the fastest clock to out, and the fastest setup time, and then need to calculate how this fits into your total clock period. For example, if your data/address/control signals Tco is 7ns, your board delay is 1ns, the ASRAM Tco delay is 9ns, and your Tsu is 2ns, then you have a full round-trip delay of:

7ns(get off chip) + 1ns(board delay to ASRAM) + 9ns(through ASRAM) + 1ns(board delay back) + 2ns(clock data back onto chip) = 20ns, i.e. you could handle a 50MHz clock. Reducing the Tco and Tsu requirements would then give you better times.
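The arithmetic above can be written out as a small sketch. The function and parameter names are made up for illustration, and the second set of numbers (a tighter Tco and Tsu) is hypothetical, not taken from any datasheet or timing report.

```python
def round_trip_ns(fpga_tco, board_delay, sram_access, fpga_tsu):
    """One read: off the FPGA, across the board, through the ASRAM,
    back across the board, and into the FPGA capture register."""
    return fpga_tco + board_delay + sram_access + board_delay + fpga_tsu


def max_clock_mhz(total_ns):
    """Highest clock rate whose period still covers the round trip."""
    return 1000.0 / total_ns


# Numbers from the example in this post.
total = round_trip_ns(fpga_tco=7.0, board_delay=1.0, sram_access=9.0, fpga_tsu=2.0)
print(total, "ns ->", max_clock_mhz(total), "MHz")  # 20.0 ns -> 50.0 MHz

# Hypothetical tighter I/O timing, to show the effect of reducing Tco and Tsu.
tighter = round_trip_ns(fpga_tco=5.0, board_delay=1.0, sram_access=9.0, fpga_tsu=1.0)
print(tighter, "ns ->", round(max_clock_mhz(tighter), 1), "MHz")
```

This only covers the single-cycle case; an interface that holds address and control for more than one clock has correspondingly more time.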

Anyway, what timing engine are you using? Do you register your inputs and outputs? Do you know how to enter IO constraints? Do you know how to check if I/O registers are being used? Are you using a PLL? These are all separate topics for constraining this, so please provide more information and hopefully we can point you in the right direction.

Altera_Forum · ‎10-03-2007

--- Quote Start ---

If you don't have I/O timing constraints, how do you know the ASRAM only runs at 50MHz? Or do you just put it into the system, change the clock rate, and see where it fails?

--- Quote End ---

To be honest, yes. And I haven't had such problems in the past. The only method I used to affect timing behavior was the "fast output register" logic option.

--- Quote Start ---

So, how do you do the constraints? Are you using TimeQuest or the Classic Timing Analyzer?

--- Quote End ---

I use the Classic Timing Analyzer. Thanks for your detailed explanation! Sticking to what you've said, I have entered Tco and Th constraints for all my output pins (CS, BE, WE, OE, Address) as well as Tco, Th and Tsu constraints for the bidirectional (Data) pins.

--- Quote Start ---

Do you register your inputs and outputs?

--- Quote End ---

Because I need to have only two clock periods for each ASRAM access, some are, some not.

--- Quote Start ---

Do you know how to check if I/O registers are being used?

--- Quote End ---

Is there another way than looking at the HDL code if there is some combinatorial logic between the last register and the pin?

--- Quote Start ---

Are you using a PLL?

--- Quote End ---

Yes.

After having entered the constraints mentioned above, I still have problems with the RAM when using a system clock of 50MHz. Strangely enough, it seems to work at 70MHz. I guess there might be a bus conflict on the Data signals of the 50MHz design; maybe the FPGA doesn't deassert its pins fast enough after a write access. Are the Tco and Th constraints used to affect the timing of the tri-state buffer of the pin, too?

Altera_Forum · ‎10-03-2007

The Classic Timing Analyzer should be able to report Tsu/Th/Tco and min Tco values for each IO, even if you don't have them constrained. (The constraints tell the fitter to try to do a better job, and tell you if it passed or failed.) In the end you'll need to determine what constraints you can handle.

1) It is strange that it starts working when you run it faster. Without knowing much about the situation, that often points to a hold violation.

2) With hold violations, you may need to do min timing analysis too. The standard timing model is worst case delays of the part at worst case PVT (process, voltage and temperature). The Fast timing model is the best case model for PVT. This analysis can be turned on under Assignments -> Settings -> Timing Analysis -> More Settings -> Enable Fast/Slow Analysis. Note that you don't have to change your constraints, as they are valid over the whole timing analysis range. This model basically lets you "complete the picture" as delays will fall within this range.

3) How come you have two clock cycles? If the ASRAM is completely asynchronous, you use one clock cycle to send address/control to it. Are you sending new address/control values every other clock cycle(thereby giving you two cycles)?

4) Can you take the reported delays and draw out your transfers, i.e. a clock edge sends data off chip, through the external memory, and back again to be latched by another clock edge? That's your setup path and you want to make sure the full path is quick enough to meet the requirement. Also look at the min timing path, to check that the fast model doesn't complete this path too quickly and corrupt the next data. (This might occur if you thought you were giving yourself two clock cycles but are sending addr/cmd on every clock cycle, so that if the round trip occurs too fast it corrupts the previously sent data.)

5) Finally, you might have to do some debug, such as SignalTap, to see what's failing. If you know what you send out and what you expect to get back, it might be apparent pretty quickly if you're off a full cycle, if certain bits are getting captured incorrectly, etc.
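As a rough sketch of the checks in points 2) and 4), the setup path can be evaluated with slow-model delays and the min path with fast-model delays. Everything here is a simplified assumption: the class and function names are invented, the slow-model numbers reuse the earlier 7/1/9/2ns example, and the fast-model numbers and the Th value are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    fpga_tco: float  # clock pin to FPGA output pin, ns
    board: float     # one-way board trace delay, ns
    sram: float      # delay through the ASRAM, ns

    def round_trip(self) -> float:
        # Off chip, across the board, through the memory, and back.
        return self.fpga_tco + self.board + self.sram + self.board


def setup_slack(slow: PathDelays, tsu: float, period: float, cycles: int = 1) -> float:
    """Positive when the slowest round trip plus Tsu fits in the capture window."""
    return cycles * period - (slow.round_trip() + tsu)


def hold_slack(fast: PathDelays, th: float) -> float:
    """Positive when the fastest round trip still arrives after the hold time."""
    return fast.round_trip() - th


slow = PathDelays(fpga_tco=7.0, board=1.0, sram=9.0)  # worst-case corner
fast = PathDelays(fpga_tco=2.5, board=0.5, sram=3.0)  # hypothetical best-case corner

print("setup slack:", setup_slack(slow, tsu=2.0, period=20.0))  # 0.0
print("hold slack:", hold_slack(fast, th=0.5))                  # 6.0
```

A negative result from either function marks the corner to look at first; the real numbers have to come from the timing report and the ASRAM datasheet.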

I know this seems like a lot when it used to "just work", but understanding IO analysis is difficult and is one of the most important steps for a design. Designers who skip this step often ship things that work marginally. It allows you to be more confident in your designs and lets you move towards more demanding IO structures, where proper IO analysis is a must. Good luck.

Altera_Forum · ‎10-05-2007

4) Through "Report Delay" assignments, I managed to get the delays for the ASRAM signals. Looking at these delays, the following question comes up:

Why does the Timing Analyzer not report a timing violation, even if the timing constraints are not met? For example, I have entered a Tco Requirement of 7.5ns for the SRAM1_CS_n pin, but the reported maximum delay is 16.912ns for this pin.

Have I entered invalid constraints? Or is the reported delay not what I expect it to be? Is it valid to enter the PLL's "_clk0" output as a clock source for the Tco Requirement?

1) & 2) As you have suggested, I have enabled the Fast/Slow timing analysis. Both models do not produce any timing violation messages.

3) At 80MHz, the ASRAM is too slow to complete a transfer in one clock cycle, especially the write cycle. Therefore, I have designed my interface so that it generates the timing for the ASRAM in two cycles. This also means that the Address and CS signals remain stable over a period of two cycles.
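To illustrate why two cycles are needed, here is a minimal sketch that counts the clock periods a given round trip occupies. The 20ns figure is only the illustrative round-trip total from earlier in the thread, not a measured value for this board, and the function name is invented.

```python
import math


def cycles_needed(round_trip_ns, clock_mhz):
    """Whole clock periods required to cover one ASRAM access."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(round_trip_ns / period_ns)


for mhz in (50.0, 80.0):
    print(mhz, "MHz:", cycles_needed(20.0, mhz), "cycle(s)")
# 50.0 MHz: 1 cycle(s)
# 80.0 MHz: 2 cycle(s)
```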

5) The next thing I will do is to try my design with Quartus v7.2 (if this version works without problems) because there might be a bug in the Timing Analyzer in v6.1. While developing the ASRAM interface for Cyclone I, I have already used SignalTap to debug my transfers, anyway I will now also try and put a SignalTap into my Cyclone II design so that I can be sure nothing has changed since then.

I understand that IO constraining is important, but until now it simply didn't work. Constraining gives the impression that things are under control, determined... but if it never works, the whole thing is useless and brings up more uncertainty and mistrust than confidence. Especially in my case, where the thing already worked on Cyclone I without any constraints.

Altera_Forum · ‎10-05-2007

Report Delay isn't a constraint, but an assignment that tells Quartus to report that delay. (The Classic Timing Analyzer is somewhat static in what it reports. It dumps a lot of information, but doesn't allow you to query for what you want. TimeQuest is much, much better for allowing you to search for paths.)

Look at the Ignored Timing Constraints portion of your timing report. I'm guessing your Tco assignment is there. It might say why, but the basic reason is that the output of a PLL is not a valid starting point for a Tco. (The reason being, the delay to the output of your PLL could theoretically vary from 1ns to 100ns and you'd still get the same Tco reported. This is why Tcos are reported from the source level clock). Also, I generally don't add a source when doing a Tco, but do a single point constraint. This means I put the name of the top-level port/s in the To column, Tco as the assignment, and the value I want in the Value column. The From column is left blank.

You can add this in the Assignment Editor and then just rerun the Classic Timing Analyzer (rather than a whole fit) and see if the constraint took, or if it shows up in the ignored constraints.

Also, your Tco should actually get better once it's analyzed from the clock port, because the delay through the PLL is negative (compensating for the PLL's feedback path). It won't go from 16.912 to under 7.5, but it should improve. And once you see a Tco panel where the port has a requirement, value and slack, so you know it's constrained, you can right-click on that path and do a List Path, which will give very detailed information down below.

As for constraining giving the impression that everything is under control, it absolutely should, and designs absolutely rely on constraints being 100% correct, but there is a non-trivial amount of design work in getting the constraints into the tool. (That goes for any static timing analysis tool...)

Altera_Forum · ‎10-05-2007

--- Quote Start ---

I have entered a Tco Requirement of 7.5ns for the SRAM1_CS_n pin, but the reported maximum delay is 16.912ns for this pin.

Have I entered invalid constraints? Or is the reported delay not what I expect it to be? Is it valid to enter the PLL's "_clk0" output as a clock source for the Tco Requirement?

--- Quote End ---

Classic Timing Analyzer tsu, th, tco, and min tco constraints and reporting are always the I/O timing between the device pin for clock and device pin for data (lots of people misunderstand this). You should not use an internal point in the clock network like the PLL output as the clock for these constraints. What is happening between the clock device pin and the register is calculated for you automatically and can be seen by doing the "list paths" Rysc mentioned on a particular I/O path, although with Classic Timing Analyzer list paths some of the PLL details get lumped together in a single number (TimeQuest report_timing lets you see more of the detail). If you are using a PLL to clock an output register, tco is from clock device pin through PLL through output register to data output device pin. As Rysc mentioned, that PLL portion of the tco calculation can be a negative number for the sum of the PLL compensation delay and user-entered PLL phase shift.
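A small sketch of that pin-to-pin sum, under stated assumptions: the breakdown into these five terms is a simplification for illustration (the actual list paths output groups things differently), the names are invented, and all the delay values are hypothetical rather than taken from any Cyclone II report.

```python
def pin_to_pin_tco(pll_compensation_ns, pll_phase_shift_ns,
                   clock_to_register_ns, register_clk_to_q_ns,
                   register_to_pin_ns):
    """tco from the clock device pin to the data output device pin."""
    return (pll_compensation_ns + pll_phase_shift_ns
            + clock_to_register_ns + register_clk_to_q_ns
            + register_to_pin_ns)


# Hypothetical delays; the PLL compensation term is negative.
from_clock_pin = pin_to_pin_tco(
    pll_compensation_ns=-2.5,
    pll_phase_shift_ns=0.0,
    clock_to_register_ns=1.25,
    register_clk_to_q_ns=0.25,
    register_to_pin_ns=4.0,
)

# What the same path looks like if the PLL terms are left out,
# i.e. measured from the PLL output instead of the clock pin.
from_pll_output = pin_to_pin_tco(0.0, 0.0, 1.25, 0.25, 4.0)

print(from_clock_pin, from_pll_output)  # 3.0 5.5
```

With a negative compensation term, the figure measured from the clock pin comes out smaller than the one measured from the PLL output, which matches the improvement described above.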

--- Quote Start ---

The next thing I will do is to try my design with Quartus v7.2 (if this version works without problems) because there might be a bug in the Timing Analyzer in v6.1.

--- Quote End ---

The Classic Timing Analyzer can't handle everything that TimeQuest can, but what you are doing is probably within the realm of what the Classic Timing Analyzer handles just fine. I very much doubt that there is a bug in 6.1 affecting tco for single-data-rate outputs.