TimeQuest: report_timing/report_datasheet

Altera_Forum · 08-25-2011

I'd like to create a custom report file that analyzes the range of clock-to-output delays on a bus, e.g., the minimum and maximum clock-to-output delay, so that I can inspect the variation and the relative timing.

I've read through Rysc's TimeQuest guide, and understand that report_datasheet is not an ideal tool for constraining a design. But that's not really what I want this information for; it's more to get a feel for the design's overall timing (so that I can go back and adjust constraints).

What I'm missing is how to get the components of the timing paths via Tcl commands. For example, let's say I have a registered output bus, with registers called reg[] inside the FPGA and output ports q[]:

report_timing -to [get_keepers {q[*]}] -setup -panel_name {Timing Report}

This populates the Timing Report window with the worst-case path to q[]. The 'Data Arrival Path' pane shows the 'clock path' delay and the 'data path' delay.

The report_timing command returns the number of paths and the worst-case slack, e.g., "1 4.001". What I'd really like is the individual delay components that contribute to this calculation.

What Tcl command can be used to get the individual delays?

Thanks!

Cheers,

Dave

Altera_Forum · 08-25-2011

Add "-detail full_path".
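For reference, a version of the original command with that option added might look like the sketch below. The `-npaths` value (8, assuming an 8-bit q[] bus) and the `-file` name are illustrative assumptions; `-file` writes the same text report to disk where it can be post-processed. The return value is still only the summary list.

```tcl
# Worst setup paths into the q[] bus, with the full clock and data path
# breakdown, shown in the GUI and also written to a text file.
# The path count and the file name are illustrative assumptions.
set result [report_timing -to [get_keepers {q[*]}] -setup \
                -npaths 8 -detail full_path \
                -panel_name {Timing Report} \
                -file q_setup_paths.rpt]

# The return value is only a summary list: {number_of_paths worst_slack}
set num_paths   [lindex $result 0]
set worst_slack [lindex $result 1]
puts "paths: $num_paths, worst slack: $worst_slack"
```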

Altera_Forum · 08-25-2011

--- Quote Start ---

Add "-detail full_path".

--- Quote End ---

Thanks Ryan.

That option adds the details to the GUI, but I want those same timing details returned on the command line, so I can script them. report_timing only ever returns the number of paths and the worst-case slack, regardless of the arguments I've tried. That's why I was wondering if I should be using a different Tcl command.

Cheers,

Dave

Altera_Forum · 08-26-2011

get_timing_paths -long_help

By the way, just typing help in TimeQuest gives a list of libraries, and you can then get help on each of them. For example, this command is in "help sta". It's kind of interesting to poke around. (You can also run "quartus_sh --qhelp" in a shell for a GUI version.) I've never seen this asked for, and assume you're scripting up something cool. Wanna share what it is?
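A minimal sketch of that approach, assuming it is run in the TimeQuest console (or `quartus_sta`) after the timing netlist has been created, the SDC read and the netlist updated. It uses `get_timing_paths`, `get_path_info`, `get_point_info` and `get_node_info` from the sta package; the proc name, the default path count and the exact `get_path_info` options (`-slack`, `-arrival_time`, `-data_delay`, `-arrival_points`) are assumptions worth checking against `-long_help` in your Quartus version.

```tcl
# Print slack, arrival time and data delay for each path into a target,
# then the incremental (incr) and cumulative (total) delay at every point
# along the data arrival path.
# dump_path_points is a hypothetical helper name, not a Quartus command.
proc dump_path_points {target {npaths 8}} {
    set paths [get_timing_paths -to [get_keepers $target] -setup -npaths $npaths]
    foreach_in_collection path $paths {
        set from_name [get_node_info -name [get_path_info $path -from]]
        set to_name   [get_node_info -name [get_path_info $path -to]]
        puts [format "%s -> %s  slack=%s  arrival=%s  data_delay=%s" \
                  $from_name $to_name \
                  [get_path_info $path -slack] \
                  [get_path_info $path -arrival_time] \
                  [get_path_info $path -data_delay]]
        foreach_in_collection pt [get_path_info $path -arrival_points] {
            # Some points (e.g. clock edges) have no netlist node
            set node_id [get_point_info $pt -node]
            set node_name ""
            if {$node_id ne ""} {
                set node_name [get_node_info -name $node_id]
            }
            puts [format "    incr=%-8s total=%-8s %s" \
                      [get_point_info $pt -incr] \
                      [get_point_info $pt -total] \
                      $node_name]
        }
    }
}
```

From the console this would be called as `dump_path_points {q[*]}`; changing `-setup` to `-hold` in the proc gives the minimum-side numbers.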

Altera_Forum · 08-26-2011

--- Quote Start ---

get_timing_paths -long_help

By the way, just typing help in TimeQuest gives a list of libraries, and you can then get help on each of them. For example, this command is in "help sta". It's kind of interesting to poke around. (You can also run "quartus_sh --qhelp" in a shell for a GUI version.)

--- Quote End ---

Cool, I'll have a look around.

--- Quote Start ---

I've never seen this asked for, and assume you're scripting up something cool. Wanna share what it is?

--- Quote End ---

It could just be that I'm doing something wrong too :)

I'm interested in just seeing the individual delay contributions, and seeing how things change. Some of the examples I was playing with:

1) A simple design with an 8-bit register between the input ports and the output ports. The timing delays were very similar for all the signals except one. The (min-max) clock-to-output delays for all the signals were about the same; it's just that one had larger output register delays ... weird. I'll post the example synthesis script and timing analysis.

2) An Avalon-MM slave SRAM controller, with generics to set the SRAM timing parameters. I then want to analyze all the clock-to-output delays and use those to calculate the minimum write pulse width, minimum read pulse width, and minimum setup/hold times relative to all signals. I'll see if I can convert the results into a format I can look at in a waveform viewer ... perhaps a value change dump (VCD) file? I recall the Icarus Verilog guys use GTKWave to look at waveforms ... I'll check that out (or perhaps I can just use ModelSim).

3) Using the Resource Property Editor, the Chip Planner, etc. to see how placement affects the delays.
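Items 1) and 2) both reduce to simple arithmetic once the per-signal clock-to-output numbers have been pulled out of the timing analysis. The pure-Tcl sketch below assumes Tcl 8.5 or later (for `dict` and `lassign`) and uses made-up numbers. The write-timing model is a hypothetical simplification, not the controller from the attached design: WE_n falls on clock edge 0 and rises on edge `n_we`, the write data is launched on edge 0 and changes on edge `n_data`, everything comes from registers on one clock, and board delays are ignored.

```tcl
# Item 1: clock-to-output spread across a bus.
# tco_by_port is a dict {port tco_ns ...}; returns min, max and spread.
proc tco_spread {tco_by_port} {
    set min_port ""
    set max_port ""
    dict for {port tco} $tco_by_port {
        if {$min_port eq "" || $tco < [dict get $tco_by_port $min_port]} {
            set min_port $port
        }
        if {$max_port eq "" || $tco > [dict get $tco_by_port $max_port]} {
            set max_port $port
        }
    }
    set tmin [dict get $tco_by_port $min_port]
    set tmax [dict get $tco_by_port $max_port]
    return [dict create min_port $min_port min $tmin \
                        max_port $max_port max $tmax \
                        spread [expr {$tmax - $tmin}]]
}

# Item 2: worst-case SRAM write timing at the FPGA pins.
# Each Tco argument is a {min max} pair in ns; t_clk is the clock period.
proc sram_write_margins {t_clk n_we n_data we_fall we_rise data} {
    lassign $we_fall fall_min fall_max
    lassign $we_rise rise_min rise_max
    lassign $data    d_min    d_max
    set t_rise   [expr {$n_we * $t_clk}]    ;# edge that raises WE_n
    set t_change [expr {$n_data * $t_clk}]  ;# edge that changes the data
    return [dict create \
        pulse_min [expr {$t_rise + $rise_min - $fall_max}] \
        setup_min [expr {$t_rise + $rise_min - $d_max}] \
        hold_min  [expr {$t_change + $d_min - ($t_rise + $rise_max)}]]
}

# Made-up example values
set s [tco_spread {q[0] 5.912 q[1] 5.934 q[2] 6.771 q[3] 5.905}]
puts [format "Tco min %.3f (%s), max %.3f (%s), spread %.3f ns" \
          [dict get $s min] [dict get $s min_port] \
          [dict get $s max] [dict get $s max_port] [dict get $s spread]]

set m [sram_write_margins 10.0 2 3 {4.0 6.0} {4.2 6.3} {3.8 6.5}]
puts [format "WE_n pulse >= %.2f ns, data setup >= %.2f ns, data hold >= %.2f ns" \
          [dict get $m pulse_min] [dict get $m setup_min] [dict get $m hold_min]]
```

With these placeholder numbers the spread is 0.866 ns (q[3] to q[2]) and the write margins are 18.20, 17.70 and 7.50 ns, which would then be compared against the SRAM datasheet's write pulse width and data setup/hold requirements.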

I'd also like to see how these timings relate to the post-P&R ModelSim simulation results.

Just curious ... but I'll try to put anything useful into a document for you to add to your wiki page.

Cheers,

Dave

Altera_Forum · 08-26-2011

I find most analysis is easier to do with the GUIs. Pulling it out with scripts tends to complicate things. Item 2) sounds better since it's more of a decision based on analysis, but my concern is with the flow. It's more of an "analyze what the timing is and make the RTL work that way" approach, whereas timing analysis usually works the other way around. In fact, by knowing your constraints, you know what all your delays can be and whether the design meets them. If you get negative slack, that immediately tells you how much you failed by. I just want to make sure you're not doing more work than necessary.
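One way to act on that directly is to have the report filter on slack. The commands below are a sketch (the path counts and panel names are arbitrary choices) that list only paths with negative slack, so each reported slack is the amount by which that path fails.

```tcl
# List only failing paths; each reported slack is the failure margin.
# The path counts and panel names are arbitrary choices.
report_timing -setup -npaths 20 -less_than_slack 0.0 \
    -detail path_only -panel_name {Failing setup paths}
report_timing -hold -npaths 20 -less_than_slack 0.0 \
    -detail path_only -panel_name {Failing hold paths}
```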

Altera_Forum · 09-06-2011

Hey Ryan,

This is likely the longest question you've had to read (see the attached document).

The document and code contain two designs: a synchronous design and an asynchronous SRAM design.

Both designs are analyzed for timing, with some interesting results regarding the VREF pins.

What is still not clear to me is how to write the 'final' SDC constraints for the SRAM interface. Perhaps you could clarify that for me; then I'll update the document, and you can add it and the code to your wiki entry.

Cheers,

Dave

Altera_Forum · 09-06-2011

Wow. Definitely a lot of work. I haven't had a chance to read the whole thing (traveling for work this week and trying to finish up a number of things), but here are some quick thoughts:

1) The register example concerns me in that the FPGA design is input -> reg -> output. As your example shows, you're really forced to balance the input delays and the output delays. The thing is that most designs are input -> input_reg -> logic with many register levels -> output_reg -> output.

Because of the logic inside, the input timing and output timing do not directly work against each other. Now, for a design that is set up like yours, it's a good example, but usually there would be some sort of logic too (a decoder, mux, etc.), as just registering the data and sending it out again doesn't do much. I know you're doing it as a timing example and don't want to add logic that would confuse things, but my point is that adding logic makes the design more complex, i.e., if the logic is before and/or after the register, that will complicate timing even more. But I guess the main point is that, as a generic example, having only one register deviates from most designs.

2) Asynchronous RAMs are a pain. The recipe I follow is to treat the output and input sides almost as if I were constraining the interface between two different FPGAs. I generally do something like the following:

- Create a virtual clock.

- Constrain the address going off chip as tightly as possible. That means first putting in a -max output delay and increasing it to the largest value that still meets timing, so that the output timing is as fast as possible. Then add a -min value to cover the other (hold) side. You now have a Tco max and min for the address going out.

- Add the max and min round-trip external delays (through the SRAM and across the board) to those values. The result is the external delay for the set_input_delay on the data. (If a read takes multiple cycles, multicycle exceptions may be necessary.)
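The read-side recipe could be written as an SDC sketch along the following lines. All port names, the clock names and every number are hypothetical; the address Tco values are placeholders for what the timing report shows after the address constraints have been tightened, the round trip assumes the same board delay out and back, and tAA/tOH stand for the SRAM's address access time and output hold time.

```tcl
# FPGA clock, plus a virtual clock with the same period for the SRAM side
create_clock -name clk       -period 20.000 [get_ports clk]
create_clock -name sram_virt -period 20.000

# Address out: -max raised until the design only just meets timing,
# then a -min value added for the hold side (hypothetical values)
set_output_delay -clock sram_virt -max 12.0 [get_ports {sram_addr[*]}]
set_output_delay -clock sram_virt -min -1.0 [get_ports {sram_addr[*]}]

# Address Tco range read from the report after fitting (placeholders)
set addr_tco_max 7.5
set addr_tco_min 3.0

# External round trip: board out, SRAM tAA (max) or tOH (min), board back
set board_max    0.8
set board_min    0.3
set sram_taa_max 10.0
set sram_toh_min 3.0

set din_max [expr {$addr_tco_max + $board_max + $sram_taa_max + $board_max}]
set din_min [expr {$addr_tco_min + $board_min + $sram_toh_min + $board_min}]
set_input_delay -clock sram_virt -max $din_max [get_ports {sram_data[*]}]
set_input_delay -clock sram_virt -min $din_min [get_ports {sram_data[*]}]

# Only if each read is given two clock cycles (assumed here)
set_multicycle_path -from [get_ports {sram_data[*]}] -setup -end 2
set_multicycle_path -from [get_ports {sram_data[*]}] -hold  -end 1
```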

That covers the read side. The write side is treated like any source-synchronous interface, except the WRITE strobe is treated as a clock.
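A hedged sketch of the write side, assuming the active-low strobe `sram_we_n` is driven by a register clocked by `clk`, so that the strobe pin can be declared as a generated clock and the data constrained against its rising (end-of-write) edge. The SRAM data setup and hold times (`t_dw`, `t_dh`) and all names are hypothetical. Because the strobe only pulses on write cycles, multicycle exceptions matching the controller's sequencing will usually be needed on top of this.

```tcl
# FPGA clock (hypothetical period) and the write strobe pin as a clock
create_clock -name clk -period 20.000 [get_ports clk]
create_generated_clock -name sram_we_clk -source [get_ports clk] \
    [get_ports sram_we_n]

# SRAM datasheet values (hypothetical): data setup/hold to WE_n rising
set t_dw 8.0
set t_dh 0.0

# Source-synchronous style output constraints on the write data
set_output_delay -clock sram_we_clk -max $t_dw [get_ports {sram_data[*]}]
set_output_delay -clock sram_we_clk -min [expr {-$t_dh}] [get_ports {sram_data[*]}]
```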

Altera_Forum · 09-06-2011

--- Quote Start ---

Wow. Definitely a lot of work. I haven't had a chance to read the whole thing (traveling for work this week and trying to finish up a number of things)

--- Quote End ---

Thanks for taking the time to respond!

--- Quote Start ---

1) The register example concerns me in that the FPGA design is input -> reg -> output. As your example shows, you're really forced to balance the input delays and the output delays. The thing is that most designs are input -> input_reg -> logic with many register levels -> output_reg -> output.

Because of the logic inside, the input timing and output timing do not directly work against each other. Now, for a design that is set up like yours, it's a good example, but usually there would be some sort of logic too (a decoder, mux, etc.), as just registering the data and sending it out again doesn't do much. I know you're doing it as a timing example and don't want to add logic that would confuse things, but my point is that adding logic makes the design more complex, i.e., if the logic is before and/or after the register, that will complicate timing even more. But I guess the main point is that, as a generic example, having only one register deviates from most designs.

--- Quote End ---

Yeah, the example is really there as a tool intro. I have another example that has input registers, fabric registers, and output registers. That allows Fast I/O register constraints to be applied at both ends. I'll add that to the next version of the document.
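The Fast I/O register settings are Quartus assignments (in the .qsf, or from a Quartus Tcl script) rather than SDC constraints. A sketch with hypothetical bus names `d[*]` (inputs) and `q[*]` (outputs):

```tcl
# Ask the Fitter to pack the boundary registers into the I/O elements.
# d[*] and q[*] are hypothetical input and output bus names.
set_instance_assignment -name FAST_INPUT_REGISTER  ON -to {d[*]}
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to {q[*]}
```

These can only be honored when the register connects directly to the pin with no logic in between, which is why a design with separate input, fabric and output registers is the natural place to apply them at both ends.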

--- Quote Start ---

2) Asynchronous RAMs are a pain.

--- Quote End ---

Ah-ha! A quote for my document ... :)

--- Quote Start ---

The recipe I follow is to treat the output and input sides almost as if I were constraining the interface between two different FPGAs. I generally do something like the following:

- Create a virtual clock.

- Constrain the address going off chip as tightly as possible. That means first putting in a -max output delay and increasing it to the largest value that still meets timing, so that the output timing is as fast as possible. Then add a -min value to cover the other (hold) side. You now have a Tco max and min for the address going out.

- Add the max and min round-trip external delays (through the SRAM and across the board) to those values. The result is the external delay for the set_input_delay on the data. (If a read takes multiple cycles, multicycle exceptions may be necessary.)

That covers the read side. The write side is treated like any source-synchronous interface, except the WRITE strobe is treated as a clock.

--- Quote End ---

OK, so this method of constraining the interface is a hack ... not a problem ... it's just not in any 'recommended practices' document ...

Thanks for these initial thoughts! If you have any other tool hints, I'd like to add those too.

Cheers,

Dave

Altera_Forum · 09-06-2011

Your register example is different from (and complementary to) a lot of what I've done, i.e., I try to explain how to enter the constraints, whereas you quickly give the constraints and spend a lot of time analyzing the post-fit results and trying to close timing. That's very valuable and something I haven't spent a lot of time on. They're very distinct things, as I sometimes work with people who talk about not being able to meet timing when their constraints are incorrect and they can't read them. Not until the constraints are done should anyone worry about timing.

For asynchronous RAMs, TQ doesn't have any easy way to connect all the output addresses to the data coming back (there is one, but it's a painful hack). That makes it a two-step process: constrain one side first, then use that to constrain the other side. Note that synchronous RAMs are very nice, because you can use the clock being sent off-chip as the clock for the set_input_delay constraints on the data coming back, and the whole round-trip read is analyzed correctly.
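A minimal SDC sketch of the synchronous case, assuming the SSRAM clock is driven out directly from `clk` (if it comes from a PLL, that PLL output would be the `-source` instead) and that the clock and data traces have the same board delay. Port names and numbers are hypothetical; the min value uses the SSRAM's output hold time.

```tcl
create_clock -name clk -period 10.000 [get_ports clk]

# Clock forwarded to the SSRAM; its delay to the pin is part of the analysis
create_generated_clock -name ssram_clk -source [get_ports clk] \
    [get_ports ssram_clk]

# SSRAM clock-to-output (max) and output hold (min), one-way trace delay
set ssram_tco_max 3.1
set ssram_toh_min 1.5
set board 0.5

# Read data: forwarded clock -> board -> SSRAM -> board -> FPGA pin
set_input_delay -clock ssram_clk -max [expr {$board + $ssram_tco_max + $board}] \
    [get_ports {ssram_dq[*]}]
set_input_delay -clock ssram_clk -min [expr {$board + $ssram_toh_min + $board}] \
    [get_ports {ssram_dq[*]}]
```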

Altera_Forum · 09-06-2011

--- Quote Start ---

Your register example is different from (and complementary to) a lot of what I've done, i.e., I try to explain how to enter the constraints, whereas you quickly give the constraints and spend a lot of time analyzing the post-fit results and trying to close timing. That's very valuable and something I haven't spent a lot of time on. They're very distinct things, as I sometimes work with people who talk about not being able to meet timing when their constraints are incorrect and they can't read them. Not until the constraints are done should anyone worry about timing.

--- Quote End ---

That is an interesting observation.

The main reason I started that way was to show where those numbers end up in the GUI waveforms.

My designs are often FPGA-to-FPGA, and I want to know the maximum clock rate over those paths. Given that I have no information on the 'external device' timing, I put in some arbitrary constraints and move on to analysis.

--- Quote Start ---

For asynchronous RAMs, TQ doesn't have any easy way to connect all the output addresses to the data coming back (there is one, but it's a painful hack). That makes it a two-step process: constrain one side first, then use that to constrain the other side.

--- Quote End ---

I can live with that. My main complaint with the available TimeQuest documentation is that it does not explain that anywhere.

--- Quote Start ---

Note that synchronous RAMs are very nice, because you can use the clock being sent off-chip as the clock for the set_input_delay constraints on the data coming back, and the whole round-trip read is analyzed correctly.

--- Quote End ---

I see the DE2-70 and NEEK have SSRAM. I'll add examples for those interfaces.

Cheers,

Dave