Designing with Incremental Design?

Altera_Forum · 09-16-2007

Hello all! I've started this new thread as a follow-on to my "Problems with Logic Lock" thread.

For background: I was using Q2 version 7.0 and following the Handbook's directions to Logic Lock sub-functions of the design. However, when I imported them into the top level, Logic Lock itself failed. The feedback I got from Altera and from this forum was that Logic Lock no longer works that way and that I should try Incremental Design (ID) instead. I had tried ID at first, but the design didn't meet ID's requirements and I couldn't follow how ID was supposed to be done, so I dropped it.

Anyway, I've now upgraded to Q2 v7.1 SP1 and have read the new ID documentation. I'm still very disappointed in the documentation. Here is the first of my four questions; I'll post the others as replies to keep things simple.

First question:

It is stated at several points in the ID documentation that the design leader needs to make the Global assignments in the top level and "export(?)" them to the lower-level partitions for the designers to use in their parts of the design. (I believe I got some of the wording wrong, but the gist is correct.)

Well....

Exactly how, in excruciating detail, is this done?

There are no directions anywhere for how to do this!

Altera_Forum · 09-16-2007

My second question:

Situation first:

I have from 17 to 21 different clocks in this design. 7 to 8 Globals and 10 to 13 Regional or Dual-Regional clocks.

I also have 2 Global Reset/Presets in the design.

Now 3 to 4 of the clocks are generated from PLLs, 3 to 4 come in on general-purpose or dedicated clock inputs, and the rest are generated in logic.

The number of clocks is a result of the different devices that have to be interfaced with in the design.

Examples:

The cameras require 3 clocks of up to 200MHz with phase and cycle-time differences. These I generate from an EPLL.

The transducers require 4MHz and/or 2MHz clocks as well as a 10kHz serial clock.

The ADCs require a 10MHz serial clock.

The DACs require a 25MHz clock.

The transceivers require a 125MHz clock.

And the list goes on....

I have one "partition" that requires 6 clocks, all Globals, as well as 1 Global Reset/Preset.

With all of these Global resources, how do I set them up at the top level so that all the "partitions" use the correct resources and don't conflict? Again, please, extremely detailed directions are very welcome. (None of this is covered in the documentation.)
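One way to see the conflict question as bookkeeping: collect, per partition, which signals it expects on global and on regional/dual-regional networks, then check the union against the device budget. A minimal sketch, assuming hypothetical signal names and placeholder budgets (look the real numbers up in the device handbook for your part); it only checks the plan on paper and does not talk to Quartus.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    global_nets: set = field(default_factory=set)    # signals expected on global clock networks
    regional_nets: set = field(default_factory=set)  # signals expected on regional/dual-regional networks


def check_budget(partitions, global_budget, regional_budget):
    """Return a list of problems with the clock plan; an empty list means it fits."""
    problems = []
    # A net shared by several partitions (e.g. a global reset) counts once.
    all_global = set().union(*(p.global_nets for p in partitions))
    all_regional = set().union(*(p.regional_nets for p in partitions))
    for sig in sorted(all_global & all_regional):
        problems.append(f"{sig}: requested as both global and regional")
    if len(all_global) > global_budget:
        problems.append(f"{len(all_global)} global nets exceed budget of {global_budget}")
    if len(all_regional) > regional_budget:
        problems.append(f"{len(all_regional)} regional nets exceed budget of {regional_budget}")
    return problems


# Hypothetical signal names; the budgets are placeholders to look up for your device.
camera = Partition("camera", {"cam_clk0", "cam_clk1", "cam_clk2", "sys_rst"})
adc_dac = Partition("adc_dac", {"sys_rst"}, {"adc_sclk", "dac_clk"})
print(check_budget([camera, adc_dac], global_budget=16, regional_budget=32))  # []
```

Counting the union rather than summing per partition reflects that a signal like the global reset is one top-level net, even if several partitions declare it.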

Altera_Forum · 09-16-2007

My third question:

The bottom paragraph of page 2-31 in the Q2 v7.1 Handbook, Volume 1, Chapter 2, clearly states that

"you can include a top-level PLL in your lower-level project"

as a way to get more accurate timing results.

How do you do that? Details PLEASE!

Altera_Forum · 09-16-2007

My fourth question:

In the bottom-up flow in the Q2 v7.1 Handbook,

you are instructed to create a top-level project with "black box" files for each partition, go through other steps and eventually compile the design.

Then create the lower level partition, get it working as needed and then export it.

Next, import the lower-level partition's .qxp file into the top level AND...

...then just compile the top level?

You don't need the lower level source files in the top-level?

This is what I mean by lack of thorough documentation. The user is left hanging.

Altera_Forum · 09-16-2007

My fifth question:

Regarding the Q2 v7.1 Handbook, Volume 1, Chapter 2, page 2-11, item 3 (creating design partitions and using Logic Lock):

Situation:

I have a conflict with applying Logic Lock to one particular partition in the design. It's the 1300+ LCELL ADC_DAC partition. It mostly runs at 10MHz, with some of it at 25MHz and a small data-stream interface synchronized and handshaked at 125MHz.

Due to the device I/O pin settings (the board is already fabricated, so no changes), the location for this function is right between two high-speed data-stream partition locations. These two partitions will be locked in and are high-speed pathways.

The ADC_DAC partition should be set as a partition to ease design inclusion, but using Logic Lock on it would cause a pathway interdiction conflict, i.e. its locked region would block routing pathways between those two partitions. This function is low-speed and I was not originally going to Logic Lock it.

How do I proceed?

Altera_Forum · 09-17-2007

Sounds like you've got quite a project going. I almost never use the bottom-up approach, as I think it's a pain, and the pseudo-bottom-up flow has fit all my needs and has some distinct advantages (a single project, and partitions that can be "aware" of what they connect to).

For exporting top-level assignments, I'm not sure if there's an automated way to do this, and I wouldn't completely trust it. Take something as straightforward as a global clock. Suppose it comes in at the top level on a pin called top_clk and you make a global assignment to top_clk, and then you have a lower-level project with a clock coming in called local_clk. There's no way for top_clk to know that local_clk is the same signal, so you would have to make the change manually. Speaking for myself, I would just copy over any assignments you think are relevant and work from there. This will probably make it an incremental approach: for example, you might not have a local-clock assignment in a lower-level design file, so the clock gets put onto a global; then, when it goes to the top level, it gets put onto the local clock you assigned it to there, and the I/O timing changes. This shouldn't happen too often, and I think it will be faster than trying to get everything "right" the first time. (In my experience, every project is different and the steps are different every time, which might be why you're not seeing straightforward directions. There should probably be more, but step-by-step instructions probably just aren't possible.)
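The manual copy-and-rename step described above can be kept honest with a small helper: rename the targets you know about and set aside the rest for review instead of copying them blindly. A sketch under assumptions: assignments are modeled as (To, Assignment Name, Value) rows like the Assignment Editor columns, and pll_c1 is a made-up name; this does not read or write any Quartus file.

```python
def translate_assignments(top_assignments, name_map):
    """Rename the targets of top-level (To, Assignment Name, Value) rows for use
    in a lower-level project. Rows whose target has no entry in name_map are
    returned separately for review rather than copied blindly."""
    translated, unmapped = [], []
    for target, name, value in top_assignments:
        if target in name_map:
            translated.append((name_map[target], name, value))
        else:
            unmapped.append((target, name, value))
    return translated, unmapped


# top_clk/local_clk come from the post above; pll_c1 is a made-up name.
top = [("top_clk", "Global Signal", "Global"),
       ("pll_c1", "Global Signal", "Regional")]
translated, unmapped = translate_assignments(top, {"top_clk": "local_clk"})
print(translated)  # [('local_clk', 'Global Signal', 'Global')]
print(unmapped)    # [('pll_c1', 'Global Signal', 'Regional')]
```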

For the .qxp import question, that's correct. If you import a .qxp file and use it, you no longer need the source files.

As for the final question, I would be careful. Why do you need to LogicLock/IC a 10MHz domain? There is a definite art to the whole flow, and if your goal is to lock everything down, I would be wary unless it's absolutely necessary and you're taking a high-level approach (i.e. very large LLRs that encompass a lot of the device).

Personally, I would do the pseudo-bottom-up flow. Take your top-level project and make your partitions. For everything that is high-speed and has trouble meeting timing, set the partition to Source. Set your trivial low-speed stuff to Empty. Close timing on the high-speed stuff and set those partitions to Post-Fit (Strict). (I use Strict because I prefer to have complete control.) Set the post-fit level to Placement if there's some margin, but if you're only meeting timing by a few picoseconds, set it to Placement and Routing. (Note that your timing can still change due to loading on the clock trees, parasitic effects of logic placed or routed nearby, etc.) Then go in and fit the lower-level stuff around it. You might not even have to LogicLock with this flow. (LLRing helps if your subsequent fits will not run the whole flow but just modify what has changed, but LLRing can be a lot more difficult.)
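The partition settings in that flow boil down to a small decision rule. The sketch below encodes it, assuming worst-case slack in picoseconds from the last fit and a hypothetical 50 ps threshold standing in for "only meeting by a few picoseconds"; the setting labels are the ones used in the post, not an exhaustive list of Quartus options.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Netlist(Enum):
    SOURCE = "Source"
    EMPTY = "Empty"
    POST_FIT_STRICT = "Post-Fit (Strict)"


class PostFitLevel(Enum):
    PLACEMENT = "Placement"
    PLACEMENT_AND_ROUTING = "Placement and Routing"


@dataclass
class PartitionState:
    high_speed: bool                 # does this partition struggle to meet timing?
    worst_slack_ps: Optional[float]  # worst slack from the last fit, None if never fit


def choose_settings(p, high_speed_closed, tight_margin_ps=50.0):
    """Suggest (netlist type, post-fit level) for one partition in the flow above.
    tight_margin_ps is a hypothetical threshold for 'only meeting by a few ps'."""
    if not p.high_speed:
        # Trivial low-speed logic stays Empty until the high-speed partitions are
        # closed, then it is compiled from Source and fit around them.
        return (Netlist.SOURCE if high_speed_closed else Netlist.EMPTY), None
    if p.worst_slack_ps is None or p.worst_slack_ps < 0:
        return Netlist.SOURCE, None  # still closing timing on this one
    if p.worst_slack_ps < tight_margin_ps:
        return Netlist.POST_FIT_STRICT, PostFitLevel.PLACEMENT_AND_ROUTING
    return Netlist.POST_FIT_STRICT, PostFitLevel.PLACEMENT


netlist, level = choose_settings(PartitionState(True, 20.0), high_speed_closed=False)
print(netlist.value, "/", level.value)  # Post-Fit (Strict) / Placement and Routing
```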

Anyway, I've probably not explained everything and feel free to ask more. Good luck.

Altera_Forum · 09-17-2007

Rysc,

Thanks.

I use the bottom-up approach only because that's how I've designed for years: design the sub-functions (partitions) based on design requirements and/or I/O device datasheets, and when all the parts are done, put it all together.

No automated way to do this, huh?

Well, I do declare the clocks in the top and lower levels as Globals or Regionals based on what I expect them to be in the end. If that changes, I go back, change the setting(s) and recompile. I also use exactly the same signal naming at all levels. My concern with Post-Fit (Strict) is that, when the partitions are imported to the top level, different clocks will end up on the same expected resource wire and conflict. I want to avoid this up front.

I am glad to hear that the source code for the sub-partitions is not needed. That makes sense if the placement and routing are to be kept.

No, I don't need Logic Lock for the 10MHz stuff. The problem is that the ID steps say to apply Logic Lock to the partitions after creating them. That, and the size of the logic and (not previously stated) the large number of outputs to be included, via availability signaling and handshaking, in the output data stream. Essentially this "monster" should be partitioned, but I don't want to apply Logic Lock to it, yet the directions say to.

I'm going to try your approach.

By the way, I did learn one thing from one of my many Altera SRs. There is a way to assign Global and Regional signals to a specific resource line of the device. It's not documented (which, I was told, means it can change without notice), but it does (did?) work in v7.0 (though Logic Lock didn't work the way it was documented). Are you interested in learning how?

Altera_Forum · 09-17-2007

Undocumented assignments are like candy! I'm interested; what have you got?

Regarding the earlier conversation: I have used the VHDL keep attribute to prevent Quartus from changing logic details. I can't recall exactly what happened, but it did some optimization that resulted in a no-fit at a later stage. The keep attribute stopped the "optimization" and gave me exactly what the code intended, which fit fine.

Altera_Forum · 09-17-2007

Yep, any secrets are helpful.

You said you've been designing that way (bottom-up) for years, but I don't think that included placement and routing. This works fine for the actual coding, but I find it can break down when you try to apply the same strategy to place and route. The whole flow is just too interrelated. For example, the placer normally has a wide-open floorplan to work with, so it can move logic wherever it thinks best, and if it needs to swap with something else, it can. But if you take a large portion of logic that is placed throughout the chip and lock it down, the other logic has to fit "between the cracks" of the first piece of logic, which is a much harder problem. If the timing on the second piece of logic meets easily, or if you've got a lot of open space, it might work, but with timing-critical logic and full devices it falls apart. That is why LogicLocking is often recommended: it keeps everything in a nice rectangle, and subsequent fits on partitions work within their own wide-open rectangle. It is recommended but not required (and I imagine that's why writing directions can be so painful).

The router has somewhat similar restrictions, which is why I recommend preserving post-fit placement only. I preserve routing only if I absolutely have to (e.g. a 250MHz DDR2 design I'm working on, where even a slightly different route can affect overall timing).

Clocks, I believe, behave nicely in the LLR flow because their source changes as you work your way up. For example, a sub-block might meet timing when its clock comes in on a port and feeds a global. When you move up to the top level, that clock port and that route no longer exist, but you do have a PLL at the top level that drives all the post-fit placed logic. And since global clock timing is pretty consistent (loading may have a secondary effect), you should be all right. You will probably want to make at least the Global/Regional/Dual-Regional assignments, to get the right type of network even if the source or the specific global net changes. (Although it sounds like you have a way to explicitly say which resource you're using.)

By the way, what's your end goal? If you're trying to meet timing, then I would work on those partitions first and then move onto the other stuff(whether you're LLRing or not). If timing can be met easily and you want to reduce compile times, then you will probably have to LLR and do partitions(what size device, how full, and how long are your compiles? I've seen users have different expectations on this). My guess is your main concern is closing timing.

Altera_Forum · 09-17-2007

Randal and Rysc,

Ohww, the secrets are coming.

Okay, now keep a few things in mind:

1. As was pointed out to me, undocumented capabilities can be changed or removed at any time.

2. This worked for me in Q2 v7.0. Although Logic Lock the old way (with back-annotation) didn't work well, this part DID!

3. You've really got to know the Global and/or Regional resources available, as well as their distribution within the chip. You must plan ahead for this.

For example, Figures 1-39 and especially 1-40 on pages 1-62 and 1-63 of the Stratix2 Device Handbook, Volume 2, have the kind of information you need to start this.

4. The best part was that when I imported the back-annotated LL region, these Global settings came into the top level successfully.

Okay.

To start, after at least an Analysis and Elaboration, I always go into Assignment -> Settings -> Timing Analysis Settings -> Classic Timing Analyzer Settings -> Individual Clocks and describe all the signals that will be used as clocks.

Next I go into the Assignment Editor and load those same signals into the To column. I also add the outputs of any PLLs there, as well as any Global Reset and/or Preset signals. In the Assignment Name column, choose the Global Signal setting. In the Value column, set Global, Regional, Dual-Regional, On, etc.

Perform both of the above steps in both the top-level and the lower-level projects. In a lower-level project you only need the resources that design uses, but keep the declarations the same at both levels.

Next comes a Full Compilation. Let's say you just do this in the lower level for now.

After the Full-Compilation go into the Assignment Editor again. In the To column do a Post-Compilation node search for the signals you declared as Global Signals.

Here's the interesting part: what you want to choose is the signal you're after, but with a ~clkctrl (or is it ~clkctrl_g?) suffix for Global or ~clkctrl_r for Regional, and select it!

(There is also a ~clkctrl_d and a ~clkctrl_f, but I could not get an answer on exactly what to set for a Dual-Regional resource, and Stratix2 devices don't have the fast resources, so I don't know about those either.)

Do note that those Resets and/or Presets will also have the ~clkctrl.

Anyhow, choose those signals into the To column.

In the Assignment Name column select the Location setting.

In the Value column, type in (in capital letters) CLKCTRL_G#, where # is the number of the Global resource you want this signal distributed on. For Regionals, use CLKCTRL_R#. As I said, I don't know what to really enter for _D and _F.

Example: in the Floorplanner I see the 500MHz input clock signal coming in as LVDS on pins T1 and T2 in my EP2S601020 device. That dedicated clock input can go directly to Global clock resources 7 or 8. I choose G8, so in the Value column I set CLKCTRL_G8.
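Before typing these by hand, it can help to sanity-check each row. A minimal sketch, assuming only the two value forms described above (CLKCTRL_G# and CLKCTRL_R#) and accepting both ~clkctrl and ~clkctrl_g for global nodes since the exact suffix is uncertain; clk_500~clkctrl is a hypothetical node name, and the allowed set {G7, G8} comes from the T1/T2 example. It builds an Assignment Editor-style row; it does not write any Quartus file.

```python
import re

# Only the two forms described above; the thread does not say what _D/_F take.
_RESOURCE = re.compile(r"CLKCTRL_([GR])(\d+)")
# The global node suffix may be ~clkctrl or ~clkctrl_g, so accept both.
_SUFFIXES = {"G": ("~clkctrl", "~clkctrl_g"), "R": ("~clkctrl_r",)}


def location_row(node, resource, allowed=None):
    """Return one Assignment Editor row (To, Assignment Name, Value) that places a
    ~clkctrl node on a specific clock-control resource, after basic sanity checks."""
    m = _RESOURCE.fullmatch(resource)
    if m is None:
        raise ValueError(f"unrecognised resource name {resource!r}")
    if not node.endswith(_SUFFIXES[m.group(1)]):
        raise ValueError(f"{node!r} does not look like a node for {resource}")
    if allowed is not None and resource not in allowed:
        raise ValueError(f"{resource} is not reachable from this clock source")
    return (node, "Location", resource)


# The 500MHz LVDS input on T1/T2 can reach globals 7 or 8 (per the post above);
# the node name clk_500~clkctrl is hypothetical.
print(location_row("clk_500~clkctrl", "CLKCTRL_G8", allowed={"CLKCTRL_G7", "CLKCTRL_G8"}))
# ('clk_500~clkctrl', 'Location', 'CLKCTRL_G8')
```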

Save and do a full recompile. In v7.0, after the compile, the signal would be marked with a diamond and a "?". I was specifically told that this labeling was a software bug, so don't worry about it.

With the old way of Logic Lock, these settings imported into the top level successfully. So well, in fact, that different lower-level functions done this way and using the same signals all connected to the same Global resource at the top level, as viewed in the Floorplanner after the top-level Full Compilation. If there are issues, open the Assignment Editor at the top level and look through it to make sure all the ~clkctrl settings for a specific signal are the same. If not, correct them there and compile again.
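The "look through to make sure all the ~clkctrl settings for a signal are the same" check can be written down as two lookups: one signal given different resources in different partitions, and one resource claimed by different signals. A sketch, assuming each partition's Location assignments have been collected into a dict by hand; every name and resource number here is made up.

```python
from collections import defaultdict


def find_clkctrl_conflicts(partitions):
    """partitions: dict of partition name -> {node: CLKCTRL resource}.
    Returns (inconsistent, shared):
      inconsistent: node -> set of resources it was given across partitions
      shared: resource -> set of different nodes assigned to it"""
    by_signal = defaultdict(set)
    by_resource = defaultdict(set)
    for assignments in partitions.values():
        for signal, resource in assignments.items():
            by_signal[signal].add(resource)
            by_resource[resource].add(signal)
    inconsistent = {s: r for s, r in by_signal.items() if len(r) > 1}
    shared = {r: s for r, s in by_resource.items() if len(s) > 1}
    return inconsistent, shared


# Hypothetical plan with one inconsistent reset and one doubly-claimed global.
plan = {
    "camera": {"cam_clk0~clkctrl": "CLKCTRL_G8", "sys_rst~clkctrl": "CLKCTRL_G3"},
    "adc_dac": {"sys_rst~clkctrl": "CLKCTRL_G4", "dac_clk~clkctrl_r": "CLKCTRL_R2"},
    "xcvr": {"xcvr_clk~clkctrl": "CLKCTRL_G8"},
}
inconsistent, shared = find_clkctrl_conflicts(plan)
print(sorted(inconsistent))  # ['sys_rst~clkctrl']
print(sorted(shared))        # ['CLKCTRL_G8']
```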

Where my design failed, according to the compilation error and the SR feedback on it, was the use of Logic Lock with back annotation. This part was okay.

I don't know if this works yet in v7.1 and with ID but I'll be finding out.

Have fun guys!

Altera_Forum · 09-17-2007

Cool, thanks. Note that I generally don't recommend that type of control. Quartus usually does a very good job creating globals, and if it does something I don't like, assigning the driver to global/regional or to Off (which takes it off the global) handles the case and is very easy. But this is for a full design. I understand that when doing a bottom-up approach there may be a need to be more explicit. (I also understand how users like to be able to control the globals, which makes sense; it's just usually not necessary.) But it's always good to have another tool if/when it's necessary. Thanks.

Altera_Forum · 09-18-2007

Yeah, this can be a rather strict way of controlling Globals. The compiler does do a good job, but after reading about how resources are divvied up at the top level in ID, and given my need to keep the routing of the lower-level partitions, this looks like the answer I was after. As I said, I got this via an Altera SR, but few people there seem to know about it either.

This is possibly overkill, but just how the software chooses which Globals go to whom after bringing partitions into the top level is not discussed in the ID docs. Basically the question is: how do you know the correct connection was made?

Through experimentation with this technique, using it in some lower levels and not in others, I found it is very possible for a clock signal at the top not to be connected to the lower-level clock, despite using the same signal name and the same Assignment settings throughout. Using this technique guaranteed that this did not happen.

I wish my design didn't need so many different clocks but that wasn't my choice.

Altera_Forum · 09-18-2007

Rysc,

In answer to some of what you mentioned or asked two postings ago:

Bottom-up design: you're right. My previous work never included placing and routing lower-level designs into higher levels like this. This is new to me and is due to some of the design requirements. (You asked about those; I answer further below.)

Part of my history with this, and a large success, was a design I did some 9 years ago. The FPGA, a 10K50 at the time, had to interface to the PC104/ISA bus. To attain the fastest speed there, it mapped into the available memory address space but was not memory. The problem occurred when the real-time operating system started up and the boot process went to find and test all available memory. When it hit the FPGA addresses, the tests failed and the system hung. The quick and easy solution was to wait 10 seconds after the FPGA was programmed before responding to any accesses on the bus. But try running a top-level simulation that includes a 10-second wait at the start! (Yes, there are ways around that and I did use them, but it was never a thorough simulation.) Much to my relief, once my post-place-and-route simulations of the lower-level functions worked, those functions worked correctly in the top-level device without my first simulating the whole design.

Now this was MaxPlus2 using the builtin simulator designing to a 10K50 with the highest speed clock at 10 or 20MHz so I got away with a lot back then but this is part of my history.

Today is very different, Quartus2, full ModelSim for simulation, Stratix2's and pushing the speed envelope for all we can get.

Clocks: using this technique for grabbing Global resources actually leads to very interesting results. One thing I've seen is that if, in the lower level, you grab the resource you are going to use at the top level, the routing delay from that resource to the function will be the same. Now, if you don't include the source of the Global resource, the compiler will automatically attach it to a pin. However, when you import the lower level into the top, it will drop the pin source and connect to the real source. So what? Well, if your Global resource IS fed from a pin AND you include that pin in the lower level, then the delay from the input pin through the chosen Global resource to the logic using it will be the same in both the lower level AND the top level. How's that?

Goal;

Well, this design, as I said, is going into a Stratix2. The internal routing of the Stratix devices is quite different from all the earlier families and some of the earlier/smaller Cyclones. The only things that are routed fully across the device are the Global Globals. This means that if you want a bus of data signals to flow from left to right and match the timing of a Global clock, you've got to do a little pipelining. This is understood.

So goals,

1. To get the routing of the lower-level functions to meet their individual timing constraints, lock that down, and then import that locked placement and routing into the top level. In other words, preserve timing.

2. When recompiling the top-level to only change those lower-level partitions that have been changed, and imported, and not change any placement or routing of the unchanged lower-level partitions. Is this saying preserve timing again? It will also reduce compile times (but I don't really care about that).

3. As for the pipelining requirement stated earlier: it is recognized that as the main data stream, clocked at 125MHz, flows across the device from one partition to another, it will cross a number of routing pathways. These crossings can result in the data no longer lining up with the Global clock. By locking down each partition's placement and routing, then running post-PAR simulation and viewing the delays, it can be more easily determined when and where to place locked-down sets of pipeline registers to realign the data with the clock. I don't see how to do this reliably otherwise.
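Once the locked-down delays are known, deciding where the pipeline stages go is a simple walk along the path. A rough sketch, assuming a single linear data path broken into segments whose delays (in ns) you would read from the post-PAR timing results, and a hypothetical 1.5 ns margin for setup, clock-to-out and skew. At 125MHz the period is 1000 / 125 = 8 ns, so each register-to-register stage gets 6.5 ns here. This ignores clock skew variation across the device.

```python
def place_pipeline_registers(segment_delays_ns, clock_mhz=125.0, margin_ns=1.5):
    """Greedy sketch: walk the segments in order and start a new register stage
    before any segment that would push the accumulated delay past the budget.
    Returns the indices of the segments that begin a new stage."""
    budget = 1000.0 / clock_mhz - margin_ns  # clock period in ns minus margin
    cuts, acc = [], 0.0
    for i, d in enumerate(segment_delays_ns):
        if d > budget:
            raise ValueError(f"segment {i} ({d} ns) alone exceeds the {budget:.2f} ns budget")
        if acc + d > budget:
            cuts.append(i)  # register goes in front of segment i
            acc = 0.0
        acc += d
    return cuts


# Hypothetical per-segment delays along the 125MHz data stream.
print(place_pipeline_registers([2.0, 3.0, 2.5, 1.0, 4.0]))  # [2, 4]
```

For a single linear path where every segment fits in the budget on its own, this greedy cut gives the fewest register stages.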

Altera_Forum · 09-18-2007

Say,

Does anybody know how to do what's asked in my third question?

Altera_Forum · 09-28-2007

Does anybody know the answer to my original third question?