Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Dedicated Logic Registers

Altera_Forum
Honored Contributor II

Hi,

Can anyone give an accurate description of what "Dedicated Logic Registers" (DLRs) are, in the context of Stratix II and Stratix III?

I know the concept is this:

ALMs contain two registers and two look-up tables (ALUTs) (yes, and two adders as well).

In principle, the ALUT and register can be used independently (Stratix III handbook Figure 2-6 shows inputs dataE and dataF going to the registers).

But in my designs, I typically end up with about the same number of ALUTs as registers (about 160,000 of each), and 95%+ of the registers are marked as "Dedicated Logic Registers", i.e. ones where the ALM's ALUT is _not_ going to be used.

This means I end up with about 50% utilization of the device :(

This is a big deal for me, as I am already using EP3SL340s, so I can't fork out for a bigger device.

I've been through all the design advisors, changed muxes, optimized for area, etc., none of which makes much difference.

So, any ideas on what makes the compiler so keen to mark registers as "unshareable" (i.e. DLRs)?
Altera_Forum
Honored Contributor II

Hi,

You already answered yourself correctly. The term "dedicated logic register" only pops up in the compiler report. I have never seen it in the data sheet, and never as "DLR".

Your assumption (that when dedicated logic registers are reported, their ALUTs are not used) is not right. The compiler simply reports them separately, then reports total registers, which is slightly higher.

Your problem should be put as "How do I fit the design in the device?" or "How do I reduce resource usage?". I think you need to post a lot more information here.

Kaz
Altera_Forum
Honored Contributor II

Dedicated Logic Registers refer to the two registers in the ALM (they're dedicated to being registers and nothing else). Note that in Stratix III the ALM can actually be configured as a register. Synthesis does this as a last resort, but I'm working on a DSP design full of registers and it's proving useful.

In the .fit.rpt, study the Fitter Resource Usage Summary. Heck, post it here and I can probably point some things out. The great difficulty with an adaptive LUT and with adaptive fitting (packing unrelated registers with logic, for example) is that synthesis and the fitter can make decisions that are right in context but don't answer the question "how full am I?". For example, if you have a lot of lone registers (no LUT before them in the RTL), they will often use half of an ALM and not get packed with logic, so the design looks almost completely full. As you add more logic, the fitter will pack these in more tightly (which may hurt performance somewhat), and suddenly a device that looked 100% full can handle more logic. I don't have a lot of experience with the 340, but the largest device may have more difficulty with other things, like routability.

One thing to look for in the report is the Register Only count under Combinational ALUT/register pairs used in Final Placement. If a LUT directly feeds a register, the fitter will almost always pack them together, so these usually represent the cases where it does not. If you have a lot of these, it's often worth investigating. Go to the hierarchy browser in the top-left window, right-click the top level, and choose Customize Columns so you can see the Register Only ALUT/register pairs. Then dive into the hierarchy looking for a culprit. If they're spread throughout, then it's part of your design, but if they're bulked in one hierarchy, most likely a RAM or shift register is not getting put into memory blocks, and you want to isolate that hierarchy, figure out why, and see if you can change the code to fix it.
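The "spread throughout vs. bulked in one hierarchy" check can be sketched in a few lines. This assumes you have copied the per-entity Register Only ALUT/register pair counts out of the hierarchy columns by hand; the entity names and counts below are hypothetical placeholders, and the 25% threshold is an arbitrary choice of mine, not a Quartus rule.

```python
from typing import Dict, List, Tuple


def find_register_only_hotspots(
    register_only: Dict[str, int], threshold: float = 0.25
) -> List[Tuple[str, float]]:
    """Return entities holding more than `threshold` of all register-only pairs.

    `register_only` maps a hierarchy path to its "Register only"
    ALUT/register pair count. An entity that dominates the total is a
    candidate for a RAM or shift register that was not put into memory blocks.
    """
    total = sum(register_only.values())
    if total == 0:
        return []
    hotspots = [
        (name, count / total)
        for name, count in register_only.items()
        if count / total > threshold
    ]
    # Largest share first, so the most likely culprit is listed at the top.
    return sorted(hotspots, key=lambda item: item[1], reverse=True)


# Hypothetical per-entity counts, for illustration only.
counts = {
    "top|dsp_core": 4_000,
    "top|frame_buffer": 60_000,
    "top|ctrl": 6_000,
}
for name, share in find_register_only_hotspots(counts):
    print(f"{name}: {share:.0%} of register-only pairs")
```

If nothing crosses the threshold, the register-only pairs are spread across the design, which per the advice above points at the design style rather than one mis-inferred memory.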

One final thing: "register only" does not always mean it's a single register that isn't doing anything. Since registers have control signals, they may be doing valid work. A common example would be something like a 64-bit 1:8 demux that feeds 512 registers. Rather than using a LUT in front of each of the 512 registers, synthesis will create an enable for each of the 8 possibilities and feed those to the clock enables of the registers. So the design would be 8 LUTs used as clock enables, and 512 registers with no LUT driving their D inputs (the inputs directly drive their D inputs, and the clock enable controls when the data is loaded). The end result may look like 512 lone registers, but they're really doing logic.
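A behavioral model may make the demux example more concrete. The sketch below is Python rather than HDL, and the function names are hypothetical. It models each clock edge of the 64-bit 1:8 demux two ways: with a "data if selected, else hold" LUT in front of every register, and with 8 decoded clock enables and the data bus wired straight to every D input. Both produce identical register contents, but the second needs 8 LUTs instead of 512. That is why those 512 registers can show up as "Register only" pairs while still doing logic.

```python
import random
from typing import List, Tuple

WIDTH = 64    # data bus width in the example
OUTPUTS = 8   # 1:8 demux, so 8 x 64 = 512 registers

Banks = List[List[int]]  # OUTPUTS banks of WIDTH single-bit registers


def dff(q: int, d: int, ena: bool) -> int:
    """Register with clock enable: takes d on the clock edge only when ena is true."""
    return d if ena else q


def step_lut_per_register(regs: Banks, sel: int, data: List[int]) -> Tuple[Banks, int]:
    """Each register has its own LUT computing 'data if selected else hold'.

    Returns the next register state and the number of LUTs this structure uses.
    """
    luts = 0
    new: Banks = []
    for i in range(OUTPUTS):
        bank = []
        for b in range(WIDTH):
            d = data[b] if sel == i else regs[i][b]  # one LUT per register
            luts += 1
            bank.append(dff(regs[i][b], d, True))    # clock enable tied high
        new.append(bank)
    return new, luts


def step_clock_enables(regs: Banks, sel: int, data: List[int]) -> Tuple[Banks, int]:
    """8 decode LUTs drive the clock enables; data feeds every D input directly."""
    enables = [sel == i for i in range(OUTPUTS)]  # one small decode LUT each
    new = [[dff(regs[i][b], data[b], enables[i]) for b in range(WIDTH)]
           for i in range(OUTPUTS)]
    return new, len(enables)


# Drive identical random stimulus through both structures and compare.
random.seed(0)
a = [[0] * WIDTH for _ in range(OUTPUTS)]
c = [[0] * WIDTH for _ in range(OUTPUTS)]
for _ in range(100):
    sel = random.randrange(OUTPUTS)
    data = [random.randint(0, 1) for _ in range(WIDTH)]
    a, luts_a = step_lut_per_register(a, sel, data)
    c, luts_c = step_clock_enables(c, sel, data)
    assert a == c, "register contents diverged"
print(f"LUTs: {luts_a} vs {luts_c}; registers: {OUTPUTS * WIDTH}")
```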
Altera_Forum
Honored Contributor II

OK, here's a sample fitter report:

Fitter Status : Successful - Tue Nov 4 17:23:12 2008
Quartus II Version : 8.0 Build 215 05/29/2008 SJ Full Version
Revision Name : hs3402_bld
Top-level Entity Name : fpga_HS
Family : Stratix III
Device : EP3SL340H1152C3
Timing Models : Preliminary
Logic utilization : 94 %
Combinational ALUTs : 182,570 / 270,400 ( 68 % )
Memory ALUTs : 348 / 135,200 ( < 1 % )
Dedicated logic registers : 158,746 / 270,400 ( 59 % )
Total registers : 159129
Total pins : 546 / 744 ( 73 % )
Total virtual pins : 0
Total block memory bits : 13,228,261 / 16,662,528 ( 79 % )
DSP block 18-bit elements : 0 / 576 ( 0 % )
Total PLLs : 5 / 8 ( 63 % )
Total DLLs : 0 / 4 ( 0 % )

Logic utilization ; 261,382 / 270,400 ( 97 % )
-- ALUT/register pairs used ; 253078
-- Combinational with no register ; 94335
-- Register only ; 70042
-- Combinational with a register ; 88701
-- ALUT/register pairs unavailable ; 8304

Total registers* ; 159,126 / 274,688 ( 58 % )
-- Dedicated logic registers ; 158,743 / 270,400 ( 59 % )
-- I/O registers ; 383 / 4,288 ( 9 % )
-- LUT_REGs ; 0

ALMs: partially or completely used ; 134,871 / 135,200 ( 100 % )
-- Logic ; 134,687 / 134,871 ( 100 % )
-- Memory ; 184 / 134,871 ( < 1 % )

Total LABs: partially or completely used ; 13,520 / 13,520 ( 100 % )
-- Logic LABs ; 13,480 / 13,520 ( 100 % )
-- Memory LABs ; 40 / 13,520 ( < 1 % )
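The pair counts in the Fitter Resource Usage Summary above are internally consistent, which helps when reading them. The arithmetic below uses only numbers copied from that report; grouping the rows into "used", "unavailable", and register totals is my reading of the indented layout, not an official Quartus formula.

```python
# Numbers copied from the Fitter Resource Usage Summary above.
comb_no_reg = 94_335        # Combinational with no register
reg_only = 70_042           # Register only
comb_with_reg = 88_701      # Combinational with a register
pairs_unavailable = 8_304   # ALUT/register pairs unavailable
pairs_in_device = 270_400   # 2 ALUT/register pairs per ALM x 135,200 ALMs
io_regs = 383               # I/O registers
lut_regs = 0                # LUT_REGs

pairs_used = comb_no_reg + reg_only + comb_with_reg
logic_utilization = pairs_used + pairs_unavailable
dedicated_regs = reg_only + comb_with_reg   # every DLR sits in one of these two kinds of pair
total_regs = dedicated_regs + io_regs + lut_regs

print(pairs_used)          # 253078, the "ALUT/register pairs used" row
print(logic_utilization, f"{logic_utilization / pairs_in_device:.0%}")  # 261382 97%
print(dedicated_regs)      # 158743, the "Dedicated logic registers" row
print(total_regs)          # 159126, the "Total registers" row
```

So every dedicated logic register is either a "Register only" pair or shares a pair with an ALUT; the 70,042 register-only pairs are simply the ones the fitter did not pair with logic in this placement.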

 

Logic utilization of 94% is based on EAB usage. From what I can conclude, this is (ALUTs + DLRs) / 20, and it is limited to 13,500 for a 340 device.

What this fit report shows is that a substantial number of registers in the device (70k) cannot share with an ALUT. Any ideas on what causes these?

There are other questions here. The summary says 94% logic utilization, but 100% of ALMs are in use. The other thing I'm interested in is the roughly 8,000 "unavailable" ALUT/register pairs. What could be causing this?

Thanks to the two folks who answered so quickly.
Altera_Forum
Honored Contributor II

The report is not saying the 70K registers can't be packed with the logic; it's saying that the fitter chose not to, for a better fit. Those 70K registers are most likely not driven directly by a LUT, so there is no reason for the fitter to place them together when other open locations give it more flexibility and probably ease routing. If you put more logic in, I expect a lot of those registers will start to get packed with the LUTs. (Look in the messages of the .fit.rpt for "register pack" or something like that, and you should see a message about the algorithm being run and whether it packed a lot of unrelated logic.)

So you do have room, but I also wouldn't assume all the registers can be packed into all the "combinational with no register" locations. The first thing I would check is from my previous post: where all of these lone registers exist in your design and whether they're really necessary. If they could be replaced by a few memories, that would be a huge reduction.
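To put a rough number on "you do have room", here is an idealized lower bound under a simplifying assumption of mine: every register-only pair could be merged into a combinational-with-no-register pair. As the reply above cautions, that will not hold for all registers (control signals and routing constrain packing), and it also ignores the 8,304 "unavailable" pairs, so treat it as a best case rather than a prediction. The counts are copied from the fit report in this thread; `pairs_needed` is a hypothetical helper.

```python
import math

# Counts copied from the fit report in this thread.
comb_no_reg = 94_335
reg_only = 70_042
comb_with_reg = 88_701
pairs_in_device = 270_400


def pairs_needed(comb_no_reg: int, reg_only: int, comb_with_reg: int, merge: bool) -> int:
    """ALUT/register pairs needed, optionally merging lone registers into lone LUTs.

    With merge=True, each register-only pair is assumed to pair up with a
    combinational-only pair, so only the larger of the two counts remains.
    This is an idealized assumption, not guaranteed fitter behavior.
    """
    if merge:
        return comb_with_reg + max(comb_no_reg, reg_only)
    return comb_with_reg + comb_no_reg + reg_only


for merge in (False, True):
    n = pairs_needed(comb_no_reg, reg_only, comb_with_reg, merge)
    print(f"merge={merge}: {n} pairs, {n / pairs_in_device:.0%} of device, "
          f"{math.ceil(n / 2)} ALMs")
```

The gap between the two lines is the theoretical headroom; how much of it the fitter can actually reclaim depends on the design, which is why finding where the lone registers come from is the better first step.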

Another thing to try is fitting again with Assignments -> Settings -> Fitter -> More Settings -> Auto Packed Registers set to Minimize Area (or Minimize Area with Chains, although I don't usually see a difference). This gives a better glimpse of the fitter trying to pack things tightly, but with open space, I think the fitter will still unpack things, since that improves performance and routability. Another option would be to throw down a LogicLock region and put the whole design in it, so the fitter is forced into a tighter area. I'm not a huge fan of that, since it's not really what happens when you add more logic.

As for unavailable ALUTs, that's due to the adaptive ALUT. For example, if a 5-input LUT is in an ALM and that's it, the other half is considered available, although only a 3-input LUT (or a 4-input LUT sharing a signal with the 5-input one) can go into it. That's fine, since most designs have tons of 3-input LUTs. If a 7-input LUT is in the ALM, the other half is considered "unavailable", since very few things could be put into it. The tricky case is a 6-input LUT in an ALM with nothing else. Can logic still go into the other half? Yes, but only a limited subset, so the report counts a percentage of these cases as available and the majority as unavailable.
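The rule of thumb above can be written down as a small classifier. It assumes an 8-input ALM (as in Stratix II/III) and reduces the decision to the input count of the LUT already in one half. The real tool also considers which inputs can be shared, and the percentage applied to the 6-input case is not given in this thread, so that case is simply labeled "partial" here. Both function names are hypothetical.

```python
ALM_INPUTS = 8  # assumption: Stratix II/III ALMs have 8 inputs shared by both ALUTs


def other_half_status(k: int) -> str:
    """Classify the free half of an ALM whose other half holds a k-input LUT."""
    if not 1 <= k <= 7:
        raise ValueError("expected a LUT with 1 to 7 inputs")
    if k == 7:
        return "unavailable"  # very few functions could still be placed alongside it
    if k == 6:
        return "partial"      # only a limited subset fits; reported as a percentage
    return "available"        # k <= 5 leaves at least 3 independent inputs


def free_independent_inputs(k: int) -> int:
    """ALM inputs left over for the other half that are not used by the k-input LUT."""
    return max(ALM_INPUTS - k, 0)


for k in (4, 5, 6, 7):
    print(k, other_half_status(k), free_independent_inputs(k))
```

For the 5-input case this gives 3 free inputs, matching the "only a 3-input LUT (or 4 sharing a signal)" remark above.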

As you can see, counting resources is a very tricky topic, but as you dive into it you'll realize it has to be tricky, because it's not as black-and-white as architectures with just a single 4-input or 6-input LUT. (But there's a lot of power in being adaptive...)
Altera_Forum
Honored Contributor II

Good theory, but not what I observe.

If I add _any_ more logic, I get a no-fit: out of EABs in the fitter.
Altera_Forum
Honored Contributor II

You didn't include the memory section of the report. How many M144Ks and M9Ks? There's little flexibility with those, so if you go over, that's that. You can try retargeting M9Ks to Memory LABs, or bunching many into an M144K and time-sharing them or something like that, but you really need to look at the RAM Summary (part of the fit report) and see what's being targeted to what. You're using 79% of the memory bits already, which is pretty high considering most designs underutilize something (usually the M144Ks are only partially used). It's definitely possible to use 100%, but you need to craft your design to make it work.
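Bit utilization can understate how full the memory blocks are, because a RAM that lands in a block consumes the whole block. The sketch below assumes block sizes of 9,216 bits (M9K) and 147,456 bits (M144K) and counts of 1,040 and 48 for the EP3SL340; those figures are my assumption, chosen because they reproduce the 16,662,528-bit total in the report exactly. It ignores width/depth mode limits and only estimates by raw capacity, and the 20,000-bit RAM is a hypothetical example.

```python
import math

M9K_BITS = 9_216       # assumed bits per M9K block (including parity bits)
M144K_BITS = 147_456   # assumed bits per M144K block (including parity bits)
M9K_COUNT = 1_040      # assumed EP3SL340 block counts
M144K_COUNT = 48


def blocks_for(ram_bits: int, block_bits: int) -> int:
    """Minimum number of blocks by raw capacity (ignores width/depth modes)."""
    return math.ceil(ram_bits / block_bits)


def packing_efficiency(ram_bits: int, block_bits: int) -> float:
    """Fraction of the consumed block bits that actually hold data."""
    return ram_bits / (blocks_for(ram_bits, block_bits) * block_bits)


total_bits = M9K_COUNT * M9K_BITS + M144K_COUNT * M144K_BITS
print(total_bits)  # 16662528, matching "Total block memory bits" in the report

ram = 20_000  # hypothetical RAM size in bits
print("M144K:", blocks_for(ram, M144K_BITS), f"{packing_efficiency(ram, M144K_BITS):.0%}")
print("M9K:  ", blocks_for(ram, M9K_BITS), f"{packing_efficiency(ram, M9K_BITS):.0%}")
```

A small RAM in an M144K wastes most of the block, which is one way a design can run out of blocks while the bit count still reads well under 100%.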

Altera_Forum
Honored Contributor II

 

--- Quote Start ---

Good theory, but not what I observe.

If I add _any_ more logic, I get a no-fit: out of EABs in the fitter.

--- Quote End ---

Hi,

I think you are really at the edge of your FPGA's capacity. Maybe it is better to look for opportunities to reduce the design size. What synthesis settings do you use? Design partitions? Is there no arithmetic function in your design, since no DSP blocks are used?