Programmable Devices
CPLDs, FPGAs, SoC FPGAs, Configuration, and Transceivers

Can a LUT in Stratix support shift-register mode?

Altera_Forum
Honored Contributor II

I recently read an article saying that a Xilinx LUT can run in shift-register (SRL) mode but an Altera LUT cannot. Is that true?

"Shift Register LUT

A LUT in shift-register mode (SRL) can implement a selectable 16-bit shift register in a single LUT. The same shift register in a Stratix device would be implemented using 16 flip-flops and as many as 10 LUTs, or a memory block, which is much less flexible.

 

In a Stratix PLD, if the shift register cannot be implemented in a memory block, a 16-bit shift register implemented using 16 LEs creates added routing congestion that may impact design performance. If the shift register requires variable tap selection, this will add logic levels on the output path, resulting in much slower operation.  

"
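To make the quoted claim concrete, here is a minimal Python behavioral sketch of what "a selectable 16-bit shift register in a single LUT" means: 16 stages that shift on each enabled clock edge, plus a 4-bit address that picks which stage drives the output. The class name `Srl16Model` and its methods are hypothetical; it models behavior only and says nothing about how any vendor's LUT is actually built.

```python
class Srl16Model:
    """Hypothetical behavioral model of a 16-deep shift-register LUT with a dynamic tap."""

    DEPTH = 16

    def __init__(self):
        self.stages = [0] * self.DEPTH  # stages[0] holds the most recent input bit

    def clock(self, din, enable=True):
        # On a clock edge with the enable asserted, every stage shifts by one.
        if enable:
            self.stages = [din & 1] + self.stages[:-1]

    def q(self, addr):
        # The 4-bit address selects which stage drives the output; address 0 is
        # the first stage, so the delay is addr + 1 clock edges.
        if not 0 <= addr < self.DEPTH:
            raise ValueError("addr must be 0..15")
        return self.stages[addr]


if __name__ == "__main__":
    srl = Srl16Model()
    srl.clock(1)        # push a single 1 ...
    for _ in range(3):
        srl.clock(0)    # ... followed by three 0s
    print(srl.q(3))     # the 1 now sits in stage 3 (a 4-cycle delay)
```

Changing the address changes the delay without moving any data, which is the "selectable" part that a plain chain of flip-flops needs extra multiplexing to reproduce.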
Altera_Forum
Honored Contributor II

Absolutely. I think I posted about this in another thread. It's a strange phenomenon: the SRL looks very cool, and in certain designs it would give X (Xilinx) parts a big area advantage. But time and time again, I see SRLs used in groups, which Quartus then automatically puts into memories, so we end up even, and in some cases smaller. I'm sure there are some designs where this ends up being a true advantage; I've just never encountered or heard of them, and for the most part I consider it a non-issue (besides perception...).

Altera_Forum
Honored Contributor II

The SRL-LUT shows its real advantage in pipelined DSP designs. 

 

If the design needs to tap into the shift register along the way, then it begins to break down.

 

If the user is very clever, they can make the SRL-LUT do very size-efficient things that most synthesis tools would never think of.

 

I have done some interesting things with them in the past. 

Avatar.
Altera_Forum
Honored Contributor II

Even with DSP designs, which use shift registers all over the place, this tends not to be a problem. The big thing is that shift registers are usually used on buses, not on lone bits. Quartus can also combine shift registers that aren't explicitly grouped together in your code. For example, an 8-stage shift register on a 16-bit bus takes 16 SRL-LUTs. In Altera, it's a read pointer and a write pointer (each 3 LCs) plus a small memory. If the shift register goes longer than 16 stages, the number of SRL-LUTs doubles, while the pointers only add 2 logic cells each.
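The arithmetic in the paragraph above can be restated as a rough count. This is a minimal sketch, assuming one SRL-LUT per bit per 16 stages and one logic cell per pointer bit for each of the two pointers, exactly as the post counts them; it ignores the memory block itself, any control logic, and how Quartus actually packs things. The function names are hypothetical.

```python
import math

SRL_DEPTH = 16  # stages per SRL-LUT, as in the quoted article


def srl_luts(width, depth):
    """SRL-LUTs for a static shift register: one chain per bit, 16 stages per LUT."""
    return width * math.ceil(depth / SRL_DEPTH)


def pointer_lcs(depth):
    """Logic cells for a read pointer plus a write pointer, one LC per counter bit.

    Rough assumption following the post's counting; the memory block is not included.
    """
    bits = max(1, math.ceil(math.log2(depth)))
    return 2 * bits


if __name__ == "__main__":
    for depth in (8, 16, 32):
        print(f"16-bit bus, depth {depth:2}: "
              f"{srl_luts(16, depth)} SRL-LUTs vs {pointer_lcs(depth)} pointer LCs + memory")
```

Going from 8 to 32 stages doubles the SRL-LUT count (16 to 32) while each pointer grows from 3 to 5 bits, which is the "only add 2 logic cells each" in the post.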

If you're hand-coding with the dynamic SRL LUT (which I seldom see; users generally just infer SRL LUTs as static shift registers), then you can get a big gain, since that doesn't map easily into memory (though it can be done elegantly, especially if the length is a power of 2).

It's a tough topic to settle without an existing design. I personally think users often over-value this feature, but in some designs it's definitely helpful. So the ideal approach is to evaluate it on a real design rather than talk in generalities.
Altera_Forum
Honored Contributor II

Thanks Avatar and Rysc. 

In my design, the shift register should be 1 bit wide and 1024 stages long. I need 1023 taps, with a tap spacing of 1. The clock is 250 MHz.

 

I have tried altshift_taps, but the minimum distance between taps is 3.
Altera_Forum
Honored Contributor II

What part are you targeting? Yes, this is where it's generally easier to use the SRL LUT, and where I can eat some crow. That being said, I think you can still build something smaller like so: 

1) Create a 1K x 1 RAM. (Steps 2 to 4 assume the memory output registers are off; step 5 covers turning them on for performance.)

2) Create a free-running 10-bit counter that powers up to 1 (basically, have your asynchronous clear reset it to this value). This is your write pointer, and your shift register always writes to the address it holds. The writes will then occur like so:

SR bit : Memory location
0 : 1
1 : 2
2 : 3
3 : 4
4 : 5
etc.

3) Whichever tap you want to pull off becomes your read address. So if you tap address 4, it takes one cycle to access the memory, and by then what was in that location (data bit 3) will have been shifted once, i.e. it is now the 4th bit.

4) You may need a bypass register that is written every cycle. It is only read when your tap value is 1.

5) Finally, this scheme is without the memory output registered, which will be slightly slower since memory accesses are slower. I don't know your device, speed grade, or what logic this SR is feeding, so I can't say whether the output registers are necessary for 250 MHz performance. If they are, edit the memory, turn on the output registers, and change your write pointer to begin writing at memory location 2. You will then need two bypass registers, for when you tap 1 and 2.

You'll probably want to throw down a quick simulation, as this is off the top of my head and I may be off on something. Hopefully the whole thing doesn't take more than an hour to code up and simulate. The net result is that it should take ~12 logic cells and a memory(assuming your tap select is already encoded).
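Before writing any HDL, a behavioral model can be checked against a plain shift register. This Python sketch assumes (not from the thread) a simple dual-port RAM with a synchronous write and a one-cycle read latency, i.e. the memory output registers off, which is the case step 5 starts from. It computes the read address relative to the free-running write pointer (`wp - (tap - 1)` modulo 1024), a standard circular-buffer formulation that is not necessarily the exact addressing described in the steps; the pointer here also starts at 0 rather than 1. Tap 1 would read the address being written on the same edge, so it comes from the bypass register, as in step 4. The class and function names are hypothetical, and the registered-output variant from step 5 (one more cycle of read latency and a second bypass register) is not modeled.

```python
import random

DEPTH = 1024  # 1K x 1 RAM addressed by a 10-bit pointer


class RamTappedShiftRegister:
    """Behavioral sketch of a 1-bit tapped shift register built from a RAM.

    Assumptions: synchronous write, read address registered on the clock edge
    (one-cycle read latency, no output register), taps numbered 1..depth.
    """

    def __init__(self, depth=DEPTH):
        self.depth = depth
        self.mem = [0] * depth  # RAM contents, all zero at power-up
        self.wp = 0             # free-running write pointer (10-bit counter)
        self.bypass = 0         # bypass register, loaded with the input every cycle

    def step(self, din, tap):
        """Apply one clock edge; return the tap output for the following cycle."""
        if not 1 <= tap <= self.depth:
            raise ValueError("tap must be in 1..depth")
        # Address of the bit that was written (tap - 1) edges before this one.
        rd_addr = (self.wp - (tap - 1)) % self.depth
        self.mem[self.wp] = din  # synchronous write at the write pointer
        self.bypass = din        # the bypass register is always written
        self.wp = (self.wp + 1) % self.depth
        if tap == 1:
            # rd_addr is the address written on this same edge; use the bypass
            # register instead of relying on read-during-write behavior.
            return self.bypass
        return self.mem[rd_addr]


def reference(bits, taps, depth=DEPTH):
    """Golden model: a plain shift register where tap k is the output of stage k."""
    stages = [0] * depth
    out = []
    for din, tap in zip(bits, taps):
        stages = [din] + stages[:-1]
        out.append(stages[tap - 1])
    return out


def simulate(bits, taps, depth=DEPTH):
    dut = RamTappedShiftRegister(depth)
    return [dut.step(b, t) for b, t in zip(bits, taps)]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 5000
    bits = [rng.randint(0, 1) for _ in range(n)]
    taps = [rng.randint(1, DEPTH) for _ in range(n)]  # the tap may change every cycle
    assert simulate(bits, taps) == reference(bits, taps)
    print("RAM model matches the reference shift register")
```

Because the tap only changes the read address, it can be switched every cycle; the cost is a 10-bit subtractor on the address path, which is one reason to check timing at 250 MHz on the real part.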
Altera_Forum
Honored Contributor II

By the way, your implementation in X with SRL LUTs would need 64 LUTs for the dynamic shift registers, and I think you'd also need a 64:1 mux to choose which of those individual SRL LUTs you're using (which adds logic but, perhaps more importantly, adds a long delay that might stop you from running at 250 MHz). This implementation, assuming my thinking is correct, should be 1 memory block and <15 logic cells, and should run at 250 MHz easily.
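The "64 LUTs plus a 64:1 mux" estimate comes from simple division: 1024 / 16 = 64 SRL-LUTs chained end to end, with a 10-bit tap index split into a 4-bit SRL address and a 6-bit mux select. This Python sketch models only that decomposition, assuming 16-deep SRLs as in the quoted article; the names are hypothetical, and it says nothing about timing, which is the real concern at 250 MHz.

```python
SRL_DEPTH = 16
TOTAL_DEPTH = 1024
NUM_SRLS = TOTAL_DEPTH // SRL_DEPTH  # 64 SRL-LUTs in a chain


def split_tap(tap):
    """Map tap k (1..1024, the output of stage k) to (srl_index, srl_addr).

    srl_index drives the 64:1 mux (upper 6 bits); srl_addr drives the SRL's
    4-bit address (lower 4 bits).
    """
    if not 1 <= tap <= TOTAL_DEPTH:
        raise ValueError("tap must be in 1..1024")
    stage = tap - 1  # zero-based stage number, 10 bits
    return stage // SRL_DEPTH, stage % SRL_DEPTH


class SrlChain:
    """64 chained 16-deep SRLs; each SRL's last stage feeds the next SRL's input."""

    def __init__(self):
        self.srls = [[0] * SRL_DEPTH for _ in range(NUM_SRLS)]

    def clock(self, din):
        carry = din
        for srl in self.srls:
            last = srl[-1]               # bit leaving this SRL on this edge
            srl[:] = [carry] + srl[:-1]  # shift in the bit from the previous SRL
            carry = last

    def tap(self, tap):
        srl_index, srl_addr = split_tap(tap)
        return self.srls[srl_index][srl_addr]  # the 64:1 mux over SRL outputs


if __name__ == "__main__":
    chain = SrlChain()
    chain.clock(1)
    for _ in range(99):
        chain.clock(0)
    print(split_tap(100), chain.tap(100))  # (6, 3) 1
```

Every tap read goes through the SRL's own 16:1 selection and then the 64:1 mux, which is the long combinational path the post warns about.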

Altera_Forum
Honored Contributor II

Hi Rysc, 

My target device is the Stratix III EP3SL50F1152C4.

Your comments are a great help to me. I'll try it. Thank you.