RAM Based Shift Register (ALTSHIFT_TAPS) : Distance between taps equal to 1?

Altera_Forum
Honored Contributor II
1,854 Views

Is it possible to modify the ALTSHIFT_TAPS code to create a RAM based shift register where the distance between the taps is 1?

10 Replies
Altera_Forum

No, I believe the minimum distance is 3. The read and write pointer logic is spaced so that the same location is never read and written back to back. In theory it might be doable, but it would be extremely inefficient. For example, say you had an 8-bit-wide shift register in a RAM with a 20-bit output. As soon as you had 3 taps you would need another RAM, since that would require 24 outputs. So with a tap at every point, each RAM could hold only a 2-deep shift register, and you might as well build it out of registers. (altshift_taps gets its efficiency as a shift register by not needing to read intermediate tap points very often.)

Plus it would be slow: you couldn't use the output register of the RAM (it would add another delay cycle), so you'd have the full memory access time to deal with.
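
The width arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the function names and parameters are made up for illustration, and the 20-bit output width is the figure assumed in the post, not a property of any particular device.

```python
def taps_per_ram(data_width: int, ram_output_width: int) -> int:
    """Number of data_width-wide taps one RAM read can deliver per clock."""
    if data_width <= 0 or ram_output_width <= 0:
        raise ValueError("widths must be positive")
    return ram_output_width // data_width


def rams_needed(num_taps: int, data_width: int, ram_output_width: int) -> int:
    """RAM blocks needed when every tap must be visible on every clock."""
    per_ram = taps_per_ram(data_width, ram_output_width)
    if per_ram == 0:
        raise ValueError("data word is wider than one RAM output")
    return -(-num_taps // per_ram)  # ceiling division


print(taps_per_ram(8, 20))    # 2 taps fit in a 20-bit output
print(rams_needed(3, 8, 20))  # 2 RAMs once a third tap is added
```

With a tap at every position, each RAM contributes only as many shift-register stages as it has taps, which is why the post concludes that plain registers would do just as well.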
Altera_Forum

 

I did one design with the tap distance set to 3 but with the RAM clocked three times faster than the data rate. It worked.
Altera_Forum

 

That's an interesting idea, Kaz! My application is clocked at a pretty high frequency, though, so I'm not sure it will work for me in particular.

Essentially, I am trying to use the memory resources on my board instead of logic resources.

Rysc, thank you for the example. I will have to look at the details of the RAM and see what its limitations are.

Is there some way I can code a shift register in VHDL so that Quartus will synthesize it using on-chip memory? I am a bit of a beginner when it comes to using memory for digital design, so I apologize if some questions are obvious.
Altera_Forum

I'm not sure how overclocking helps. Shift registers are generally used to hold data until X cycles later, so if you clock at three times the speed, your shift register grows to 3x the length, and you still have the limitation of how many output ports a RAM has.

As for inferring altshift_taps, Quartus synthesis is really good at this. Just about any shift register (without lots of taps) gets inferred. A very common request I get is how to disable it, because it goes overboard and uses too much memory. (Assignments -> Settings -> Compilation -> More Analysis & Synthesis -> Auto Shift Register Replacement = Off. This can also be applied to specific hierarchies in the Assignment Editor.)
Altera_Forum

 

And it can also merge shift registers in bizarre ways. I've seen TimeQuest report timing violations between two registers in blocks of code that should have been completely unrelated. The only way to fix it was to turn auto shift register recognition off for that particular shift register!
Altera_Forum

Rysc, now that I think about it a little more, clocking at a higher speed won't solve my issue unless I perhaps put in garbage data every 2 clock cycles. I'll spend some time considering it. Are you familiar with a technique that gives me simultaneous access to every element of a 2D array held in memory elements? I can do this if I synthesize an array of registers, but my arrays can get pretty big, so that uses a lot of logic resources.

The question I just asked is the crux of my problem. Should I ask it in a different section of the forum, since it may belong somewhere else?
Altera_Forum

If you clock at 3x the speed but put in garbage data every 2 cycles, I think you're right back where you started, just with more complication. I think you're trying to get around the altshift_taps limitation of requiring 3 spaces between taps, but an altshift_taps is nothing more than a memory with a free-running write pointer and read pointer that are offset by the length of the shift register. That's physically what you're dealing with, and your limitation is the number of outputs of the RAM.

As for Tricky's comment, Quartus used to merge shift registers across hierarchies, so if you had two shift registers that were 8 bits wide and 36 bits deep on the same clock, we might merge them into a single RAM. This pulls two pieces of logic that the fitter might otherwise spread far apart into the same RAM location. We no longer merge shift registers across hierarchies (there is a setting to turn this back on, but it is off by default).

mzivkovic, no, there is no way to access every element in a memory, because once again you're limited by the size of the output bus. If your RAM is 256x8, are you saying you need 256 x 8 = 2048 (2K) outputs? Is each output a random location in the RAM (i.e. you would have 256 addresses), or does each output read from a fixed location?

Overclocking does work for getting more outputs, i.e. you could read a RAM at 2x, 3x, or whatever clock rate you can manage. If your data isn't as wide as the RAM, then true dual port will let you do two simultaneous reads per clock cycle. These all help, but may not come close to what you want, although I don't fully understand what you're doing.
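
The description of altshift_taps as a memory with free-running write and read pointers can be sketched as a small behavioural model. This is a simplified illustration, not the megafunction's actual implementation: the class name, the choice of a memory one word deeper than the delay, and the zero initial contents are all assumptions, and RAM read latency is not modelled.

```python
class RamDelayLine:
    """Behavioural model of a RAM used as a fixed-length delay line."""

    def __init__(self, length: int):
        if length < 1:
            raise ValueError("length must be at least 1")
        self.length = length
        self.depth = length + 1          # assumed: one spare word of memory
        self.mem = [0] * self.depth
        self.wr_ptr = 0                  # free-running write pointer

    def clock(self, data_in: int) -> int:
        """Advance one clock: write the new word, read the delayed word."""
        self.mem[self.wr_ptr] = data_in
        # The read pointer trails the write pointer by `length` locations.
        rd_ptr = (self.wr_ptr - self.length) % self.depth
        data_out = self.mem[rd_ptr]
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        return data_out


delay = RamDelayLine(length=3)
print([delay.clock(x) for x in range(1, 8)])  # [0, 0, 0, 1, 2, 3, 4]
```

Each call produces exactly one output word, which mirrors the point made in the post: however deep the memory is, the number of taps visible per clock is bounded by the RAM's read ports and output width.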
Altera_Forum

I had a design with data running at 122.88 Msps. I needed to move a shift register into RAM to save logic. I increased the RAM depth by a factor of 3 and clocked it at 368.64 MHz. The input was updated every 3 clocks and written to the RAM at 368.64 MHz. The read was also at 368.64 MHz, but the data was only updated at the 122.88 Msps data rate. It has been working in several thousand customer field units since 2011.

So the RAM write/read rate is fast, but the logic only takes an update on 1 clock in 3. It is not complicated at all, provided you can achieve timing.
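
One way to read this scheme is sketched below as a behavioural model. It is an interpretation of the post rather than the actual design: the function name and the use of a Python deque in place of a RAM are assumptions. Each input sample is held for three fast clocks, the line shifts once per fast clock, taps sit three entries apart, and the data-rate logic looks at the taps once per sample.

```python
from collections import deque


def run_overclocked(samples, num_taps, ratio=3):
    """Return the tap values seen by data-rate logic after each input sample."""
    line = deque([0] * (num_taps * ratio), maxlen=num_taps * ratio)
    seen = []
    for s in samples:
        for _ in range(ratio):       # input is held for `ratio` fast clocks
            line.appendleft(s)       # one shift per fast clock
        # Taps are `ratio` entries apart; the slow logic samples them
        # on one fast clock in every `ratio`.
        seen.append([line[i * ratio + ratio - 1] for i in range(num_taps)])
    return seen


for taps in run_overclocked([1, 2, 3, 4, 5], num_taps=3):
    print(taps)
# [1, 0, 0]
# [2, 1, 0]
# [3, 2, 1]
# [4, 3, 2]
# [5, 4, 3]
```

In this model the taps hold consecutive samples, so a physical tap distance of 3 behaves like a tap distance of 1 at the data rate, at the cost of a line three times as long.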
Altera_Forum

Kaz,

Sorry, I'm not saying it won't work. It actually seems more applicable to what the original poster wants, since running at 3x the rate gets 3x the data out per base cycle. When I think of a shift register, I think of the user wanting to delay the data by N cycles. Running at three times the rate means the shift register has to be 3 times longer (and you have to feed data in 3 times faster), which is different from the common shift register usage, but I'm sure it has applications.
Altera_Forum

Sorry for not being clearer; I'll try to explain better. I am trying to implement a "sliding window" for an image processing application. Right now I have a register instantiated for each element in my window. For small windows of 9x9 or less this isn't an issue. When I want to create larger windows, the number of registers needed takes up a lot of logic resources. Each output would have a fixed location.

I was confused because your example is for a single memory resource, and I think I understand what you are saying now. However, when you use multiple memory blocks, as in this example:

-8 bit shiftin
-32 taps
-tap distance of 3

the resource usage is 4 LUTs + 7 M10K + 4 registers. This would represent one row in my array if my row was, for example, 32 elements wide.

I still need to give more thought to how multiple memory blocks are linked together to create this shift register, but it appears that using a higher clock frequency, as Kaz suggested, should work. I just need to make sure I have enough memory blocks to complete my array.
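
As a rough budget for "enough memory blocks", the figures quoted above can be scaled up in a few lines. This is a back-of-the-envelope sketch only: the function names are made up, the 32x32 window is a hypothetical size, and it assumes that every row costs the same 7 M10K blocks reported for the single 32-tap, 8-bit instance, which may not hold once the fitter packs several rows together.

```python
def window_register_bits(rows: int, cols: int, pixel_bits: int = 8) -> int:
    """Register bits needed when every window element is its own register."""
    return rows * cols * pixel_bits


def m10k_for_window(rows: int, m10k_per_row: int = 7) -> int:
    """M10K blocks if each row uses one 32-tap instance like the one above."""
    return rows * m10k_per_row


print(window_register_bits(9, 9))    # 648 register bits for a 9x9 window
print(window_register_bits(32, 32))  # 8192 register bits for a 32x32 window
print(m10k_for_window(32))           # 224 M10K blocks for 32 such rows
```

Comparing the last figure against the M10K count of the target device gives a quick feasibility check before committing to the RAM-based approach.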