Programmable Devices
CPLDs, FPGAs, SoC FPGAs, Configuration, and Transceivers

Making smart FIFO ?!

Altera_Forum
Honored Contributor II

Hello all, 

 

First, I wish you a happy new year!

 

I need some help. For my project, I need to transfer data from external memory (DDR2) into internal memory (M-RAM or M4K blocks).

 

I have a 256-bit-wide interface to my DDR2 chips. For every frame, I generate a 256-bit mask word that is ANDed with my DDR2 data bits. (It always masks the start or the end of the data bits, never the middle, but the width of the mask changes dynamically.)

 

I need to store this new data word in a RAM, but only the data bits that are not masked!
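A minimal Python model of the masking step described above, assuming bit 0 of each 256-bit DDR2 word is the first bit in image order and that the valid region is a single contiguous run `lsb..msb` (the names `make_mask` and `valid_bits` are hypothetical). It shows that once the mask is applied, what remains to be stored is a field of variable width, which is exactly what a fixed-width FIFO cannot accept directly.

```python
WIDTH = 256  # DDR2 interface width from the post

def make_mask(lsb: int, msb: int, width: int = WIDTH) -> int:
    """Contiguous mask with ones in bit positions lsb..msb inclusive.

    Masking "the start" means lsb > 0; masking "the end" means msb < width - 1.
    """
    if not 0 <= lsb <= msb < width:
        raise ValueError("need 0 <= lsb <= msb < width")
    return ((1 << (msb - lsb + 1)) - 1) << lsb

def valid_bits(word: int, lsb: int, msb: int) -> tuple[int, int]:
    """AND the word with the mask, then return (valid field moved to bit 0, its width)."""
    masked = word & make_mask(lsb, msb)
    return masked >> lsb, msb - lsb + 1
```

For example, `valid_bits(0xABCD, 4, 11)` keeps bits 4 to 11 and returns `(0xBC, 8)`.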

 

But a standard FIFO requires every write to have the same bus width, whereas the data I have to write has a different width each time.

 

My first idea is to put my data in a shift register (a parallel-to-serial converter) and then write it bit by bit into my internal RAM. But in this case, I need too many clock cycles to write all my data (up to 256 clocks). That's too long.

 

Ideally, I need a FIFO into which I can write data of different widths (on every rising clock edge).

I don't think that's possible, is it?

 

Could you please help me solve my problem?

Do you have any ideas, or just some hints that could help me?

Any help would be appreciated. Thank you very much!

 

See you. 

 

Fabrice.
Altera_Forum
Honored Contributor II

You can't dynamically change the width of a memory, FIFO or otherwise. 

 

Do you just need to know which bits are valid when you read the data out of the internal memory? If so, then store a record of the LSB and MSB bit positions of the valid data at the same time you store the data. You can store the LSB and MSB tags in the same memory with the data if you make the memory 2 bytes wider. 
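A sketch of the tag idea, assuming a hypothetical 272-bit RAM word laid out as `[msb:8][lsb:8][data:256]` (8 bits are enough to encode any bit position 0 to 255, so the two tags add exactly 2 bytes). The read side uses the stored tags to recover only the valid bits.

```python
DATA_W = 256
TAG_W = 8  # 8 bits encode any bit position 0..255

def pack(data: int, lsb: int, msb: int) -> int:
    """Hypothetical 272-bit layout: [msb:8][lsb:8][data:256]."""
    assert 0 <= data < (1 << DATA_W)
    assert 0 <= lsb <= msb < DATA_W
    return (msb << (DATA_W + TAG_W)) | (lsb << DATA_W) | data

def unpack(word: int) -> tuple[int, int, int]:
    """Split a stored 272-bit word back into (data, lsb, msb)."""
    data = word & ((1 << DATA_W) - 1)
    lsb = (word >> DATA_W) & ((1 << TAG_W) - 1)
    msb = (word >> (DATA_W + TAG_W)) & ((1 << TAG_W) - 1)
    return data, lsb, msb

def valid_field(word: int) -> int:
    """Read side: use the stored tags to pull out only the valid bits, aligned to bit 0."""
    data, lsb, msb = unpack(word)
    return (data >> lsb) & ((1 << (msb - lsb + 1)) - 1)
```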

 

Depending on what you do when you read the FIFO, it might be easier to store the mask with the data. If the FIFO is shallow, making the memory 256 bits wider to hold the mask might not use too many RAM resources. Storing the mask is easier than calculating the LSB and MSB bit positions to write as tags. On the other hand, LSB and MSB tags might be easier to work with on the read side, unless you can simply keep a 256-bit data path there and apply the mask after reading the data and its mask from the FIFO.
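The "calculating the LSB and MSB bit positions" step can be modeled as below, assuming the mask is one contiguous run of ones (as stated in the original question). In hardware this amounts to a pair of priority encoders (lowest set bit and highest set bit); the function name `mask_to_tags` is hypothetical.

```python
def mask_to_tags(mask: int) -> tuple[int, int]:
    """Convert a contiguous, nonzero mask into (lsb, msb) bit positions."""
    if mask <= 0:
        raise ValueError("mask must be nonzero")
    lsb = (mask & -mask).bit_length() - 1   # position of the lowest set bit
    msb = mask.bit_length() - 1             # position of the highest set bit
    # A single contiguous run of ones is fully described by lsb and msb.
    if mask != ((1 << (msb - lsb + 1)) - 1) << lsb:
        raise ValueError("mask is not a single contiguous run")
    return lsb, msb
```

For example, `mask_to_tags(0xF0)` gives `(4, 7)`, and an all-ones 256-bit mask gives `(0, 255)`.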
Altera_Forum
Honored Contributor II

Interesting. Do you have another FIFO that stores how wide each word is? (Specifically, if you write words of widths 32, 6, 78, and 133, how do you know to read out 32 bits the first time, 6 on the next read, then 78, and then 133?)

Anyway, you're limited by the memory as much as the FIFO. You can't dynamically change your write widths on the memory, so there's no FIFO that can take advantage of it.  
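A small model of the "second FIFO that stores the widths" idea, with a hypothetical class name. The width FIFO tells the reader how many low bits of each entry are valid, but, as noted above, every entry still occupies a full 256-bit RAM word, so the widths 32, 6, 78, and 133 from the example use only 249 of 1024 stored bits.

```python
from collections import deque

class WidthTaggedFifo:
    """Hypothetical sketch: a data FIFO plus a parallel FIFO of widths, written in lockstep."""

    def __init__(self, slot_bits: int = 256) -> None:
        self.slot_bits = slot_bits
        self.data = deque()     # each entry occupies one full slot_bits-wide RAM word
        self.widths = deque()   # how many low bits of the matching data entry are valid

    def push(self, value: int, width: int) -> None:
        if not 0 < width <= self.slot_bits or value >> width:
            raise ValueError("value must fit in width bits")
        self.data.append(value)
        self.widths.append(width)

    def pop(self) -> tuple[int, int]:
        """Return (value, width); the width says how many bits to take from value."""
        return self.data.popleft(), self.widths.popleft()

    def utilization(self) -> float:
        """Fraction of stored RAM bits that hold valid data."""
        return sum(self.widths) / (len(self.widths) * self.slot_bits)
```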

The only thing I can think of is to have an "aggregator" block before the FIFO and the reverse after it: you fill up 256 bits with however many words fit and then do a write. On the read side, you read out 256 bits at a time and then, however you know the widths, use that to strip out what you need. Of course, this could theoretically add a lot of delay: if you're writing 256 words all of length 1, it will take 256 input words before that data gets into the FIFO and you can access the first bit. You could add logic that looks at your aggregator when the FIFO is empty, but it gets more complicated.
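A minimal cycle-free Python model of such an aggregator, assuming chunks arrive LSB-first (bit 0 of the accumulator is the oldest bit) and each chunk is at most 256 bits wide; the class and method names are hypothetical. In hardware, `chunk << self.count` is itself a variable shifter of up to 255 positions, and the accumulator must hold up to 511 bits (255 pending plus 256 incoming).

```python
WORD = 256

class Aggregator:
    """Packs variable-width chunks (LSB-first) into full 256-bit FIFO words."""

    def __init__(self, word_bits: int = WORD) -> None:
        self.word_bits = word_bits
        self.acc = 0      # pending bits; bit 0 is the oldest
        self.count = 0    # number of pending bits
        self.fifo = []    # stands in for the FIFO write port

    def write(self, chunk: int, width: int) -> None:
        if not 0 <= width <= self.word_bits or chunk >> width:
            raise ValueError("chunk must fit in width bits, width <= word size")
        # Append the new bits above the ones already held.
        self.acc |= chunk << self.count
        self.count += width
        # At most one full word can complete per write, because count was
        # below word_bits before the write and width <= word_bits.
        if self.count >= self.word_bits:
            self.fifo.append(self.acc & ((1 << self.word_bits) - 1))
            self.acc >>= self.word_bits
            self.count -= self.word_bits

    def flush(self) -> None:
        """Write out a final partial word (upper bits zero), e.g. at the end of an image."""
        if self.count:
            self.fifo.append(self.acc)
            self.acc = 0
            self.count = 0
```

With an 8-bit word for readability, writing `0b101` (3 bits) and then `0b11111` (5 bits) produces one FIFO word, `0xFD`.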
Altera_Forum
Honored Contributor II

Hi, 

 

Thanks for your answers, Brad and Rysc.

 

1/ First idea: make a FIFO that stores the data and its corresponding mask bits. Good idea, but it may take up a lot of space, and it will be harder for me to read out my data easily... (see below for more explanation of my issue)

 

2/ Second idea: make an "aggregator" that waits until it has a full 256 bits before each write to the FIFO. I think it's a very good idea, but I don't know how to do this...

 

 

Well, let me give more information about my problem.

The function I need to implement is like texture mapping in image processing.

 

In my DDR2 memory, I have a very large image (my texture image).

I need to extract a small image from this big image according to a position coordinate.

 

This coordinate can point to any bit in the big image and so can point anywhere in the DDR address range. In practice, it will almost never point to the first data bit of a DDR address...

 

Starting from the DDR address that contains the first pointed bit, I can read multiple 256-bit blocks. But the first bits of the first data block will be useless (because they come before the pointed bit), and the last bits of the final data block of each line of my small image will almost certainly be useless too.
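The address arithmetic behind this can be sketched as follows, assuming the big image is stored row-major with a hypothetical row pitch of `pitch_bits` bits, bit 0 of each 256-bit DDR word is the lowest-addressed image bit, and addresses are counted in 256-bit words (a real DDR2 controller may use different address units). For each line it returns where to start reading, the bit offset of the first wanted bit, how many words to read, and how many useless bits sit at the top of the last word.

```python
DDR_WORD = 256

def line_fetch(x: int, y: int, pitch_bits: int, line_bits: int) -> tuple[int, int, int, int]:
    """Locate one line of the sub-image in DDR2 (all names are hypothetical).

    Returns (first word address, bit offset in that word,
             number of words to read, useless bits at the top of the last word).
    """
    start = y * pitch_bits + x                  # absolute bit address of the first wanted bit
    first_word, offset = divmod(start, DDR_WORD)
    end = start + line_bits                     # one past the last wanted bit
    last_word = (end - 1) // DDR_WORD
    n_words = last_word - first_word + 1
    tail_waste = n_words * DDR_WORD - offset - line_bits
    return first_word, offset, n_words, tail_waste
```

For example, a 1024-bit line starting at `x = 100` on row 1 of a 4096-bit-wide image starts at bit 100 of word 16, needs 5 reads, and wastes 156 bits at the top of the last word.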

 

For every data block, I know which bits belong to the image to extract and which do not. But I don't know how to make an aggregator that accumulates 256 valid bits before each write to the FIFO buffer.

 

Could you please give me some tips to achieve my goal?

How would you do it?

 

Thank you very much in advance for your help. 

 

See you. 

 

Fabrice.
Altera_Forum
Honored Contributor II

So if I have it right, only the first and last words are less than 256 bits? Everything else will be full width? How deep is the FIFO? If only 2 of the words aren't full, that's still pretty efficient in bit usage (unless the FIFO is extremely shallow). Or is this not correct? (FYI, after you reply I probably won't respond, as I'm leaving on vacation. Good luck.)

Altera_Forum
Honored Contributor II

Yes, that's exactly right.

My FIFO needs to store an image of 3000 * 2000 bits, so it would be about 23000 words deep.

 

Every line of the image to extract is at most 3000 bits long and at least 1024 bits long, so I need 4 to 12 consecutive reads from my DDR2 memory to get the whole line (plus some useless bits!).
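The number of 256-bit reads per line is a ceiling division of (start offset + line length) by 256. The 4 to 12 reads above correspond to a line that starts at bit 0 of a DDR word; assuming the start can fall on any bit, the worst case is one read more, as this small check shows (the function name is hypothetical).

```python
def reads_needed(offset: int, line_bits: int, word: int = 256) -> int:
    """256-bit DDR2 reads covering line_bits starting at bit `offset` of the first word."""
    return -(-(offset + line_bits) // word)   # ceiling division

for length in (1024, 3000):
    counts = [reads_needed(off, length) for off in range(256)]
    print(length, min(counts), max(counts))   # prints "1024 4 5", then "3000 12 13"
```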

 

"If only 2 of the words aren't full, that's still pretty efficient on bit usage(unless it's extremely shallow)." 

 

I don't quite understand what you mean. Before your vacation (lucky man ;-) ), could you please explain your solution in more detail?

 

Thank you very much, Rysc! And have a nice holiday!

 

See you. 

 

Fabrice.
Altera_Forum
Honored Contributor II

An internal FIFO that stores 256x23000? Are you targeting a device with enough memory? 

All I'm saying is that if your data is chunked into 256-bit segments, with unused lower bits in the first word (e.g., bits 0-143 might not hold data) and some unused (don't-care) bits in the last word, then you're still using the memory bits very efficiently, as every word in between uses all 256 bits.

I don't think that's what you're asking, though. Are you asking whether, when the first bit of data comes out, you want it aligned to bit 0? I guess I'm not following the problem.
Altera_Forum
Honored Contributor II

My FPGA is a Stratix II GX 90 with more than 5 million bits of memory, but I think I will make a circular buffer, because I don't need all the image data at the same time. I can take just one piece, do the image processing on it, then move on to the next piece, and so on.

 

Actually, just as you said, I want to align the first useful data bit I receive to bit 0 in my FIFO and then put all the other data after it, except the last data bits at the end of every line.
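The whole requirement (drop the bits before the pointed bit, drop the tail bits of every line, and pack what remains back-to-back starting at bit 0) can be modeled by combining the masking and aggregation steps. This is a sketch under assumptions: LSB-first bit order, and a hypothetical input format of `(ddr_words, offset, line_bits)` per line. In hardware it needs two variable shifters per DDR word: a right shift by the start offset and a left shift by the number of bits already pending.

```python
WORD = 256
FULL = (1 << WORD) - 1

def pack_lines(lines):
    """Pack the wanted bits of several image lines back-to-back into 256-bit words.

    `lines` is a list of (ddr_words, offset, line_bits) tuples (hypothetical format):
    ddr_words are the 256-bit words read for one line, offset is the position of the
    first wanted bit in ddr_words[0], and line_bits is the length of the line.
    Bit 0 of the first output word is the first wanted bit of the first line.
    """
    out, acc, count = [], 0, 0
    for ddr_words, offset, line_bits in lines:
        remaining = line_bits
        for i, w in enumerate(ddr_words):
            lo = offset if i == 0 else 0              # skip bits before the pointed bit
            width = min(WORD - lo, remaining)         # drop bits past the end of the line
            if width <= 0:
                break
            chunk = (w >> lo) & ((1 << width) - 1)    # first shifter: align chunk to bit 0
            acc |= chunk << count                     # second shifter: append above pending bits
            count += width
            remaining -= width
            while count >= WORD:                      # emit every completed 256-bit word
                out.append(acc & FULL)
                acc >>= WORD
                count -= WORD
    if count:
        out.append(acc)                               # final partial word, upper bits zero
    return out
```

For example, a 4-bit line taken from bits 4 to 7 of `0xF0`, followed by a 2-bit line `0b11` starting at bit 0, packs into the single word `0x3F`.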

 

Thank you for everything. Have a nice holiday!

If someone else can help me, it would be appreciated.

 

Fabrice.
Altera_Forum
Honored Contributor II

So you want a variable shifter that can shift by up to 255 bits? That isn't trivial in terms of resources. At the most basic level, a variable index would work, but you'll end up with 256 individual 256:1 muxes. It's big and slow.
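A bit-level Python model of the single-level "variable index" shifter, assuming a zero-filling right shift (the direction that moves the first useful bit down to bit 0); the function names are hypothetical. Output bit `i` is one mux selecting input bit `i + s`; counting the distinct inputs shows bit 0 needs a full 256:1 mux, tapering to 1 input at the top (a rotate would need 256:1 on every bit).

```python
WIDTH = 256

def shift_right_mux(x: int, s: int, width: int = WIDTH) -> int:
    """Model of a one-level variable right shifter: one wide mux per output bit."""
    assert 0 <= s < width and 0 <= x < (1 << width)
    out = 0
    for i in range(width):
        src = i + s                                   # the mux for bit i selects input bit i + s
        bit = (x >> src) & 1 if src < width else 0    # zero fill beyond the top
        out |= bit << i
    return out

def data_inputs(i: int, width: int = WIDTH) -> int:
    """Distinct input bits that output bit i can select as s ranges over 0..width-1."""
    return len({i + s for s in range(width) if i + s < width})
```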

For speed, you can pipeline it, i.e., build it from stages of 4:1 muxes: shift by 128 and/or 64 bits and then register, then shift by 32 and/or 16 bits and then register, and so on. This is faster but costs extra registers. Another idea is to run it at a faster clock rate (or a slower data rate); this way you only need 128:1 muxes and run it twice.
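A cycle-level Python model of the pipelined version, assuming a right shift and a radix-4 split of the 8-bit shift amount: the four registered stages shift by 0/64/128/192, then 0/16/32/48, then 0/4/8/12, then 0/1/2/3, so each stage is one 4:1 mux per bit. The class name is hypothetical; it accepts one new `(data, shift)` pair per clock and delivers each result 4 clocks later.

```python
STAGE_STEPS = (64, 16, 4, 1)   # radix-4 digits of the 8-bit shift amount, MSB first

def stage(x: int, s: int, step: int) -> int:
    """One stage: a 4:1 mux per bit choosing a shift of 0, 1, 2 or 3 times step."""
    digit = (s // step) % 4
    return x >> (digit * step)

class PipelinedShifter:
    """Four register stages; one (data, shift) pair in per clock, 4-clock latency."""

    def __init__(self) -> None:
        self.regs = [None] * len(STAGE_STEPS)   # each holds (partial_result, shift) or None

    def clock(self, data=None, shift=0):
        """Advance one clock edge and return the value now in the last register (or None)."""
        # Update from the back so each register takes its predecessor's old value.
        for k in range(len(STAGE_STEPS) - 1, 0, -1):
            prev = self.regs[k - 1]
            self.regs[k] = None if prev is None else (stage(prev[0], prev[1], STAGE_STEPS[k]), prev[1])
        self.regs[0] = None if data is None else (stage(data, shift, STAGE_STEPS[0]), shift)
        last = self.regs[-1]
        return None if last is None else last[0]
```

Feeding a new word every clock keeps all four stages busy, which is where the speed gain over the single-level mux comes from.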

 

Not sure if anyone else has a good idea. You might want to re-post (as many people won't read this far) saying that you are looking for the best way to do a variable shift (up to 255 bits) on a 256/512-bit word.
Altera_Forum
Honored Contributor II

That sounds like a barrel shifter. It might not be the most efficient implementation, but Altera has the lpm_clshift megafunction for that. I don't see a pipelining option in the MegaWizard or the online help, but maybe it could be broken up into a set of small lpm_clshift instances to implement pipelining as Rysc has in mind.
