
Implementing a Large RAM on Cyclone II - is it possible?

Altera_Forum
Honored Contributor II

Hi, this is my first time implementing something on an FPGA.

My instructor is currently very busy and I can't find an answer to my problem, so I hoped to find someone here nice enough to help.

 

The architecture I need to implement contains a 64 x 1006 (it can be larger, of course) bit-addressable array.

 

When I designed the architecture I didn't know the capabilities of the FPGA, but now that I want to implement it I see that the block RAMs are 4 Kbits each...

 

As to the number of ports: 

Dual write can greatly enhance the performance, but I can manage just fine with a single write per cycle. 

Read is always single read per cycle. 

 

The device I'm using is a Cyclone II EP2C35F672C6, which seems to contain M4K blocks.

 

Is there any way I can implement this entirely on the FPGA? External dual-port RAM seems too exotic to me, and I'm not sure I'm allowed to use even a regular, single-port one.

 

Thanks in advance to those who are willing to answer or to direct me to an answer. 

 

Dan
4 Replies
Altera_Forum
Honored Contributor II

How many of the bits would be accessed at once in a single cycle? More generally, what is the access pattern? Can you reorganize the data in such a way that bits are grouped into tall "columns", and, in any clock cycle, there are at most 2 accesses (2 writes or 1 read and 1 write) to each column?

Altera_Forum
Honored Contributor II

On that part you have 105 M4K blocks, each of which can be configured anywhere from 4096x1 through 128x32. Since you only need a depth of 64 and the smallest native depth is 128, you will need to build a memory that is twice as deep as required: 32 blocks of 128x32 side by side give a 128x1024 array, assuming you want to access all 1006 bits in parallel.

 

With 105 M4K blocks on that EP2C35 device you have more than enough M4K resources (105 >> 32).
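
To make the sizing argument concrete, here is a rough sketch of the block-count arithmetic. The 4096x1 and 128x32 endpoints and the count of 105 blocks come from the post above; the intermediate configurations are the non-parity M4K modes as I recall them from the Cyclone II handbook, and the function names are just illustrative.

```python
from math import ceil

# M4K data configurations without parity bits, as (depth, width).
# The endpoints are quoted in the post; the ones in between are assumed
# from the device handbook.
M4K_CONFIGS = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]
M4K_AVAILABLE = 105  # EP2C35, per the post above


def blocks_needed(depth, width, config):
    """Number of M4K blocks to tile a depth x width memory with one config."""
    cfg_depth, cfg_width = config
    return ceil(depth / cfg_depth) * ceil(width / cfg_width)


def best_config(depth, width):
    """Configuration that needs the fewest blocks for this memory shape."""
    return min(M4K_CONFIGS, key=lambda cfg: blocks_needed(depth, width, cfg))


if __name__ == "__main__":
    # 64 rows deep, all 1006 columns read or written in parallel.
    for cfg in M4K_CONFIGS:
        n = blocks_needed(64, 1006, cfg)
        fits = "fits" if n <= M4K_AVAILABLE else "does not fit"
        print(f"{cfg[0]}x{cfg[1]}: {n} blocks ({fits})")
```

For a 64-deep, 1006-wide memory only the two widest modes stay under 105 blocks, and 128x32 is the cheapest at 32 blocks, which matches the figure above. This only covers the "all bits in parallel" organization; a narrower access pattern can need far fewer blocks.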
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

How many of the bits would be accessed at once in a single cycle? More generally, what is the access pattern? Can you reorganize the data in such a way that bits are grouped into tall "columns", and, in any clock cycle, there are at most 2 accesses (2 writes or 1 read and 1 write) to each column?

--- Quote End ---  

 

 

Yes. The pattern is 2 writes per cycle to the same column, in a particular pattern such that the column gets filled, and then moving on to fill the next one.

After the table is full, a single bit is read from the last (1006th) column, then another single bit from the 1005th column (the row is determined by the previous bit extracted), and so on.

So the reads have to be done one by one, as the next "row address" to read from depends on the value returned by the previous read.
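
A small software model of the read phase may make the dependency clearer. This is only a sketch: the rule that turns the previous bit into the next row is not given here, so it is passed in as a function, and `shift_in` is a purely hypothetical example of such a rule. The starting row is also an assumption.

```python
def traceback(table, start_row, next_row):
    """Read one bit per column, from the last column back to the first.

    table[col][row] holds a single bit. next_row(row, bit) returns the row
    to read in the following (lower-numbered) column.
    """
    row = start_row
    bits = []
    for col in range(len(table) - 1, -1, -1):
        bit = table[col][row]     # one single-bit read per step
        bits.append(bit)
        row = next_row(row, bit)  # next address depends on the bit just read
    return bits


def shift_in(row, bit, rows=64):
    """Hypothetical rule: shift the bit into the row index, modulo the depth."""
    return ((row << 1) | bit) % rows
```

Because each address depends on the data returned by the previous read, the reads cannot be issued in parallel; one read port is enough for this phase.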

 

 

--- Quote Start ---  

On that part you have 105 M4K blocks, each of which can be configured anywhere from 4096x1 through 128x32. Since you only need a depth of 64 and the smallest native depth is 128, you will need to build a memory that is twice as deep as required: 32 blocks of 128x32 side by side give a 128x1024 array, assuming you want to access all 1006 bits in parallel.

 

With 105 M4K blocks on that EP2C35 device you have more than enough M4K resources (105 >> 32). 

--- Quote End ---  

 

 

So what you are basically saying is to use multiple M4Ks, with muxes and other selection logic to write to and read from the appropriate block.

 

A follow-up question to you:  

If I use HDL to describe such a 64 x 1024 array of bits, will the synthesizer "know" to build the selection circuitry needed?

Or is the right way to do it to specify the 32 M4Ks and the proper selection circuitry myself?

 

 

Thanks a lot to both of you!
Altera_Forum
Honored Contributor II

So, 2 writes at most at any single time? 

 

Sounds like you should be able to format it as 16 sub-arrays of width 1 and depth 4096, by combining up to 64 columns in each sub-array, and fit each in one M4K.  
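
As a rough illustration of that layout, here is a small Python model of the address mapping. It assumes each sub-array is one M4K in 4096x1 mode holding 64 consecutive columns of 64 rows; the names (`locate`, `BitTable`, `write_pair`) are made up for the sketch, and the real thing would of course be HDL.

```python
ROWS = 64
COLS = 1006
COLS_PER_BLOCK = 64
BLOCK_DEPTH = ROWS * COLS_PER_BLOCK      # 4096 one-bit words per block
NUM_BLOCKS = -(-COLS // COLS_PER_BLOCK)  # ceiling division: 16 blocks


def locate(row, col):
    """Map (row, col) to (block index, address inside that block)."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise IndexError("row or column out of range")
    block = col // COLS_PER_BLOCK
    # In hardware this is just bit concatenation: {col[5:0], row[5:0]}.
    address = (col % COLS_PER_BLOCK) * ROWS + row
    return block, address


class BitTable:
    """Software model of 16 blocks, each 4096 deep and 1 bit wide."""

    def __init__(self):
        self.blocks = [[0] * BLOCK_DEPTH for _ in range(NUM_BLOCKS)]

    def write_pair(self, col, row_a, bit_a, row_b, bit_b):
        """Two writes to the same column in one cycle.

        Both land in the same block, so they would use its two ports.
        """
        for row, bit in ((row_a, bit_a), (row_b, bit_b)):
            block, address = locate(row, col)
            self.blocks[block][address] = bit & 1

    def read(self, row, col):
        """Single-bit read, as used once the table has been filled."""
        block, address = locate(row, col)
        return self.blocks[block][address]
```

Since both writes in a cycle target one column, they always fall in the same block, and the reads only start after the table is full, so no block ever sees more than two accesses per cycle.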

 

The synthesizer is generally pretty good at inferring memory blocks, so, as long as your access pattern is recognized as something it should be able to implement in RAM, it'll usually do it for you. If it fails, you'll know it because it'll put everything into logic cells instead, and you don't have enough logic cells for that, so it'll end up being unable to fit the design. 

 

If you can't coax it into using M4Ks for you, instantiate a bunch of altsyncram blocks instead using the IP editor. There is no need to hand-code all the details of the selection circuitry.