Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Weird results assigning memory blocks

Altera_Forum
Honored Contributor II

Hey Guys, 

I have this weird situation (weird to me, at least) and wanted to know if anyone here has any insights into it.

I was trying to synthesize a design on an EP4SGX530 Stratix IV FPGA. I needed 2364 dual-port memory blocks, all accessed in parallel, and each of these memories is very small, with both address and data widths of 4. I looked at the specs for the EP4SGX530 and noticed that it only has 1280 M9Ks, so I figured that, since I could only have 1280 memories reading and writing simultaneously, this might not be possible.
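
A quick capacity check, assuming each M9K holds 9,216 bits and each memory is 16 words × 4 bits (the 4-bit address and data widths stated above), suggests that raw storage is not the limiting factor; the number of independent read/write ports is:

```latex
\begin{aligned}
\text{bits per memory} &= 2^{4} \times 4 = 64 \\
\text{total bits} &= 2364 \times 64 = 151{,}296 \\
\text{M9Ks needed by capacity alone} &= \left\lceil 151{,}296 / 9{,}216 \right\rceil = \lceil 16.4 \rceil = 17 \\
\text{M9Ks needed at one memory per block} &= 2364 > 1280
\end{aligned}
```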

But I thought, let's try it, so I created the memory modules using the altsyncram megafunction and synthesized the design.

To my surprise, it successfully finished synthesis, place and route! 

I checked the fitter report, and it says that only 264 of the available 1280 M9K blocks are used! This means that somehow it sliced each M9K block into about 9 independent dual-port memories, with independent clocks and read and write ports, and fitted 2364 memories into these 264 M9Ks, but I thought this was not possible!
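
The average packing implied by the reported numbers can be checked directly. If every used M9K held at most 8 memories, 264 blocks could hold only 2112 of them, so at least some blocks must hold 9 or more:

```latex
\begin{aligned}
\frac{2364}{264} &\approx 8.95 \\
8 \times 264 &= 2112 < 2364 \\
9 \times 264 &= 2376 \ge 2364
\end{aligned}
```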

I even tested the design, and it works fine. Am I missing something here?

Sina
Altera_Forum
Honored Contributor II

You cannot split the M9K block into multiple independent true dual-port memories as you suggest. You can split an M9K into two independent simple dual-port memories, but this does not explain what you observed. I suggest you check that you didn't get any messages during synthesis saying that memory blocks were synthesized away for some reason, although that wouldn't explain why your design apparently functions as you expect. Perhaps Quartus integrated synthesis (QSyn) was able to merge multiple memories that you thought were independent into fewer memory blocks than you expected.

Altera_Forum
Honored Contributor II

Thanks, Jimbo ;)

Please look at the attached fitter report.

In section 10 (HardCopy Device Resource Guide), you can see that 292/1280 M9K RAMs are used.

Now, if you look at section 26 (Fitter RAM Summary) and add up the numbers in the M9K column, you get a total of 2399! (I summed them in Excel.)
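
Expressed as ratios, these two sections of the report imply that each used M9K hosts roughly eight logical RAM instances on average, and that the instance count alone already exceeds the device's total number of M9Ks:

```latex
\begin{aligned}
\frac{2399}{292} &\approx 8.2 \\
2399 &> 1280
\end{aligned}
```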

Also, if you look at the locations there, you can see that many M9K locations are used by multiple instances. For example, location M9K_X122_Y86_N0 is used by MEM[0], MEM[1500], MEM[1540], MEM[1775], MEM[1800], MEM[1840], MEM[295], MEM[80], MEM[950] and MEM[965].

This does not make sense to me at all.

You said it might have merged memories; how can I check that?

Thanks, 

Sina
Altera_Forum
Honored Contributor II

Assuming that Quartus doesn't merge RAM blocks erroneously, the RTL must allow it. Without knowing the RTL, it's impossible to determine how the RAM blocks can be merged. You can simply pick one example of blocks placed in the same location and check which signals are connected to them in the RTL netlist.

Altera_Forum
Honored Contributor II

Are you inferring the memory, or using the MegaWizard? If the former, check the section on inferring dual-port memory in the synthesis chapter of the Quartus II Handbook; your RTL must be coded in the specific manner described there for this to work. If the latter, the MegaWizard will tell you how many M9K blocks are required.

Try taking as small a portion of your RTL code as possible and compiling just that, to get a better idea of how it is being synthesized.
Altera_Forum
Honored Contributor II

Thanks, guys. This is how I instantiate the RAMs:

generate
  for (i = 0; i < memsize; i = i + 1) begin : MEM
    memory #(
      .soft_bits(soft_bits),
      .z(normalized_address_width),
      .address_width(address_width)
    ) U (
      .clock(sys_clk),
      .data(data_in),
      .rdaddress(address_out),
      .rden(read),
      .wraddress(address_in),
      .wren(write),
      .q(data_out)
    );
  end
endgenerate

I then used the MegaWizard to generate the RAM module below. I made some changes to make it parametric but did not change anything else.

Note that the RAM modules I generate are very small, but they are dual-port, and I didn't see anything about merging small memory blocks in the datasheets.

`timescale 1 ps / 1 ps
module memory (
	clock,
	data,
	rdaddress,
	rden,
	wraddress,
	wren,
	q);

// Size parameters, overridden per instance where the module is instantiated
parameter soft_bits=5;      // data width in bits
parameter z=4;              // number of words
parameter address_width=2;  // address width in bits

input clock;
input [soft_bits-1:0] data;
input [address_width-1:0] rdaddress;
input rden;
input [address_width-1:0] wraddress;
input wren;
output [soft_bits-1:0] q;

`ifndef ALTERA_RESERVED_QIS
// synopsys translate_off
`endif
tri1 clock;
tri1 rden;
tri0 wren;
`ifndef ALTERA_RESERVED_QIS
// synopsys translate_on
`endif

wire [soft_bits-1:0] sub_wire0;
wire [soft_bits-1:0] q = sub_wire0[soft_bits-1:0];

// Simple dual-port RAM: port A is the write port, port B the read port, both on clock0
altsyncram altsyncram_component (
	.address_a (wraddress),
	.clock0 (clock),
	.data_a (data),
	.rden_b (rden),
	.wren_a (wren),
	.address_b (rdaddress),
	.q_b (sub_wire0),
	// Unused ports: inputs tied off, outputs left open
	.aclr0 (1'b0),
	.aclr1 (1'b0),
	.addressstall_a (1'b0),
	.addressstall_b (1'b0),
	.byteena_a (1'b1),
	.byteena_b (1'b1),
	.clock1 (1'b1),
	.clocken0 (1'b1),
	.clocken1 (1'b1),
	.clocken2 (1'b1),
	.clocken3 (1'b1),
	.data_b ({soft_bits{1'b1}}),
	.eccstatus (),
	.q_a (),
	.rden_a (1'b1),
	.wren_b (1'b0));
defparam
	altsyncram_component.address_aclr_b = "NONE",
	altsyncram_component.address_reg_b = "CLOCK0",
	altsyncram_component.clock_enable_input_a = "BYPASS",
	altsyncram_component.clock_enable_input_b = "BYPASS",
	altsyncram_component.clock_enable_output_b = "BYPASS",
	altsyncram_component.intended_device_family = "Stratix IV",
	altsyncram_component.lpm_type = "altsyncram",
	altsyncram_component.numwords_a = z,
	altsyncram_component.numwords_b = z,
	altsyncram_component.operation_mode = "DUAL_PORT",
	altsyncram_component.outdata_aclr_b = "NONE",
	altsyncram_component.outdata_reg_b = "CLOCK0",
	altsyncram_component.power_up_uninitialized = "FALSE",
	altsyncram_component.rdcontrol_reg_b = "CLOCK0",
	altsyncram_component.read_during_write_mode_mixed_ports = "OLD_DATA",
	altsyncram_component.widthad_a = address_width,
	altsyncram_component.widthad_b = address_width,
	altsyncram_component.width_a = soft_bits,
	altsyncram_component.width_b = soft_bits,
	altsyncram_component.width_byteena_a = 1;

endmodule
Altera_Forum
Honored Contributor II

Are the input and output addresses and data of an array type? How are they connected in the upper module? Are the control signals of the generated RAM blocks identical or different? Without this information, you can't know whether the RAM blocks can be merged. As said, you should check the RTL netlist of the compiled design; it shows the real connections of the inferred RAM blocks.

P.S.: Please consider that the compilation results of a test design that doesn't fully connect all RAM instances to the outside would be meaningless.
Altera_Forum
Honored Contributor II

Here is a very basic test result. You can again see that the fitter merged 6 memory blocks into 1 M9K (M9K_X3_Y4_N0). I understand that, since the inputs and outputs are not connected in this sample, it might not be accurate, but to my surprise, the full design with all connections shows the same merges.

As you said, I checked the RTL netlist of my design, and everything makes perfect sense: all simple dual-port RAM blocks appear as individual RAM blocks with separate inputs and outputs, yet I am still observing the same merging effect.

The question is: I didn't see anything about merging anywhere in Altera's literature, so how is this even possible?
Altera_Forum
Honored Contributor II

It has been said. 

--- Quote Start ---

Please consider that the compilation results of a test design that doesn't fully connect all RAM instances to the outside would be meaningless.

--- Quote End ---

Generally, Quartus integrated synthesis will optimize any part of the design it's able to, and that is what it does in the present case.