Hello, I have a design that fits on the Arria 10 GX dev board with chip 10AX11552F4sI1SG, which has 2713 M20K blocks.
The total memory bits for this design are 36,171,984, and all 2713 blocks are used.
I am moving this design to a smaller Arria 10, 10AX066H2F34I2sG, which has 2131 M20K blocks.
I reduced the memory usage to 32,961,368 memory bits, BUT here is where I am getting really confused: the fitter fails and says it now needs 2845 M20K blocks, more than the design needed when it used 36,171,984 memory bits.
How can the same design, with reduced memory, need more M20K blocks than the original design? Any suggestions are greatly appreciated.
This project is being compiled in Quartus Pro 23.4.0.
---
Are you saying on the original device you used *exactly* the number of M20K blocks available, 2713? That is quite a feat.
As for why it may need more now, it really depends on how you are implementing the RAM. The fitter may not be able to optimize it exactly the same way it did on the original chip.
Showing some code might help.
---
Yes, I used exactly 2713 blocks. It's really not that difficult. It did not use all of the bits, though.
The RAM is implemented in many ways, from FIFOs to instantiated block RAM IP.
I am not seeing how the RAM would be implemented differently on the new chip. It is just a smaller version of the same Arria 10. It has fewer M20K blocks, not different memory blocks (M10K, for example); it's not a different family (Cyclone, Stratix, etc.).
I am failing to see how optimization could cause it to need MORE memory blocks when I significantly reduced the required memory bits.
Thanks
---
Without seeing the design and how you've organized all the memory, there's really no way to know what happened.
---
Can you attach both of your designs for our investigation?
---
Hi, no, I can't send the design; it is proprietary.
Can you point me to any Quartus reports or settings that might shed some light on this?
Another point of information: if I further reduce memory usage to 30,466,048 bits, a reduction of 2,495,320, it is suddenly able to fit and uses all 2131 M20K blocks. But somehow the design with 32,961,368 bits needs 2845 blocks? That's a difference of 714 M20K blocks, or 14,622,720 bits of capacity, but I only needed 2,495,320 more bits.
This is really not adding up for me.
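A quick sanity check of those numbers (a sketch, assuming one M20K block holds 20 Kbits = 1024*20 = 20,480 bits; the bit counts are the fitter-summary values quoted above):

```python
# Sanity-check the fitter numbers quoted above.
# Assumption: one M20K block = 20 Kbits = 1024 * 20 = 20,480 bits.
M20K_BITS = 1024 * 20

fitting_bits = 30_466_048   # reduced design that fits in 2131 blocks
failing_bits = 32_961_368   # design the fitter says needs 2845 blocks

extra_bits = failing_bits - fitting_bits          # additional bits requested
extra_blocks = 2845 - 2131                        # additional blocks claimed
extra_block_capacity = extra_blocks * M20K_BITS   # capacity of those blocks

print(extra_bits)            # 2495320 bits of extra data
print(extra_blocks)          # 714 extra blocks
print(extra_block_capacity)  # 14622720 bits of extra capacity
```

The 714 extra blocks represent roughly six times more capacity than the extra bits require, which is exactly the discrepancy in question.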
Thanks again.
---
Hmm, the bit counts mentioned don't line up with full block utilization. One M20K block holds 20 Kbits = 1024*20 bits.
To use all of 2713 M20K your design would need to use:
1024*20*2713 = 55562240 bits
To use all of 2131 M20K your design would need to use:
1024*20*2131 = 43642880 bits
Perhaps you got the array size or index calculation wrong?
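The capacity figures above can be reproduced directly (a trivial sketch, using the same 20 Kbits-per-block figure):

```python
# Total capacity of the M20K blocks on each device (20 Kbits = 20,480 bits each).
M20K_BITS = 1024 * 20

print(M20K_BITS * 2713)  # 55562240 bits: full utilization of 2713 M20Ks
print(M20K_BITS * 2131)  # 43642880 bits: full utilization of 2131 M20Ks
```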
---
TruHy, you have hit the nail on the head. It doesn't make any sense.
I did not calculate any of those numbers. They were pulled directly from the fitter summary.
This is the nature of my question. How can only 32 million bits need 2845 blocks?
---
Here are some key Quartus reports to check:
1. Fitter Resource Usage Report (.fit.rpt)
Location: quartus_fit_report.html or in the Quartus GUI under Processing → Compilation Report → Fitter → Resource Section → RAM Summary
What to Look For:
Compare how many M20K blocks are used in both the working and failing designs.
Look for any significant changes in RAM block utilization patterns.
Identify whether certain RAMs suddenly require multiple M20Ks instead of fewer.
2. RAM Classification Report (.ram_summary.rpt)
Location: Quartus GUI → Fitter → Resource Section → RAM Summary
What to Look For:
How Quartus is mapping logical RAMs to M20Ks.
Whether the Fitter is splitting memories into multiple blocks inefficiently.
3. RAM Packing Efficiency (.fit.rpt)
Location: In the Fitter → RAM Summary section of the report.
What to Look For:
If Quartus is failing to pack RAMs efficiently in the smaller device.
Look at the average block utilization to see if it has dropped.
4. Post-Mapping Resource Usage (.map.rpt)
Location: quartus_map_report.html or Quartus GUI under Processing → Compilation Report → Synthesis → RAM Usage
What to Look For:
How Quartus initially assigns memory to M20K blocks before placement constraints.
This can highlight unexpected fragmentation before the fitter even runs.
Possible Causes Based on Your Observation
Fragmentation Threshold Effect
Quartus could be fitting RAMs optimally up to a certain utilization threshold but then failing when the M20Ks get too full.
Instead of packing smaller memories efficiently, it breaks them up inefficiently across additional blocks.
Auto-Merging of Smaller RAMs
Quartus sometimes merges small RAMs into single M20Ks for efficiency.
If you crossed a memory limit where it can no longer merge, it may be splitting these into extra M20Ks instead.
FIFO Depth Adjustments
Since you have FIFOs and instantiated Block RAM, check if Quartus is suddenly adding padding or increasing depth beyond what’s needed.
Address Width / Depth Boundaries
If a change in memory size pushed a depth past a power-of-two boundary (e.g., an address width growing from 9 to 10 bits), the now-deeper memory can require additional M20Ks.
Quartus Settings to Try
If the reports show inefficient memory packing, try forcing Quartus to optimize RAM placement:
Disable Power Optimization (Which Can Impact Packing)
Quartus setting → Advanced Fitter Settings → Power Optimization During Fitting → Off
Sometimes Quartus optimizes power at the cost of more RAM blocks.
Force a Specific RAM Implementation (If Quartus Is Mapping Poorly)
set_instance_assignment -name AUTO_RAM_REPLACEMENT OFF -to <ram_instance_name>
set_instance_assignment -name INFER_RAMS_FROM_RAW_LOGIC ON -to <ram_instance_name>
Look for related options you can experiment with under the Advanced Fitter Settings and Advanced Synthesis Settings.
---
Kenny_Tan, thank you for all of this information. I'll look through it all and see what I can find.
---
Hi,
May I know if there is any further update or concern?
Thanks,
Regards,
Sheng
---
OK, so the problem seems to be that the new project targeting the smaller Arria device is not using any MLABs.
The original project on the larger Arria is using many MLABs.
The settings that I am aware of that cover MLAB usage are:
Auto RAM to MLAB Conversion - set to ON in both projects
Equivalent RAM and MLAB Paused Read Capabilities - set to CARE in both projects
Equivalent RAM and MLAB Power Up - set to AUTO in both projects
MLAB Add Timing Constraints For Mixed-Port Feed-Through Mode Setting Don't Care - set to OFF in both projects
They match, which is expected, since the new smaller-chip project is just an archive of the original (and functional) project retargeted to a different (smaller) chip.
What could cause the new project to not use any MLABs? Other settings I'm missing?
Thanks again.
---
Does anyone have any suggestions? This problem still does not make any sense to me and none of the suggested ideas/issues explain what I am seeing.
Thanks.
---
Sorry for the late reply; I was off for a week.
The fact that the smaller Arria 10 design isn't using any MLABs while the original one does is likely the root cause of the extra M20K block usage. MLABs (which use ALM-based memory) help offload some RAM demand from M20Ks, so their absence forces Quartus to place everything in M20Ks, explaining the increased block count.
Possible Reasons MLABs Are Not Being Used
Even though the MLAB-related settings match between projects, there are a few other factors that might be preventing Quartus from using MLABs in the smaller device:
1. MLAB Utilization Can Depend on Fitter Heuristics
Even if "Auto RAM to MLAB Conversion" is ON, Quartus still makes a decision based on available resources and timing feasibility.
Since the smaller device has fewer ALMs and M20Ks, Quartus may have chosen to disable MLAB usage to meet timing or placement constraints.
Check in the Fitter Report
Look at the MLAB Utilization section in .fit.rpt in both projects and see if Quartus states a reason for not using MLABs.
Quartus may be rejecting MLABs due to high logic utilization or routing congestion in the smaller chip.
2. Synthesis and Fitting Differences Due to Chip Resource Constraints
Since the smaller FPGA has fewer total ALMs, Quartus might prioritize ALMs for logic instead of MLABs for memory.
If the larger FPGA had excess ALMs, Quartus may have felt comfortable assigning MLABs, but in the smaller FPGA, it might reserve ALMs for combinational logic instead.
Check Logic Utilization
Open Fitter → Resource Usage Report and see if ALM usage is significantly higher in the smaller device compared to the original.
If ALM utilization is high, Quartus might be avoiding MLABs to prevent routing congestion.
3. Memory Depth and Width Constraints for MLAB Usage
MLABs are small: each holds 640 bits, configurable as 64 words x 10 bits or 32 words x 20 bits. If Quartus synthesized the RAM slightly differently (e.g., with increased depth/width due to address alignment constraints), that may have disqualified MLAB usage.
Check RAM Depth & Width
Look at the RAM Summary Report (.ram_summary.rpt) and check if:
Quartus changed the width or depth of certain RAM blocks.
Some RAMs that previously fit in MLABs are now too deep or too wide.
Manually force certain RAMs to use MLABs and see if Quartus accepts it
set_instance_assignment -name RAM_BLOCK_TYPE MLAB -to <ram_instance>
4. MLAB Timing Constraints Might Be More Strict in the Smaller FPGA
MLABs have higher access latency than M20Ks. If Quartus detected that using MLABs would violate timing constraints (especially due to tighter placement in the smaller FPGA), it may have disabled their use.
Check Timing Report
Run a Timing Analysis (.sta.rpt) and check for MLAB-related timing violations.
If Quartus is failing to meet setup/hold timing with MLABs, it may have automatically forced all RAMs into M20Ks instead.
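To make the geometry point concrete, here is a rough sketch of how a memory's depth and width translate into block counts. The helper names and the packing logic are illustrative assumptions, not Quartus's actual fitter algorithm; the only facts assumed are that an MLAB holds 640 bits (64x10 or 32x20 modes) and an M20K holds 20 Kbits.

```python
import math

# Illustrative block-count estimates. Assumptions (not Quartus's real packer):
#   - MLAB: 640 bits, configurable as 64 words x 10 bits or 32 words x 20 bits
#   - M20K: 20 Kbits = 20,480 bits, ignoring aspect-ratio limits

def mlabs_needed(depth, width):
    """Blocks needed if packed into MLABs, picking the better of the two modes."""
    costs = []
    for d, w in ((64, 10), (32, 20)):
        costs.append(math.ceil(depth / d) * math.ceil(width / w))
    return min(costs)

def m20ks_needed(depth, width, block_bits=1024 * 20):
    """Crude capacity-only estimate of M20K usage."""
    return math.ceil(depth * width / block_bits)

print(mlabs_needed(32, 20))    # 1: a shallow 32x20 FIFO fits in one MLAB
print(mlabs_needed(64, 40))    # 4: a wider/deeper memory needs several MLABs
print(m20ks_needed(4096, 36))  # 8: a deep RAM is M20K territory
```

The point is that a small change in depth or width can flip a memory from "fits in a few MLABs" to "must go in M20Ks", which is one way the two compiles could diverge.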
---
Are there any further queries?
---
Hi, I'm still digging through this as time permits. I will update later today or next week.
I am still skeptical that any of the above listed options will explain the current problem but I will check them all.
Thanks
---
Understood. Let me know your feedback by the end of the week.
---
Here are my findings so far:
1- I implemented a version of the design with reduced memory just to see when it would start fitting. I discovered that when I reduce memory usage enough to fit onto the smaller chip, Quartus uses MLABs. However, when I try to implement the full design, fitting fails and the report then shows 0 MLAB usage.
So it is tricky to say what the problem is for the failing design.
For the large Arria, it shows 2415 MLABs used.
For the small Arria with reduced memory, it shows 3653.
For the full design on the smaller Arria, it shows 0.
Interestingly, it states that MLABs can be up to 1/2 of total LABs, but in neither case does it use half.
And the small-chip design with reduced memory only uses 92% of logic and MLABs.
2- ALM usage of the full design on the smaller chip, which fails fitting, is 119,902 (48%).
ALM usage is 185,619 (74%) on the small chip with reduced memory.
ALM usage is 172,802 (40%) on the large chip.
3- I could not see any differences in port depth.
I have not yet tried forcing an instance into an MLAB. It is failing by such a large margin that I'm not sure which instances to try or how many.
4- I am not able to determine whether this applies or not. Since the design fails fitting, it can't run timing analysis.
This all brings me back to my initial question:
Why does the smaller chip claim to need 36,086,072 memory bits and 3,055 blocks, when the larger chip only needs 35,349,568 bits and 2713 blocks?
Also, please note that 36,086,072 bits is only 83% of the smaller Arria chip's capacity and should fit (in my experience).
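A quick check of that 83% figure (a sketch, again assuming 20 Kbits per M20K block):

```python
# Utilization of the smaller device's M20K capacity by the reported bit count.
M20K_BITS = 1024 * 20

small_capacity = M20K_BITS * 2131          # 43,642,880 bits available
utilization = 36_086_072 / small_capacity

print(round(100 * utilization))  # ~83 percent, so by raw bits it should fit
```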
Thanks again.
---
Additional information:
I attempted to force several memory instances to MLAB via
set_instance_assignment -name RAM_BLOCK_TYPE MLAB -to <ram_instance>
This had no effect.
However, what did work was altering FIFOs in Platform Designer.
I discovered that Quartus was constraining all FIFOs to use only M20K.
This seemed odd to me, so I looked at them in Platform Designer. They were all set to 'Auto' under "What should the memory block type be?". To test this out, I set some of them to MLAB. Then I saw a significant reduction in M20K use, and MLABs were selected instead.
It seems that, for some reason, Quartus is interpreting 'Auto' as constraining to M20K only? Not sure how else to explain it.
These FIFOs are all set to 'Auto' in the successful project on the larger chip, and Quartus does not constrain them to M20K there. But on this smaller chip it does.
---
Thanks for your feedback. Here is some possible reasoning behind this:
On the larger FPGA, Quartus has more total resources (ALMs + M20Ks) to distribute memory more flexibly.
On the smaller FPGA, it might prioritize M20Ks for timing, routing, or stability reasons, avoiding MLABs unless explicitly forced.
Different Default "Auto" Settings Per Device
Quartus may have different default optimization strategies for different device sizes.
Some device families default to MLAB for small FIFOs, while others default to M20K, even when set to "Auto".
MLAB Timing Constraints in Smaller FPGA
If Quartus predicted MLAB timing to be worse on the smaller FPGA, it may have avoided them unless explicitly forced.
The smaller FPGA may have different routing congestion issues that caused Quartus to prefer M20Ks.
Platform Designer Defaults Might Differ Per Target Device
The Platform Designer's "Auto" setting may not be purely a Quartus decision—it could be pre-optimized per FPGA family.
The larger FPGA might default to "best-fit" (MLAB + M20K mix), while the smaller FPGA defaults to "safe-fit" (M20K only).
---
Is it possible to change the setting for 'Auto' to 'best-fit' or 'safe-fit'? Or at least see what it is set to?
