Hi all,

I am optimizing a filter that was implemented in a Cyclone V device. The current filter uses two DSP blocks for two multiplications; according to the Cyclone V device handbook, it should be possible to fit two independent multipliers in one DSP block. I tried to implement the multipliers using the ‘Multiply Adder’ Intel FPGA IP, which uses the ‘altera_mult_add’ module from the ‘altera_lm’ library.

Unfortunately, it seems to implement the two multipliers in ‘Multiplier Adder mode’; this mode is shown in the figure below.

I would like to implement the two multipliers independently. However, I can’t figure out how to bypass the adder in the DSP block, as shown in the next figure.

I would be very thankful if someone could give me a solution to this problem. I am working with the Intel Quartus Prime Lite 18.1 software.

Kind regards,

Bas van Wijngaarden

Hi all,

I was working with SanderWeijers on the same project, and at some point we had given up on this issue.

Last week I decided to make one last attempt at solving this problem, which, if solved, could lead to an enormous reduction in DSP block usage. We finally found the solution and I will place it here for future reference. Afterwards we also saw that this solution was brought up in the following thread: https://community.intel.com/t5/Intel-Quartus-Prime-Software/Instantiation-of-9x9-multipliers-in-cyclone-V-GT-FPGA/m-p/701753/highlight/true?profile.language=en.

We have to use the Logic Lock (LL) regions feature, which is not available in Quartus Prime Lite; we therefore use Quartus Prime Standard to improve the DSP block usage in the example project. The Logic Lock regions force the fitter to place two independent multipliers into one DSP block. I will describe a step-by-step approach to solve the problem in the example project, since it is a little bit complicated.

1) First we download the .qar file given by Sander on 06-29-2023

2) Synthesize the project; the DSP block usage will be 15 DSP blocks. If we ran the fitter now, the DSP block usage would stay at 15. You might expect a usage of 16 blocks, but the first DSP block is in 'sum of two 18x18' mode, which lets the fitter place both multipliers directly into one block together with the adder functionality. Our issue is with the other 7 dual_multipliers, which are in 'two independent 18x18' mode and currently occupy 14 DSP blocks.

3) Open the chip planner: Tools -> Chip Planner

4) Find the DSP blocks in the chip planner

5) Go to: View -> Logic Lock Regions -> Create Logic Lock Region, and create a region that contains exactly the number of blocks we need; in this case, a region consisting of 7 DSP blocks only!

6) Right click on the region and go to: Logic Lock Regions -> Logic Lock Region Properties

7) Click on 'Add' and, in the 'Add Node' window under 'Node Name', add 'dual_multiplier:\multipliers:1:multiplier:i_dual_multiplier'. Do the same for multipliers 2 to 7. Do not add multiplier 0, since this is the MULT_ADD.

8) Click three times on the 'OK' button.

9) Run the fitter.

10) You will find that the total number of DSP blocks is now reduced to only 8.
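
Typing the seven node names for the 'Add Node' window by hand is error-prone, so a small script can generate them. The Python sketch below assumes that every dual multiplier follows the instance path pattern quoted in the steps above; the helper name `node_names` is hypothetical, and the generated names should be checked against the Node Finder before use.

```python
# Sketch: generate the node names to paste into the 'Add Node' window.
# Assumption: every dual multiplier follows the path pattern quoted in the steps.
NODE_TEMPLATE = r"dual_multiplier:\multipliers:{index}:multiplier:i_dual_multiplier"


def node_names(first=1, last=7):
    """Return the node names for multipliers first..last (inclusive)."""
    return [NODE_TEMPLATE.format(index=i) for i in range(first, last + 1)]


if __name__ == "__main__":
    # Multiplier 0 is the MULT_ADD instance and is deliberately skipped.
    for name in node_names():
        print(name)
```

Starting the range at 1 keeps the MULT_ADD instance out of the region, as the steps require.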

The picture below clarifies some points of the step-by-step approach above.

The highlighted dark blue lines depict the LL region, which consists of the 7 DSP blocks (which I colored light blue). In the Logic Lock properties you can see the design elements that I added, as well as the excluded element types, which give the fitter more freedom.

If we now look at the Technology Map Viewer, we see that the fitter successfully uses both multipliers in each DSP block and also bypasses the adder.
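
The block counts reported for the example project can be checked with a little arithmetic. The Python sketch below is only a bookkeeping model, not a Quartus API; the function name `dsp_blocks` and its parameters are made up for illustration. It assumes one block for the MULT_ADD instance and seven dual multipliers in 'two independent 18x18' mode, as described above.

```python
def dsp_blocks(mult_add_instances, independent_pairs, packed):
    """Count DSP blocks for the example project.

    mult_add_instances: dual multipliers in 'sum of two 18x18' mode,
        each of which already fits in one block.
    independent_pairs: dual multipliers in 'two independent 18x18' mode.
    packed: True if the fitter places both multipliers of a pair in one block.
    """
    blocks_per_pair = 1 if packed else 2
    return mult_add_instances + independent_pairs * blocks_per_pair


if __name__ == "__main__":
    print("without Logic Lock region:", dsp_blocks(1, 7, packed=False))
    print("with Logic Lock region:   ", dsp_blocks(1, 7, packed=True))
```

With these inputs the model gives 15 blocks without the region and 8 with it, matching the numbers reported above.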

To fit two multipliers in one DSP block on the Cyclone V GX FPGA, you can use the following approach:

Enable the M10K Memory: In the Quartus Prime software, enable the M10K memory feature for the Cyclone V GX device. This allows you to use the embedded memory blocks within the DSP blocks for storing intermediate results.

Optimize Resource Sharing: When implementing your design, ensure that the multipliers you want to fit in one DSP block are utilized concurrently or have overlapping usage. By optimizing the resource sharing, you can maximize the utilization of the available DSP blocks.

Enable Parallelism: If possible, consider parallelizing your design to exploit the parallel processing capabilities of the DSP blocks. By distributing the computation across multiple DSP blocks, you can increase the efficiency of resource usage.

Optimize Arithmetic Precision: Adjust the arithmetic precision of your design to find a balance between accuracy and resource utilization. Reducing the precision can lead to more efficient utilization of DSP blocks.

Use the Quartus Prime Compiler: Utilize the Quartus Prime Compiler's optimization features. Enable advanced optimizations and explore different optimization settings to maximize the usage of DSP blocks.

Hello Pooledworthy,

Thanks for your suggestions, see below our remarks.

1) Enable the M10K Memory:

The M10K memory was already enabled, with the maximum number set to '-1' (unlimited).

2) Optimize Resource Sharing:

We enabled auto resource sharing, but this did not reduce the total DSP blocks used.

3) Enable Parallelism:

The design already utilizes the maximum amount of DSP blocks available. To reduce resources, it is desired to implement two multipliers in one DSP block.

4) Optimize Arithmetic Precision:

The design is not flexible and it is not possible to reduce precision. However, the resolution we are using should fit according to the Cyclone V handbook.

5) Use the Quartus Prime Compiler:

We limited the total number of DSP blocks that can be used in the compiler settings. I hoped that this would force the compiler to implement two multipliers in one DSP block, but this is not the case.

I would like to thank you for the suggestions; however, our problem is not solved yet.

It would be helpful if someone who knows the solution can improve the example project.

Kind regards

Hi,

After implementing the suggestions, are you able to fit the design?

Thank you

Kshitij Goel

Dear K Goel,

Unfortunately the suggestions did not fix the problem.

Other suggestions are welcome.

Kind regards,

Sander Weijers

Hi,

Please share your project, I will look into it.

Are you using the LPM_MULT IP? If yes, what latency have you entered for pipelining?

If you are using 1 or 2 please try with 3.

Thank you

Kshitij Goel

Hello K. Goel,

Thanks for your feedback. See my post (above) from June 29th for the example project.

Kind regards,

Sander Weijers

Hi,

I have reviewed your design. As you said, you are using the Multiply Adder IP, so it will use the multiplier-adder architecture, and your code also performs an addition:

    FOR k IN 0 TO (AB_WIDTH - 1) LOOP
        sum := sum + p_output(k);
    END LOOP;

Try using the LPM_MULT IP to implement the two multipliers independently. Please refer to section 3.4.1, Operational Modes (intel.com).

As of now, your code corresponds to section 3.6.2.1, 18 x 19 Complex Multiplier (intel.com).

Hope this will solve your issue.

Thank you

Kshitij Goel

Hi,

We have not received any response from you to our previous reply. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help you with your follow-up questions.

Thank you

Kshitij Goel
