Intel® Quartus® Prime Software

Instantiation of 9x9 multipliers in cyclone V GT FPGA?

Yogesh

I have instantiated 96 8x8 multipliers as shown below. Each input is 8 bits wide and each output is 16 bits wide:

genvar i;
generate
  // One 8x8 multiplier per 8-bit slice of the two input buses
  for (i = 0; i < 96; i = i + 1) begin : mult_instan
    lp_mult lp_mult_component (
      .clock  (clk),
      .clken  (i_conv_clken),
      .dataa  (i_input_data_A[(8*(i+1))-1 : 8*i]),
      .datab  (i_input_data_B[(8*(i+1))-1 : 8*i]),
      .result (out[i])  // 16-bit product for slice i
    );
  end
endgenerate

After I synthesize my code, the Fitter uses 48 DSP blocks (each configured as a sum of two 18x18 multipliers) to implement these 96 multipliers.

If I understand correctly, each DSP block in Cyclone V devices provides three 9x9 or two 18x18 multipliers. My 96 8x8 multipliers each fit in a 9x9 multiplier, so 96/3 = 32 DSP blocks should be enough.
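The two figures can be checked with a few lines of arithmetic. The sketch below is only an illustration, assuming the per-block capacities quoted in this post (three 9x9 or two 18x18 multipliers per DSP block). The function name `dsp_blocks_needed` is hypothetical and is not part of any Quartus tool.

```python
def dsp_blocks_needed(num_multipliers: int, multipliers_per_block: int) -> int:
    """DSP blocks needed if every block holds `multipliers_per_block` multipliers.

    Uses integer ceiling division so a partly filled block still counts as one block.
    """
    return (num_multipliers + multipliers_per_block - 1) // multipliers_per_block


NUM_MULTIPLIERS = 96  # number of 8x8 multipliers in the post

# Assumption from the post: three 9x9 multipliers per DSP block
packed_as_9x9 = dsp_blocks_needed(NUM_MULTIPLIERS, 3)

# Assumption from the post: two 18x18 multipliers per DSP block
packed_as_18x18 = dsp_blocks_needed(NUM_MULTIPLIERS, 2)

print(packed_as_9x9, packed_as_18x18)  # 32 48
```

The 48 blocks reported by the Fitter match the two-per-block case, which is consistent with each 8x8 multiplier occupying an 18x18 slot.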

 

Why is the tool using 48 DSP blocks instead of 32?

Is there some other way to make the tool use 32 DSP blocks, or am I doing something wrong?

Since each DSP block has two 18x18 multipliers, can I somehow split each 18x18 multiplier into two independent 9x9 multipliers, so that each DSP block gives me four 9x9 multipliers?

 

CheePin_C_Intel
Employee

Hi,

 

As I understand it, you are asking why the Fitter uses 48 DSP blocks instead of 32 to implement your design with 96 8x8 multipliers. Based on my understanding, if there are sufficient DSP resources, the Fitter by default optimizes resource usage for better performance instead of packing everything into the minimum number of DSP blocks. If you want to pack your design into a limited number of DSP blocks, you can try using LogicLock to constrain your design to a region with limited DSP resources, which should force the Fitter to pack the multipliers.

 

Please let me know if there is any concern. Thank you.

Yogesh

Thank you.

But since this wastes resources, I want to force the Fitter to use only 32 DSP blocks instead of 48.

1) How can I use LogicLock in Quartus to force the Fitter to use a limited number of DSP blocks?

2) What performance, in terms of frequency, can I expect in each case (32 DSP blocks and 48 DSP blocks) if I force the Fitter to use a limited number of DSP blocks?

CheePin_C_Intel
Employee

Hi,

 

Regarding your latest questions, please see my responses below:

1) How can I use LogicLock in Quartus to force the Fitter to use a limited number of DSP blocks?

[CP] Here is one method that I am aware of. In Quartus, open Tools -> Chip Planner and click the "Create LogicLock Region" button on the left side of the window. Then select the area in the Chip Planner where you would like to create the LogicLock region; try to select an area with a limited number of DSP blocks. Next, go back to Project Navigator -> Hierarchy and find the instances that you would like to assign to the region you just created. Right-click the instances -> LogicLock Region -> Assign to Existing LogicLock Region. You can then run the Fitter.

2) What performance, in terms of frequency, can I expect in each case (32 DSP blocks and 48 DSP blocks) if I force the Fitter to use a limited number of DSP blocks?

[CP] As I understand it, the performance depends on your design as well as on the compilation. You would therefore need to compile both versions of the design on your side and compare the results.

By the way, if you have sufficient DSP resources, it is generally recommended to let the Fitter perform auto-fitting instead of using LogicLock to pack the DSP blocks.

 

Please let me know if there is any concern. Thank you.

 

 

Best regards,

Chee Pin

Yogesh

Hi,

 

I tried the LogicLock method you suggested, but found it complex and could not achieve the end result.

I am sharing an archive file (modules.qar) of my sample code.

Here I have instantiated 48 lpm_mult IP cores. Each lpm_mult takes two 8-bit inputs and produces a 16-bit output.

Ideally the tool should use 16 DSP blocks, since each DSP block in Cyclone V has three 9x9 multipliers.

But it is using 24 DSP blocks (each DSP block configured as a sum of two 18x18 multipliers).

If this is the case, I will face a shortage of DSP blocks in my top level.

So I want the tool to use three 9x9 multipliers per DSP block (since I am only doing 8x8 multiplication).

 

 

1) I thought the tool might be using more DSP blocks simply because they are available, but that is not the case. Take this situation: if I instantiate 685 9x9 multipliers, it uses 342 DSP blocks (100% usage, each DSP block configured as a sum of two 18x18 multipliers) for the first 684, and implements the one remaining multiplier (the 685th) in ALMs.

As a result, the frequency my design achieves drops. Since there are no more DSP blocks available, the tool should at least have used three 9x9 multipliers per block in this case to meet the design requirements, right? But that is not happening.
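The overflow in this scenario can be worked through as below. This is a sketch under the assumptions stated in the post (342 DSP blocks on the device, two multipliers per block in 18x18 mode, three per block in 9x9 mode). `split_multipliers` is a hypothetical helper, and the numbers say nothing about what the Fitter will actually choose.

```python
def split_multipliers(num_multipliers, dsp_blocks, multipliers_per_block):
    """Return (in_dsp, in_alms): multipliers that fit in DSP blocks and those that spill over."""
    capacity = dsp_blocks * multipliers_per_block
    in_dsp = min(num_multipliers, capacity)
    return in_dsp, num_multipliers - in_dsp


DSP_BLOCKS = 342       # device total quoted in the post
NUM_MULTIPLIERS = 685  # scenario from the post

# Two multipliers per block (18x18 mode): one multiplier spills into ALMs
print(split_multipliers(NUM_MULTIPLIERS, DSP_BLOCKS, 2))  # (684, 1)

# Three multipliers per block (9x9 mode): everything fits in DSP blocks
print(split_multipliers(NUM_MULTIPLIERS, DSP_BLOCKS, 3))  # (685, 0)
```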

  

2) Am I doing something wrong in the instantiation, so that the tool does not understand that it should use 1 DSP block for 3 instances of 9x9 lpm_mult? If so, please tell me how to make the tool do this.

3) I want to know how to use the DSP blocks in the different operational modes mentioned in the document below (page 3-10):

 

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/cyclone-v/cv_5v2.pdf

 

CheePin_C_Intel
Employee

Hi,

 

I have attached a simple test design for your reference. It contains 3 instances of an 8x8 lpm_mult and uses LogicLock to force them into a single DSP block. You can refer to it and apply the same approach to your own design, starting with a small number of multipliers and then gradually expanding the LogicLock region.

Note that the general recommendation is to let the Fitter perform auto-fitting for your design instead of using LogicLock to pack the DSP blocks. The Fitter will select the right resources to fit your design.

Please let me know if there is any concern. Thank you.

Yogesh

Hi Chee Pin,

 

Below are my observations from the file you shared:

1) I found a.bdf with 3 copies of the mult unit.

2) Analysis showed 3 DSP blocks in use, but the Fitter used just 1 DSP block for the 3 mult units because of LogicLock. So all was well up to this point.

Below are the changes I made in order to understand the LogicLock feature:

1) Changed the mult type to signed and added 3 clock cycles of latency to the IP.

2) Instantiated a.bdf in a_top.v and added an a.sdc file.

Result after the changes:

1) The Fitter failed with an error in the LogicLock region. So I made this change: region_0 or region_1 -> LogicLock region properties -> size and origin -> auto_fit.

2) Now the Fitter uses 3 DSP blocks, which I don't want.

I am sharing the new .qar file.

1) Please look into it, tell me what mistake I am making, and share a solution .qar file in which the Fitter uses just 1 DSP block for the 3 mult units.

2) How do I give the correct height and width for a LogicLock region?

3) Region_0 has 3 mult units, but what is the purpose of region_1?

 

CheePin_C_Intel
Employee

Hi,

 

Sorry for the delay. Please see my responses below:

1) Please look into it, tell me what mistake I am making, and share a solution .qar file in which the Fitter uses just 1 DSP block for the 3 mult units.

[CP] I have sent you an updated QAR that fits your a.qar design into 1 DSP block. I am not sure what was wrong with the existing LogicLock assignment. However, I deleted the existing regions, created a new region, and assigned the logic to it, and the Fitter now passes correctly.

2) How do I give the correct height and width for a LogicLock region?

[CP] Normally I select the region directly in the Chip Planner. I am not an expert in LogicLock and am not sure how to specify the right height and width. We might need to engage the Software Team to assist you further if you would like to pursue this.

3) Region_0 has 3 mult units, but what is the purpose of region_1?

[CP] Sorry for any confusion. If I remember correctly, one of the regions was used for testing purposes. The right region should have only one DSP block inside.

 

Please let me know if there is any concern. Thank you.

 

Best regards,

Chee Pin

 

CheePin_C_Intel
Employee

Hi,

 

Looking into your design, I noticed that there seem to be multiple LogicLock regions overlapping each other. I removed all the regions and created a new region containing only 210 DSP blocks. The design passes the Fitter and results in a utilization of 210 DSP blocks. I have sent the updated QAR to you by email.

Regarding your Q2 and Q3, which are specific to LogicLock, I believe we need the software expert to assist further. Since I am unable to duplicate the case (I am not sure why, but I believe it is due to an access issue), would you mind opening a new case in this Forum with a title specific to LogicLock? You can then let me know the case number so that I can help expedite the routing.

 

Please let me know if there is any concern. Thank you.

 

Best regards,

Chee Pin

Yogesh

Hi Chee Pin,

I have opened a new case titled 'specific to logic lock'. The case number is:

04620545

Thank you,

Regards,

Yogesh

CheePin_C_Intel
Employee

Thanks for your help. I have notified the SW team about the new case to expedite the routing.
