Intel® Quartus® Prime Software

Single multiplier takes up a whole DSP block for

Honored Contributor II



I'm using a Cyclone V SOC FPGA. 


My design currently has 8 multipliers, which I inferred from VHDL code rather than instantiating them as IP.

The inputs to the multipliers are 12 and 16 bits wide. 


According to this document: 


I expected the tool to pack 2 multipliers into a single DSP block, so that 8 multipliers would consume only 4 DSP blocks.

Unfortunately, the compilation report shows that 8 DSP blocks are consumed (one per multiplier).

I tried changing the synthesis optimization to area-driven, but nothing changed.


Any idea what could cause this behavior?
3 Replies
Honored Contributor II

Can you show the VHDL code? Have you tried instantiating the multipliers from the IP Catalog instead of using code inference?

Honored Contributor II


--- Quote Start ---  

Have you tried instantiating the multipliers from the IP Catalog instead of using code inference? 

--- Quote End ---  



I prefer pure HDL because I want to parameterize the multiplier with generics at compile time.


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multiplier is
  generic (
    LOCATION_FIRST_RESULT_BIT : natural;   -- index of the lowest product bit to output
    WIDTH_A                   : positive;
    WIDTH_B                   : positive;
    WIDTH_RESULT              : positive
  );
  port (
    IN_A       : in  std_logic_vector(WIDTH_A - 1 downto 0);
    IN_B       : in  std_logic_vector(WIDTH_B - 1 downto 0);
    OUT_RESULT : out std_logic_vector(WIDTH_RESULT - 1 downto 0)
  );
end entity multiplier;

architecture rtl_multiplier of multiplier is
  -- Full-width signed product of the two inputs
  signal signed_multiplier_result : signed(WIDTH_B + WIDTH_A - 1 downto 0);
begin
  signed_multiplier_result <= signed(IN_B) * signed(IN_A);

  -- Output a WIDTH_RESULT-bit slice of the product
  OUT_RESULT <= std_logic_vector(
    signed_multiplier_result(WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT - 1
                             downto LOCATION_FIRST_RESULT_BIT));
end architecture rtl_multiplier;
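For reference, the design described above (8 multipliers with 12-bit and 16-bit inputs) might be put together roughly as follows. This is only a sketch: the wrapper name `multiplier_bank`, the flattened bus ports, `NUM_MULTIPLIERS`, and the choice of `WIDTH_RESULT = 28` with `LOCATION_FIRST_RESULT_BIT = 0` (the full product) are assumptions for illustration, not details taken from the actual design. It also assumes the `multiplier` entity is compiled into library `work` with the usual `ieee` library clauses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: NUM_MULTIPLIERS instances of the multiplier entity.
entity multiplier_bank is
  generic (
    NUM_MULTIPLIERS : positive := 8;
    WIDTH_A         : positive := 12;
    WIDTH_B         : positive := 16;
    WIDTH_RESULT    : positive := 28   -- assumed: keep the full 12 + 16 bit product
  );
  port (
    -- Operands and results of all instances, concatenated into flat buses
    IN_A       : in  std_logic_vector(NUM_MULTIPLIERS * WIDTH_A - 1 downto 0);
    IN_B       : in  std_logic_vector(NUM_MULTIPLIERS * WIDTH_B - 1 downto 0);
    OUT_RESULT : out std_logic_vector(NUM_MULTIPLIERS * WIDTH_RESULT - 1 downto 0)
  );
end entity multiplier_bank;

architecture rtl of multiplier_bank is
begin
  gen_mult : for i in 0 to NUM_MULTIPLIERS - 1 generate
    mult_i : entity work.multiplier
      generic map (
        LOCATION_FIRST_RESULT_BIT => 0,   -- assumed: no bits dropped at the bottom
        WIDTH_A                   => WIDTH_A,
        WIDTH_B                   => WIDTH_B,
        WIDTH_RESULT              => WIDTH_RESULT
      )
      port map (
        -- Each instance gets its own slice of the flat buses
        IN_A       => IN_A((i + 1) * WIDTH_A - 1 downto i * WIDTH_A),
        IN_B       => IN_B((i + 1) * WIDTH_B - 1 downto i * WIDTH_B),
        OUT_RESULT => OUT_RESULT((i + 1) * WIDTH_RESULT - 1 downto i * WIDTH_RESULT)
      );
  end generate gen_mult;
end architecture rtl;
```

Compiling a wrapper like this as the top level and checking the DSP block count in the fitter report would be one way to reproduce the one-block-per-multiplier result in isolation.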
Honored Contributor II

According to my observation, Quartus uses all available DSP blocks before it starts packing multipliers. See the discussion of the same topic at Edaboard 


I managed to fill all 25 DSP blocks of a Cyclone V A2 with this test:


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity test1 is
  generic (
    n : integer := 50;
    w : integer := 18
  );
  port (
    clk : in  std_logic;
    sel : in  integer range 0 to n-1;
    ax  : in  signed(w-1 downto 0);
    bx  : in  signed(w-1 downto 0);
    cx  : out signed(2*w-1 downto 0)
  );
end test1;

architecture rtl of test1 is
  type ar18 is array(0 to n-1) of signed(w-1 downto 0);
  type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
  signal ar : ar18;
  signal br : ar18;
  signal cr : ar36;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      for i in 0 to n-1 loop
        cr(i) <= ar(i)*br(i);
        if i = sel then
          ar(i) <= ax;
          br(i) <= bx;
          cx    <= cr(i);
        end if;
      end loop;
    end if;
  end process;
end rtl;