Hello, I'm using a Cyclone V SoC FPGA. My design currently has 8 multipliers, which I inferred from VHDL code instead of instantiating IP. The multiplier inputs are 12 and 16 bits wide. According to this document: https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/wp/wp-01159-arriav-cyclon... I expected the tool to pack 2 multipliers into a single DSP block, so that the 8 multipliers would consume only 4 DSP blocks. Unfortunately, the compilation report shows that 8 DSP blocks are used, one per multiplier. I tried switching the synthesis optimization to area-driven, but nothing changed. Any idea what could cause this behavior?
--- Quote Start --- Have you tried instantiating the multipliers from the IP Catalog instead of using code inference? --- Quote End --- No. I prefer pure HDL because I want to parameterize the multiplier with generics at compile time:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- provides signed and the signed "*" operator

entity multiplier is
  generic (
    LOCATION_FIRST_RESULT_BIT : natural;   -- lowest product bit passed to OUT_RESULT
    WIDTH_A                   : positive;
    WIDTH_B                   : positive;
    WIDTH_RESULT              : positive   -- WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT must not exceed WIDTH_A + WIDTH_B
  );
  port (
    IN_A       : in  std_logic_vector(WIDTH_A - 1 downto 0);
    IN_B       : in  std_logic_vector(WIDTH_B - 1 downto 0);
    OUT_RESULT : out std_logic_vector(WIDTH_RESULT - 1 downto 0)
  );
end entity multiplier;

architecture rtl_multiplier of multiplier is
  -- full-precision signed product
  signal signed_multiplier_result : signed(WIDTH_B + WIDTH_A - 1 downto 0);
begin
  signed_multiplier_result <= signed(IN_B) * signed(IN_A);
  -- pass only the slice of the product selected by the generics
  OUT_RESULT <= std_logic_vector(signed_multiplier_result(WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT - 1 downto LOCATION_FIRST_RESULT_BIT));
end architecture rtl_multiplier;
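To reproduce the DSP usage described in the question, the generic entity above can be instantiated eight times with 12-bit and 16-bit operands. The sketch below is only an illustration under assumptions: the top-level name `mult_bank`, the bus ports `a_bus`/`b_bus`/`p_bus`, the generics `N_MULT`/`W_A`/`W_B`, and the choice to keep the full 28-bit product are all hypothetical, and it assumes `multiplier` is compiled into library `work`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level: N_MULT independent signed multipliers built from
-- the generic "multiplier" entity, matching the 8 x (12 x 16) case.
entity mult_bank is
  generic (
    N_MULT : positive := 8;   -- number of multipliers in the design
    W_A    : positive := 12;  -- width of each IN_A operand
    W_B    : positive := 16   -- width of each IN_B operand
  );
  port (
    a_bus : in  std_logic_vector(N_MULT * W_A - 1 downto 0);
    b_bus : in  std_logic_vector(N_MULT * W_B - 1 downto 0);
    p_bus : out std_logic_vector(N_MULT * (W_A + W_B) - 1 downto 0)
  );
end entity mult_bank;

architecture rtl of mult_bank is
begin
  gen_mult : for i in 0 to N_MULT - 1 generate
    u_mult : entity work.multiplier
      generic map (
        LOCATION_FIRST_RESULT_BIT => 0,          -- keep the product from bit 0
        WIDTH_A                   => W_A,
        WIDTH_B                   => W_B,
        WIDTH_RESULT              => W_A + W_B   -- full-precision result
      )
      port map (
        -- each instance takes its own slice of the packed buses
        IN_A       => a_bus((i + 1) * W_A - 1 downto i * W_A),
        IN_B       => b_bus((i + 1) * W_B - 1 downto i * W_B),
        OUT_RESULT => p_bus((i + 1) * (W_A + W_B) - 1 downto i * (W_A + W_B))
      );
  end generate gen_mult;
end architecture rtl;
```

Changing `N_MULT` at compile time is a quick way to see how the DSP block count in the fitter report scales with the number of multipliers.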
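From what I have observed, Quartus uses up all available DSP blocks before it starts packing two multipliers into one block. That would explain your result: the device has more DSP blocks than your 8 multipliers need, so no packing is necessary. See the discussion of the same topic on Edaboard: http://www.edaboard.com/showthread.php?t=368754 I managed to fill all 25 DSP blocks of a Cyclone V A2 with this test: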
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity test1 is
  generic (
    n : integer := 50;   -- number of multipliers
    w : integer := 18    -- operand width in bits
  );
  port (
    clk : in  STD_LOGIC;
    sel : in  integer range 0 to n-1;
    ax  : in  signed(w-1 downto 0);
    bx  : in  signed(w-1 downto 0);
    cx  : out SIGNED(2*w-1 downto 0)
  );
end test1;

architecture rtl of test1 is
  type ar18 is array(0 to n-1) of signed(w-1 downto 0);
  type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
  signal ar : ar18;
  signal br : ar18;
  signal cr : ar36;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      for i in 0 to n-1 loop
        cr(i) <= ar(i) * br(i);   -- n registered multipliers
        if i = sel then
          ar(i) <= ax;            -- load operands of the selected multiplier
          br(i) <= bx;
          cx    <= cr(i);         -- read back the selected product
        end if;
      end loop;
    end if;
  end process;
end rtl;