Hello,
I'm using a Cyclone V SoC FPGA. My design currently has 8 multipliers, which I coded in VHDL rather than instantiating IP. The multiplier inputs are 12 and 16 bits wide. According to this document: https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/wp/wp-01159-arriav-cyclon... I expected the tool to pack 2 multipliers into a single DSP block, so that 8 multipliers would consume only 4 DSP blocks. Unfortunately, the compilation report shows that 8 DSP blocks are consumed (one per multiplier). I tried changing the synthesis optimization to area-driven, but nothing changed. Any idea what could cause this behavior?
Can you show the VHDL code? Have you tried instantiating the multipliers from the IP Catalog instead of using code inference?
--- Quote Start --- Have you tried instantiating the multipliers from the IP Catalog instead of using code inference? --- Quote End --- No. I preferred pure HDL since I want to parameterize the multiplier with generics during compilation.
library ieee ;
use ieee.std_logic_1164.all ;
use ieee.numeric_std.all ;
entity multiplier is
generic
(
LOCATION_FIRST_RESULT_BIT : natural ;
WIDTH_A : positive ;
WIDTH_B : positive ;
WIDTH_RESULT : positive
) ;
port
(
IN_A : in std_logic_vector ( WIDTH_A - 1 downto 0 ) ;
IN_B : in std_logic_vector ( WIDTH_B - 1 downto 0 ) ;
OUT_RESULT : out std_logic_vector ( WIDTH_RESULT - 1 downto 0 )
) ;
end entity multiplier ;
architecture rtl_multiplier of multiplier is
signal signed_multiplier_result : signed ( WIDTH_B + WIDTH_A - 1 downto 0 ) ;
begin
signed_multiplier_result <= signed ( IN_B ) * signed ( IN_A ) ;
OUT_RESULT <= std_logic_vector ( signed_multiplier_result ( WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT - 1 downto LOCATION_FIRST_RESULT_BIT ) ) ;
end architecture rtl_multiplier ;
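For reference, here is a minimal sketch of how eight of these multipliers might be instantiated with the 12- and 16-bit input widths mentioned above. The wrapper name multiplier_bank, its bus ports, the full-width result (WIDTH_A + WIDTH_B bits) and LOCATION_FIRST_RESULT_BIT = 0 are assumptions made for illustration only; they are not taken from the actual design. It also assumes the multiplier entity is compiled into the work library.

```vhdl
library ieee ;
use ieee.std_logic_1164.all ;

-- Hypothetical wrapper (not part of the original design): a bank of
-- identical multipliers built from the generic multiplier entity.
entity multiplier_bank is
generic
(
NUM_MULTIPLIERS : positive := 8 ;  -- count from the question
WIDTH_A : positive := 12 ;         -- input widths from the question
WIDTH_B : positive := 16
) ;
port
(
IN_A_BUS : in std_logic_vector ( NUM_MULTIPLIERS * WIDTH_A - 1 downto 0 ) ;
IN_B_BUS : in std_logic_vector ( NUM_MULTIPLIERS * WIDTH_B - 1 downto 0 ) ;
OUT_RESULT_BUS : out std_logic_vector ( NUM_MULTIPLIERS * ( WIDTH_A + WIDTH_B ) - 1 downto 0 )
) ;
end entity multiplier_bank ;

architecture rtl_multiplier_bank of multiplier_bank is
-- Assumption: keep the full product, so no bits are dropped.
constant WIDTH_RESULT : positive := WIDTH_A + WIDTH_B ;
begin
gen_multipliers : for i in 0 to NUM_MULTIPLIERS - 1 generate
u_multiplier : entity work.multiplier
generic map
(
LOCATION_FIRST_RESULT_BIT => 0 ,  -- assumed: result slice starts at bit 0
WIDTH_A => WIDTH_A ,
WIDTH_B => WIDTH_B ,
WIDTH_RESULT => WIDTH_RESULT
)
port map
(
-- Each instance gets its own slice of the flattened buses.
IN_A => IN_A_BUS ( WIDTH_A * ( i + 1 ) - 1 downto WIDTH_A * i ) ,
IN_B => IN_B_BUS ( WIDTH_B * ( i + 1 ) - 1 downto WIDTH_B * i ) ,
OUT_RESULT => OUT_RESULT_BUS ( WIDTH_RESULT * ( i + 1 ) - 1 downto WIDTH_RESULT * i )
) ;
end generate gen_multipliers ;
end architecture rtl_multiplier_bank ;
```

Compiling a wrapper like this on its own and checking the DSP block count in the fitter report is one way to see whether the packing behavior depends on the rest of the design.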
According to my observation, Quartus uses all available DSP blocks before it starts packing multipliers. See the same-topic discussion at Edaboard:
http://www.edaboard.com/showthread.php?t=368754
I managed to fill up all 25 DSP blocks of a Cyclone V A2 with this test:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity test1 is
generic(
n : integer := 50;
w : integer := 18
);
port(
clk : in STD_LOGIC;
sel : in integer range 0 to n-1;
ax : in signed(w-1 downto 0);
bx : in signed(w-1 downto 0);
cx : out SIGNED(2*w-1 downto 0)
);
end test1;
architecture rtl of test1 is
type ar18 is array(0 to n-1) of signed(w-1 downto 0);
type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
signal ar : ar18;
signal br : ar18;
signal cr : ar36;
begin
process (clk)
begin
if rising_edge(clk) then
for i in 0 to n-1 loop
cr(i) <= ar(i)*br(i);
if i = sel then
ar(i) <= ax;
br(i) <= bx;
cx <= cr(i);
end if;
end loop;
end if;
end process;
end rtl;