Solved: FIR filter implementation doesn't use Cyclone DSP blocks

Oliver_I_Sedlacek · ‎09-18-2023

I've coded up an FIR filter in VHDL that takes up to 10 clocks to multiply sample pairs by the filter coefficients and accumulate the results. It seems to work, but Quartus implements it all in logic, which isn't going to fit when I need 50+ filters. I've had a look at https://community.intel.com/t5/FPGA-Wiki/FIR-Filter-Design-in-Arria-V-Cyclone-V-DSP-Block-Using-VHDL/ta-p/735970, but I'm confused because it assumes background context I don't have.

Does anyone have any guidance on how to use the DSP blocks as an FIR filter using sequential accumulation?

My code looks like this:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity FIRLPF0_20 is
   generic (IO_WIDTH : integer := 15; COEFF_WIDTH : integer := 16);
   port (
      clk:        in  std_logic;                       -- Expecting this to be 100 MHz
      ph:         in  std_logic_vector(9 downto 0);    -- Expecting this to be one-hot, 10 phases
      din:        in  signed(IO_WIDTH - 1 downto 0);
      dout:       out signed(IO_WIDTH - 1 downto 0)
   );
end;

-- Pipeline samples and run MAC over them
architecture DFLT of FIRLPF0_20 is
   -- when you add two samples you need an extra bit for the result
   constant PAIR_WIDTH : integer := IO_WIDTH + 1;
   type t_data_pipe    is array (0 to 12) of signed(IO_WIDTH - 1 downto 0);
   signal p_data     : t_data_pipe;

   type FAState_et is (Idle, S0, S1, S2, S3, S4, S5, S6);
   signal fas: FAState_et;

   -- define the filter coefficients as Q15 numbers
   constant  K0_12: signed(COEFF_WIDTH - 1 downto 0) := to_signed(1040, COEFF_WIDTH);
   constant  K1_11: signed(COEFF_WIDTH - 1 downto 0) := to_signed(-1962, COEFF_WIDTH);
   constant  K2_10: signed(COEFF_WIDTH - 1 downto 0) := to_signed(-1723, COEFF_WIDTH);
   constant  K3_9:  signed(COEFF_WIDTH - 1 downto 0) := to_signed(325, COEFF_WIDTH);
   constant  K4_8:  signed(COEFF_WIDTH - 1 downto 0) := to_signed(4438, COEFF_WIDTH);
   constant  K5_7:  signed(COEFF_WIDTH - 1 downto 0) := to_signed(8724, COEFF_WIDTH);
   constant  K6:    signed(COEFF_WIDTH - 1 downto 0) := to_signed(10556, COEFF_WIDTH);

   signal samppair:  signed(PAIR_WIDTH - 1 downto 0);
   signal accum:     signed(COEFF_WIDTH + PAIR_WIDTH - 1 downto 0);
   signal Kprod:     signed(COEFF_WIDTH + PAIR_WIDTH - 1 downto 0);

	
begin
   -- Chunk the samples through the pipeline
   process(clk)
   begin
      if rising_edge(clk) then
         if ('1' = ph(9)) then
            p_data <= din & p_data(0 to p_data'length - 2);
         end if;
      end if;
   end process;

   -- Do the MACs when there is fresh data
   process(clk)
   begin
      if rising_edge(clk) then
         case fas is
         when Idle =>
            -- on exit samppair is [5] + [7], Kprod is K6 * [6]
            -- Don't touch the accum as we want it available for output
            Kprod <= K6 * resize(p_data(6), PAIR_WIDTH);
            samppair <= resize(p_data(5), PAIR_WIDTH) + resize(p_data(7), PAIR_WIDTH);
            if '1' = ph(0) then
               fas <= S0;
            end if;
         when S0 =>
            -- on exit samppair is [4] + [8], Kprod is K5_7 * ([5] + [7])
            Kprod <= K5_7 * samppair;
            accum <= Kprod;
            samppair <= resize(p_data(4), PAIR_WIDTH) + resize(p_data(8), PAIR_WIDTH);
            fas <= S1;
         when S1 =>
            Kprod <= K4_8 * samppair;
            accum <= Kprod + accum;
            samppair <= resize(p_data(3), PAIR_WIDTH) + resize(p_data(9), PAIR_WIDTH);
            fas <= S2;
         when S2 =>
            Kprod <= K3_9 * samppair;
            accum <= Kprod + accum;
            samppair <= resize(p_data(2), PAIR_WIDTH) + resize(p_data(10), PAIR_WIDTH);
            fas <= S3;
         when S3 =>
            Kprod <= K2_10 * samppair;
            accum <= Kprod + accum;
            samppair <= resize(p_data(1), PAIR_WIDTH) + resize(p_data(11), PAIR_WIDTH);
            fas <= S4;
         when S4 =>
            Kprod <= K1_11 * samppair;
            accum <= Kprod + accum;
            samppair <= resize(p_data(0), PAIR_WIDTH) + resize(p_data(12), PAIR_WIDTH);
            fas <= S5;
         when S5 =>
            Kprod <= K0_12 * samppair;
            accum <= Kprod + accum;
            fas <= S6;
         when S6 =>
            accum <= Kprod + accum;
            fas <= Idle;
         end case;
      end if;
   end process;

   -- Q15 coefficients: drop the 15 fractional bits to return to sample scale
   dout <= accum(IO_WIDTH + 14 downto 15);
end;
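For reference, reading the code as written: p_data(k) holds x[n−k] (p_data(0) is the newest sample), the 13 taps are symmetric, and dout keeps accum bits 29 downto 15, i.e. it divides by 2^15 for the Q15 coefficients (truncating, and ignoring any wrap in the bit slice). The state machine therefore evaluates the folded sum below with 7 multiplies per output instead of 13, where K_0 … K_6 stand for K0_12 … K6:

$$
y[n] \approx 2^{-15}\left( K_6\,x[n-6] + \sum_{k=0}^{5} K_k\,\big(x[n-k] + x[n-12+k]\big) \right)
$$

With these coefficients the DC gain is (10556 + 2·10842)/32768 = 32240/32768 ≈ 0.984, while the sum of the absolute tap values is 46980/32768 ≈ 1.43, so for worst-case input patterns the result can exceed the 15-bit output range, in which case the slice wraps.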

Kshitij_Intel · ‎09-21-2023

Hi,

Please refer to this coding style to infer a DSP block.

[Image: dsp-block-vhdl-inferring-a5c5-figure-7.jpg]

[Image: dsp-block-vhdl-inferring-a5c5-figure-8.jpg]

Thank you,

Kshitij Goel
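The figures are not reproduced in this thread. As a rough guide, the multiply-accumulate examples in Intel's recommended HDL coding styles follow a pattern like the one below: both multiplier inputs registered, a single `*`, and a registered accumulator that reloads with the product instead of adding when a load signal is set. This is a minimal sketch written for illustration, not a copy of the figures; the entity `mac_sketch`, its generics and its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical signed multiply-accumulate in a DSP-friendly style:
-- registered operands, a single "*", and a registered accumulator that
-- starts a new sum (instead of adding) when sload was asserted.
entity mac_sketch is
  generic (
    A_WIDTH   : natural := 16;
    B_WIDTH   : natural := 16;
    ACC_WIDTH : natural := 32
  );
  port (
    clk       : in  std_logic;
    sload     : in  std_logic;  -- '1' with the first operand pair of a sum
    a         : in  signed(A_WIDTH - 1 downto 0);
    b         : in  signed(B_WIDTH - 1 downto 0);
    accum_out : out signed(ACC_WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of mac_sketch is
  signal a_reg     : signed(A_WIDTH - 1 downto 0) := (others => '0');
  signal b_reg     : signed(B_WIDTH - 1 downto 0) := (others => '0');
  signal sload_reg : std_logic := '0';
  signal mult      : signed(A_WIDTH + B_WIDTH - 1 downto 0);
  signal acc       : signed(ACC_WIDTH - 1 downto 0) := (others => '0');
begin
  -- Single multiplier between the input registers and the accumulator.
  mult <= a_reg * b_reg;

  process (clk)
  begin
    if rising_edge(clk) then
      a_reg     <= a;
      b_reg     <= b;
      sload_reg <= sload;
      if sload_reg = '1' then
        acc <= resize(mult, ACC_WIDTH);        -- first product: reload
      else
        acc <= acc + resize(mult, ACC_WIDTH);  -- later products: accumulate
      end if;
    end if;
  end process;

  accum_out <= acc;
end architecture;
```

If sload is asserted with the first operand pair of each sum and held low for the rest, the complete sum appears on accum_out one clock after the last pair is registered.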

Oliver_I_Sedlacek · ‎09-21-2023

Thanks, that's pointing me in the right direction.
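Applied to the filter from the question, one possible restructuring (a sketch under stated assumptions, not taken from the answer) replaces the seven `Kn * samppair` expressions with a constant coefficient table and registered operand multiplexers, so the design contains a single `coeff_r * pair_r` multiply feeding an accumulator with a reload input. A plausible but unconfirmed reason the original mapped to logic is that each state multiplies by a different constant, which synthesis is free to build as separate shift-and-add circuits. The entity name `fir13_sym_mac`, the tap counter `k` and the registered `dout` are hypothetical, and the sequencing assumes ph(0) through ph(9) are asserted one per clock in order, as the one-hot comment in the question suggests. Whether Quartus actually packs the multiply and accumulator into a DSP block still depends on the device, widths and settings, so check the DSP block count in the fitter report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical rework of FIRLPF0_20: one coefficient table, one multiply,
-- one accumulator with a "first product" reload, sequenced by the same ph bus.
entity fir13_sym_mac is
  generic (IO_WIDTH : integer := 15; COEFF_WIDTH : integer := 16);
  port (
    clk  : in  std_logic;
    ph   : in  std_logic_vector(9 downto 0);  -- one-hot, ph(0) .. ph(9) in turn
    din  : in  signed(IO_WIDTH - 1 downto 0);
    dout : out signed(IO_WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of fir13_sym_mac is
  constant PAIR_WIDTH : integer := IO_WIDTH + 1;
  constant ACC_WIDTH  : integer := COEFF_WIDTH + PAIR_WIDTH;
  constant FRAC_BITS  : integer := 15;  -- coefficients are Q15
  type t_data_pipe is array (0 to 12) of signed(IO_WIDTH - 1 downto 0);
  type t_coeffs    is array (0 to 6) of integer;
  -- K0_12, K1_11, K2_10, K3_9, K4_8, K5_7, K6 from the question
  constant COEFFS : t_coeffs := (1040, -1962, -1723, 325, 4438, 8724, 10556);

  signal p_data  : t_data_pipe := (others => (others => '0'));
  signal k       : integer range 0 to 7 := 7;  -- tap to issue next; 7 = idle
  signal coeff_r : signed(COEFF_WIDTH - 1 downto 0) := (others => '0');
  signal pair_r  : signed(PAIR_WIDTH - 1 downto 0) := (others => '0');
  signal first_r : std_logic := '0';  -- '1' when the operand registers hold tap 6
  signal mult    : signed(ACC_WIDTH - 1 downto 0);
  signal acc     : signed(ACC_WIDTH - 1 downto 0) := (others => '0');
begin
  -- The only multiply in the design, fed by registered operands.
  mult <= coeff_r * pair_r;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Sample delay line, shifted once per output as in the original.
      if ph(9) = '1' then
        p_data <= din & p_data(0 to 11);
      end if;

      -- Tap sequencer: start at the centre tap on ph(0), count down to 0.
      if ph(0) = '1' then
        k <= 6;
      elsif k = 0 then
        k <= 7;
      elsif k /= 7 then
        k <= k - 1;
      end if;

      -- Stage 1: register the coefficient and the pre-added sample pair.
      first_r <= '0';
      if k = 6 then
        coeff_r <= to_signed(COEFFS(6), COEFF_WIDTH);
        pair_r  <= resize(p_data(6), PAIR_WIDTH);
        first_r <= '1';
      elsif k < 6 then
        coeff_r <= to_signed(COEFFS(k), COEFF_WIDTH);
        pair_r  <= resize(p_data(k), PAIR_WIDTH) + resize(p_data(12 - k), PAIR_WIDTH);
      else
        coeff_r <= (others => '0');  -- idle: feed zeros so acc holds its value
        pair_r  <= (others => '0');
      end if;

      -- Stage 2: accumulate; the first product reloads instead of adding.
      if first_r = '1' then
        acc <= mult;
      else
        acc <= acc + mult;
      end if;

      -- acc is complete after the ph(8) edge; latch it once per sample.
      if ph(9) = '1' then
        dout <= acc(IO_WIDTH + FRAC_BITS - 1 downto FRAC_BITS);
      end if;
    end if;
  end process;
end architecture;
```

Registering dout on ph(9) makes the output change once per sample instead of stepping through partial sums while the MAC runs, at the cost of about two clocks of extra latency compared with the original's final accum value.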

Kshitij_Intel · ‎09-25-2023

Hi,

I'm glad your question has been addressed. I will now transition this thread to community support. If you have a new question, please log in to https://supporttickets.intel.com, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, and community users will be able to help you with follow-up questions.

Thank you,

Kshitij Goel
