Re: Inference of DSP block with accumulator does not work

Geert · ‎01-07-2021

I'm trying to infer a DSP block with accumulator for Arria 10, using Quartus Prime 17.0.

The high-level functionality I need is:

if rising_edge(clk) then
  if sload = '1' then
    out <= a * b;
  else
    out <= out + a * b;
  end if;
end if;
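For readers who want to reproduce this, the fragment above could be fleshed out into a complete entity roughly as follows. This is only a sketch under assumptions: the entity name `mac_sload`, the generics `A_WIDTH`, `B_WIDTH` and `ACC_WIDTH`, their default values, and the choice of signed operands are all illustrative and not taken from the Quartus template. The output is called `result` here because `out` is a reserved word in VHDL. Whether Quartus maps the accumulator into the DSP block still depends on the chosen widths and the tool version.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac_sload is
  generic (
    A_WIDTH   : natural := 16;  -- assumed width of operand a
    B_WIDTH   : natural := 24;  -- assumed width of operand b
    ACC_WIDTH : natural := 48   -- assumed accumulator width, must be >= A_WIDTH + B_WIDTH
  );
  port (
    clk    : in  std_logic;
    sload  : in  std_logic;     -- '1': load the product, '0': accumulate
    a      : in  signed(A_WIDTH - 1 downto 0);
    b      : in  signed(B_WIDTH - 1 downto 0);
    result : out signed(ACC_WIDTH - 1 downto 0)
  );
end entity mac_sload;

architecture rtl of mac_sload is
  -- Internal accumulator register; an output port cannot be read back
  -- in pre-2008 VHDL, so the running sum is kept in a signal.
  signal acc : signed(ACC_WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
    -- Full-precision product: numeric_std returns a'length + b'length bits.
    variable prod : signed(A_WIDTH + B_WIDTH - 1 downto 0);
  begin
    if rising_edge(clk) then
      prod := a * b;
      if sload = '1' then
        -- Start a new accumulation with the sign-extended product.
        acc <= resize(prod, ACC_WIDTH);
      else
        -- Add the sign-extended product to the running sum.
        acc <= acc + resize(prod, ACC_WIDTH);
      end if;
    end if;
  end process;

  result <= acc;
end architecture rtl;
```

Exposing the widths as generics makes it easy to recompile with different operand sizes and compare the resource usage reported by the fitter.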

I started from the template provided in Quartus (VHDL/Full Designs/Arithmetic/Signed Multiply-Accumulate), but it does not work as expected: the multiplier is mapped to a DSP block, but the block's accumulator function is not used.

Instead, for small word sizes, it creates a loop back path via the second multiplier inputs to bring the output back to the adder.

When I increase the accumulator width to 48, the accumulator is implemented entirely in LUTs.

Any ideas how to force use of the DSP block accumulator (preferably using inference)?

Thanks, Geert

SengKok_L_Intel · ‎01-07-2021

Hi Geert,

If you are using an independent multiplier, could you please increase the input data width to more than 19 bits? With inputs narrower than 18 bits, the design will not fit into the hard accumulator.

Regards -SK Lim

Geert · ‎01-08-2021

Hi,

Thanks for your answer.

I have tried several different input sizes. With two 16-bit multiplier inputs, inference fails (the accumulator is implemented in LUTs), while with one 16-bit and one 24-bit input I get the expected implementation (hard accumulator).

Could you explain what the exact criterion is? Is it the multiplier result that needs to have a minimal width, or is it just sufficient that one of the multiplier inputs is > 18 bits?

regards,

Geert

SengKok_L_Intel · ‎01-08-2021

Hi,

I used the template below with the width changed to 27, and the accumulator fits into the hard block. Alternatively, you could try using ALTERA_MULT_ADD to configure the accumulator in the mode that you need.

Please refer to Table 25 for the accumulator function:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/arria-10/a10_memory.pdf

// Quartus Prime Verilog Template
// Unsigned multiply-accumulate

module unsigned_multiply_accumulate
#(parameter WIDTH=27)
(
    input clk, aclr, clken, sload,
    input [WIDTH-1:0] dataa,
    input [WIDTH-1:0] datab,
    output reg [2*WIDTH-1:0] adder_out
);

    // Declare registers and wires
    reg [WIDTH-1:0] dataa_reg, datab_reg;
    reg sload_reg;
    reg [2*WIDTH-1:0] old_result;
    wire [2*WIDTH-1:0] multa;

    // Store the results of the operations on the current data
    assign multa = dataa_reg * datab_reg;

    // Store the value of the accumulation (or clear it)
    always @ (adder_out, sload_reg)
    begin
        if (sload_reg)
            old_result <= 0;
        else
            old_result <= adder_out;
    end

    // Clear or update data, as appropriate
    always @ (posedge clk or posedge aclr)
    begin
        if (aclr)
        begin
            dataa_reg <= 0;
            datab_reg <= 0;
            sload_reg <= 0;
            adder_out <= 0;
        end
        else if (clken)
        begin
            dataa_reg <= dataa;
            datab_reg <= datab;
            sload_reg <= sload;
            adder_out <= old_result + multa;
        end
    end

endmodule

SengKok_L_Intel · ‎01-26-2021

If further support is needed in this thread, please post a response within 15 days. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.