Inferring DSP block in Arria V

Altera_Forum
Honored Contributor II
4,991 Views

Hello 

 

I'm trying to infer DSP blocks in my Arria V design as a replacement for the ALTMULT_ADD instance I used on Stratix III. I notice that the input registers are placed outside the DSP block (i.e. its input register bank is not used), and the same happens to the output registers, which should go into the DSP block's output register bank. For the VHDL code I tried to stay close to Example 13-7 in qts_Qii51007, but I need some more flexibility. On Stratix III, the whole entity fit completely inside the ALTMULT_ADD.

 

My goal is to use the input and output register banks of the DSP block. What am I doing wrong? How can I solve this?

 

Regards, 

Peter
9 Replies
Altera_Forum

Try this on the I/O registers:

    attribute altera_attribute : string;
    attribute altera_attribute of my_reg : signal is "-name auto_packed_registers normal";

 

BTW, I assume you're aware that the delay only applies to simulation (not synthesis).
Altera_Forum

I tried that (although it is already set to "auto" at the global level), without success. I'm not able to get the input registers into the DSP block. Furthermore, not even the accumulator is placed in there; it's located outside.

 

Going back to the minimal version from the coding style manual (using Qii51007/Example 13-7 exactly, with only the signal names changed to match my interface), I see the same behaviour. So it cannot be my enables, loads, etc.; it must be something else. I synthesized with both tight and relaxed timing constraints, with no impact.

 

Any idea? Is there a different coding style required for Arria V or something else I missed? 

 

P.S.: PROP_DELAY is a global constant (time := 1 ns) used for simulation purposes. It had no influence anyway, as I tried taking it out, too.
Altera_Forum

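Try this direct coding style:

    process (rst, clk)
    begin
        if rst = '1' then
            ...
        elsif rising_edge(clk) then
            A_reg <= A;               -- input registers
            B_reg <= B;
            prod  <= A_reg * B_reg;   -- registered product
            sum   <= sum + prod;      -- accumulator
        end if;
    end process;

    -- 'out' is a reserved word in VHDL, so the output is named dout here
    dout <= sum;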
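
For anyone who wants to compile the snippet above, here is a minimal self-contained sketch. The process body follows the post; the entity name mac_direct, the port names, the signed type, the generic widths, the reset values and the resize to the accumulator width are all assumptions added for illustration, not part of the original suggestion. It assumes ACC_WIDTH >= 2*DATA_WIDTH.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: names, widths and signedness are assumptions.
entity mac_direct is
  generic (
    DATA_WIDTH : natural := 18;
    ACC_WIDTH  : natural := 40   -- assumed >= 2*DATA_WIDTH
  );
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;
    a    : in  signed(DATA_WIDTH-1 downto 0);
    b    : in  signed(DATA_WIDTH-1 downto 0);
    dout : out signed(ACC_WIDTH-1 downto 0)
  );
end entity;

architecture rtl of mac_direct is
  signal a_reg : signed(DATA_WIDTH-1 downto 0);
  signal b_reg : signed(DATA_WIDTH-1 downto 0);
  signal prod  : signed(2*DATA_WIDTH-1 downto 0);
  signal sum   : signed(ACC_WIDTH-1 downto 0);
begin
  process (rst, clk)
  begin
    if rst = '1' then
      -- Assumed reset behaviour: clear every register
      a_reg <= (others => '0');
      b_reg <= (others => '0');
      prod  <= (others => '0');
      sum   <= (others => '0');
    elsif rising_edge(clk) then
      a_reg <= a;                               -- input registers
      b_reg <= b;
      prod  <= a_reg * b_reg;                   -- registered product
      sum   <= sum + resize(prod, ACC_WIDTH);   -- accumulate, sign-extended
    end if;
  end process;

  dout <= sum;
end architecture;
```

Whether the fitter packs all of these registers into the DSP block depends on the device family and tool version, so the post-fit netlist is still the thing to check.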
Altera_Forum

The only difference (besides the signal names) I can make out in your code is that you replace [clk'event and clk = '1'] (which is what the Altera documentation proposes) with [rising_edge(clk)]. I tried that too, but it ends up with the same result.

 

As a next step, I placed EXACTLY the mentioned Altera example in my test design to see how the tool deals with it. I found out that if the same input signal source feeds more than one DSP element, the input register is shared and located outside the DSP blocks. When I make sure each MAC gets its own individual signals, a_reg and b_reg are placed inside the DSP block as expected (for both my code and the Altera example).

 

However, even when using exactly the Altera example, the adder is still placed outside the DSP block in a nearby LAB cell (even for a completely isolated instance that has all its I/Os connected directly to FPGA I/Os). I have had no problem closing timing so far, and since this is a test design on the StarterKit, the FPGA is far from congested.

Why worry further? Because I get an uneasy feeling if I can't get the tool to put a simple MAC entirely into the DSP block that was designed for exactly that.
Altera_Forum

According to the post-fitter netlist, the Altera template unsigned_multiply_accumulate is implemented completely in the DSP block.

 

    -- Quartus II VHDL Template
    -- Unsigned Multiply-Accumulate

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity unsigned_multiply_accumulate is

        generic
        (
            DATA_WIDTH : natural := 8
        );

        port
        (
            a         : in unsigned ((DATA_WIDTH-1) downto 0);
            b         : in unsigned ((DATA_WIDTH-1) downto 0);
            clk       : in std_logic;
            sload     : in std_logic;
            accum_out : out unsigned ((2*DATA_WIDTH-1) downto 0)
        );

    end entity;

    architecture rtl of unsigned_multiply_accumulate is

        -- Declare registers for intermediate values
        signal a_reg      : unsigned ((DATA_WIDTH-1) downto 0);
        signal b_reg      : unsigned ((DATA_WIDTH-1) downto 0);
        signal sload_reg  : std_logic;
        signal mult_reg   : unsigned ((2*DATA_WIDTH-1) downto 0);
        signal adder_out  : unsigned ((2*DATA_WIDTH-1) downto 0);
        signal old_result : unsigned ((2*DATA_WIDTH-1) downto 0);

    begin

        mult_reg <= a_reg * b_reg;

        process (adder_out, sload_reg)
        begin
            if (sload_reg = '1') then
                -- Clear the accumulated data
                old_result <= (others => '0');
            else
                old_result <= adder_out;
            end if;
        end process;

        process (clk)
        begin
            if (rising_edge(clk)) then
                a_reg     <= a;
                b_reg     <= b;
                sload_reg <= sload;

                -- Store accumulation result in a register
                adder_out <= old_result + mult_reg;
            end if;
        end process;

        -- Output accumulation result
        accum_out <= adder_out;

    end rtl;
Altera_Forum

I used Arria GX (I don't have Arria V installed yet), and all the logic gets inserted into the DSP block.

Of course, the input registers should not be shared, and the input/output registers should not be assigned to I/O blocks.
Altera_Forum

I put FvM's "unsigned_multiply_accumulate" unchanged into my design, added an extra register to the output datapath (to rule out any influence from the subsequent datapath elements), and ran it. Same result: the adder (accumulator) is outside the DSP block.

 

I noticed that the signal path goes directly from a_reg/b_reg to adder_out (the multiplier output is not registered, although the signal is called 'mult_reg'). Apart from a longer path, I would not expect that to influence the mapping.

 

Is there any other setting I might have missed that prevents integrating the adder into the DSP block?
Altera_Forum

I think it's time to file a service request with Altera: I notice that even a MegaWizard-generated "altera_mult_add" instance places its accumulator outside the DSP block, so I assume it can't be my coding style that causes the problem.

 

Thanks anyway to both of you for the hints!
Altera_Forum

And it was the coding style: 

 

Replacing 

"sum_next <= p0 + sum_cur;" 

by  

"sum_next <= sum_cur + p0;"  

results in the accumulator being placed inside the DSP block.
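
For reference, a minimal sketch of where that line could sit in a complete multiply-accumulate. Only the assignment "sum_next <= sum_cur + p0;" is taken from the post above. The entity name mac_operand_order, the ports, the widths, the sload clear and the sum_reg register are assumptions made for illustration, loosely modelled on the Altera template quoted earlier in the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: everything except the sum_next assignment is assumed.
entity mac_operand_order is
  generic (
    DATA_WIDTH : natural := 8
  );
  port (
    clk   : in  std_logic;
    sload : in  std_logic;
    a     : in  unsigned(DATA_WIDTH-1 downto 0);
    b     : in  unsigned(DATA_WIDTH-1 downto 0);
    acc   : out unsigned(2*DATA_WIDTH-1 downto 0)
  );
end entity;

architecture rtl of mac_operand_order is
  signal a_reg     : unsigned(DATA_WIDTH-1 downto 0);
  signal b_reg     : unsigned(DATA_WIDTH-1 downto 0);
  signal sload_reg : std_logic;
  signal p0        : unsigned(2*DATA_WIDTH-1 downto 0);  -- product
  signal sum_cur   : unsigned(2*DATA_WIDTH-1 downto 0);  -- accumulator feedback
  signal sum_next  : unsigned(2*DATA_WIDTH-1 downto 0);  -- adder output
  signal sum_reg   : unsigned(2*DATA_WIDTH-1 downto 0);  -- accumulator register
begin
  p0 <= a_reg * b_reg;

  -- Assumed synchronous-load behaviour: sload restarts the accumulation
  sum_cur <= (others => '0') when sload_reg = '1' else sum_reg;

  -- Operand order reported to work: accumulator first, product second
  sum_next <= sum_cur + p0;

  process (clk)
  begin
    if rising_edge(clk) then
      a_reg     <= a;
      b_reg     <= b;
      sload_reg <= sload;
      sum_reg   <= sum_next;
    end if;
  end process;

  acc <= sum_reg;
end architecture;
```

Both operand orders describe the same arithmetic, so any difference lies in how the synthesis tool matches the pattern. The result may therefore vary with Quartus version and device family, and the post-fit netlist remains the reliable check.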