Hello
I'm trying to infer DSP blocks in my Arria V design as a replacement for my former ALTMULT_ADD instance in Stratix III. I notice that the input registers are placed outside the DSP block (i.e. the DSP block's input register bank is not used), and the same happens with the output registers (which should go into the DSP block's output register bank). For the VHDL code I tried to stay close to the one in qts_Qii51007/Example 13-7, but I need some more flexibility. On Stratix III, the whole entity fit inside the ALTMULT_ADD. My goal is to use the input and output register banks of the DSP block. What am I doing wrong? How can I solve this?
Regards, Peter
Try this on the I/O registers:
attribute altera_attribute : string;
attribute altera_attribute of my_reg : signal is "-name auto_packed_registers normal";
BTW, I assume you're aware that the delay only applies to simulation (not synthesis).
I tried to work with that (although it is already set to "auto" on global level), without success. I'm not able to bring the input registers into that DSP block. Furthermore, not even the accumulator is put in there. It's located outside.
Going back to the minimal version from the coding style manual (exactly Qii51007/Example 13-7, with only the signal names changed to match my interface), I see the same behaviour. So it cannot be my enables, loads, etc.; it must be something else. I synthesized both with tight and with relaxed timing constraints, with no impact. Any idea? Is a different coding style required for Arria V, or is there something else I missed?
P.S.: PROP_DELAY is a global constant (time := 1 ns) used for simulation purposes. It had no influence anyway, as I took it out, too.
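Try this direct coding style:
process (rst, clk)
begin
    if rst = '1' then
        ...
    elsif rising_edge(clk) then
        A_reg <= A;               -- input registers
        B_reg <= B;
        prod  <= A_reg * B_reg;   -- registered product
        sum   <= sum + prod;      -- accumulator
    end if;
end process;
result <= sum;  -- "out" is a reserved word in VHDL, so the output needs another name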
The only difference (besides the signal names) I can make out in your code sequence is that you replace [clk'event and clk = '1'] (which is what the Altera documentation proposes) with [rising_edge(clk)]. I tried that too, but it ends up with the same result.
As a next step, I placed EXACTLY the mentioned Altera example in my test design to see how the tool deals with it. I found out that if the same input signal source feeds more than one DSP element, the register is shared and located outside. When I make sure each MAC gets individual signals, a_reg and b_reg are placed inside the DSP block as expected (for both my code and the Altera example).
However, even with exactly the Altera example, the adder is still placed outside the DSP block in a nearby LAB cell (even for a completely isolated instance that has all its I/Os connected directly to FPGA I/Os). I have no problem closing timing so far, and as it is a test design on the StarterKit, the FPGA is far from congested. Why worry further? Because I get an uneasy feeling if I can't make the tool put a simple MAC entirely into the DSP block, which was designed for exactly that.
According to the post-fitter netlist, the Altera template unsigned_multiply_accumulate is implemented completely in the DSP block.
-- Quartus II VHDL Template
-- Unsigned Multiply-Accumulate
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity unsigned_multiply_accumulate is
    generic
    (
        DATA_WIDTH : natural := 8
    );
    port
    (
        a         : in unsigned ((DATA_WIDTH-1) downto 0);
        b         : in unsigned ((DATA_WIDTH-1) downto 0);
        clk       : in std_logic;
        sload     : in std_logic;
        accum_out : out unsigned ((2*DATA_WIDTH-1) downto 0)
    );
end entity;

architecture rtl of unsigned_multiply_accumulate is
    -- Declare registers for intermediate values
    signal a_reg      : unsigned ((DATA_WIDTH-1) downto 0);
    signal b_reg      : unsigned ((DATA_WIDTH-1) downto 0);
    signal sload_reg  : std_logic;
    signal mult_reg   : unsigned ((2*DATA_WIDTH-1) downto 0);
    signal adder_out  : unsigned ((2*DATA_WIDTH-1) downto 0);
    signal old_result : unsigned ((2*DATA_WIDTH-1) downto 0);
begin
    mult_reg <= a_reg * b_reg;

    process (adder_out, sload_reg)
    begin
        if (sload_reg = '1') then
            -- Clear the accumulated data
            old_result <= (others => '0');
        else
            old_result <= adder_out;
        end if;
    end process;

    process (clk)
    begin
        if (rising_edge(clk)) then
            a_reg <= a;
            b_reg <= b;
            sload_reg <= sload;
            -- Store accumulation result in a register
            adder_out <= old_result + mult_reg;
        end if;
    end process;

    -- Output accumulation result
    accum_out <= adder_out;
end rtl;
I used Arria GX (I don't have Arria V installed yet) and all the logic gets placed in the DSP block.
Of course, the input registers should not be shared, and the input/output registers should not be assigned to I/O blocks.
I put FvM's "unsigned_multiply_accumulate" unchanged into my design, added an additional register to the output datapath (to be sure to have no influence of the subsequent datapath elements), and let it run. Same result: The adder (accumulator) is outside the DSP block.
I noticed that the multiplier output is not registered (although the signal is called 'mult_reg'), so the signal path goes directly from a_reg/b_reg to adder_out. Besides a longer path, though, I would not expect that to influence the mapping. Is there any other setting I might have missed that prevents the adder from being integrated into the DSP block?
I think it's time now to move on to a service request to the Altera guys: I notice that even a MegaWizard-generated "altera_mult_add" instance places its accumulator outside the DSP block, so I assume it can't be my coding style that causes the problem.
Thanks anyway to the two of you for the hints!
And it was the coding style after all:
Replacing "sum_next <= p0 + sum_cur;" with "sum_next <= sum_cur + p0;" results in the accumulator being placed inside the DSP element.
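For readers who want to try the operand order described above, here is a minimal sketch of a MAC written with the accumulator feedback as the first operand of the addition. Only the names sum_next, sum_cur and p0 come from the post; the entity name, the port names, the DATA_WIDTH generic and the power-up initial values are assumptions made for illustration. Whether the fitter actually packs the adder into the DSP block may depend on the device family and tool version, so it is worth checking the post-fitter netlist.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names and widths are illustrative, not from the thread
entity mac_operand_order is
    generic
    (
        DATA_WIDTH : natural := 8
    );
    port
    (
        clk : in  std_logic;
        a   : in  unsigned ((DATA_WIDTH-1) downto 0);
        b   : in  unsigned ((DATA_WIDTH-1) downto 0);
        acc : out unsigned ((2*DATA_WIDTH-1) downto 0)
    );
end entity;

architecture rtl of mac_operand_order is
    -- Input registers, one set per MAC (not shared with other MACs)
    signal a_reg    : unsigned ((DATA_WIDTH-1) downto 0) := (others => '0');
    signal b_reg    : unsigned ((DATA_WIDTH-1) downto 0) := (others => '0');
    -- Combinational product and next accumulator value
    signal p0       : unsigned ((2*DATA_WIDTH-1) downto 0);
    signal sum_next : unsigned ((2*DATA_WIDTH-1) downto 0);
    -- Accumulator register (initial value assumed for simulation and power-up)
    signal sum_cur  : unsigned ((2*DATA_WIDTH-1) downto 0) := (others => '0');
begin
    p0 <= a_reg * b_reg;

    -- Accumulator feedback first, product second, as reported in the post
    sum_next <= sum_cur + p0;

    process (clk)
    begin
        if rising_edge(clk) then
            a_reg   <= a;
            b_reg   <= b;
            sum_cur <= sum_next;
        end if;
    end process;

    acc <= sum_cur;
end rtl;
```

This sketch has no load or clear input, to keep the focus on the operand order; a real design would probably add something like the sload logic from the Altera template.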