
Inferred RAM: Pass-through logic generated despite no_rw_check

Altera_Forum
Honored Contributor II
4,073 Views

Dear all, 

 

I'm a hobbyist currently learning VHDL. My learning vehicle is a self-developed RISC-V CPU that I develop in the Lite Edition of Quartus Prime 15.1. The CPU is working fine already, but I'd like to smooth out some rough edges and understand what's going on. 

 

My CPU has a unit that contains the CPU's registers. Quartus nicely infers RAM for the register file - however, it generates some pass-through logic for the particular read-during-write behavior it is seeing: 

 

 

--- Quote Start ---  

Warning (276020): Inferred RAM node "cpu_toplevel:cpu_instance|registers:reg_instance|regs_rtl_0" from synchronous design logic. Pass-through logic has been added to match the read-during-write behavior of the original design. 

 

--- Quote End ---  

 

 

The code: 

 

    library IEEE;
    use IEEE.std_logic_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    library work;
    use work.constants.all;

    entity registers is
        Port(
            I_clk:     in  std_logic;
            I_en:      in  std_logic;
            I_op:      in  std_logic_vector(1 downto 0);
            I_selS1:   in  std_logic_vector(4 downto 0);
            I_selS2:   in  std_logic_vector(4 downto 0);
            I_selD:    in  std_logic_vector(4 downto 0);
            I_dataAlu: in  std_logic_vector(XLEN-1 downto 0);
            I_dataMem: in  std_logic_vector(XLEN-1 downto 0);
            O_dataS1:  out std_logic_vector(XLEN-1 downto 0);
            O_dataS2:  out std_logic_vector(XLEN-1 downto 0)
        );
    end registers;

    architecture Behavioral of registers is
        type store_t is array(1 to 31) of std_logic_vector(XLEN-1 downto 0);
        signal regs: store_t := (others => X"00000000");
        attribute ramstyle : string;
        attribute ramstyle of regs : signal is "no_rw_check"; -- why does Quartus still add pass-through logic?
    begin

        process(I_clk)
        begin
            if rising_edge(I_clk) and I_en = '1' then
                -- TODO: find out why synthesis sees read-during-write behavior
                case I_op is

                    when REGOP_READ =>
                        if I_selS1 = R0 then
                            O_dataS1 <= X"00000000";
                        else
                            O_dataS1 <= regs(to_integer(unsigned(I_selS1)));
                        end if;

                        if I_selS2 = R0 then
                            O_dataS2 <= X"00000000";
                        else
                            O_dataS2 <= regs(to_integer(unsigned(I_selS2)));
                        end if;

                    when REGOP_WRITE_ALU =>
                        if I_selD /= R0 then
                            regs(to_integer(unsigned(I_selD))) <= I_dataAlu;
                        end if;

                    when REGOP_WRITE_MEM =>
                        if I_selD /= R0 then
                            regs(to_integer(unsigned(I_selD))) <= I_dataMem;
                        end if;

                    when others => null;

                end case;
            end if;
        end process;

    end Behavioral;

 

(XLEN is a constant that denotes architecture bit width, in this case the value is 32. Register 0 is always zero, thus there are registers 1 to 31.) 
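
For readers who want to compile the listing, the `work.constants` package it depends on is not shown in the thread. A minimal sketch of what it might contain follows. Only `XLEN = 32` is stated in the post; the type and value of `R0` and the `REGOP_*` encodings are assumptions chosen to match how the names are used in the code.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical reconstruction of work.constants. The real package is not
-- shown in this thread; everything except XLEN = 32 is an assumed value.
package constants is

    -- architecture bit width (stated in the post)
    constant XLEN: integer := 32;

    -- register 0, compared against the 5-bit select inputs (assumed type)
    constant R0: std_logic_vector(4 downto 0) := "00000";

    -- assumed 2-bit encodings for the I_op input
    constant REGOP_READ:      std_logic_vector(1 downto 0) := "00";
    constant REGOP_WRITE_ALU: std_logic_vector(1 downto 0) := "01";
    constant REGOP_WRITE_MEM: std_logic_vector(1 downto 0) := "10";

end package constants;
```

With these assumed encodings, the `"11"` value of `I_op` falls into the `when others` branch and leaves the register file untouched.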

 

The synthesized design works fine both in simulation and on the FPGA (Cyclone IV on a DE0-Nano board), but I currently do not understand the following issues: 

 

  • Why does synthesis see read-during-write behavior? As far as I can see (and as far as is intended) there will either be two read accesses (when reading register values) or one write access (when storing results from the ALU or from memory) - but not both in the same cycle. I assume that I misunderstand VHDL semantics in this regard.  

  • The pass-through logic is generated despite the "no_rw_check" ramstyle attribute. My understanding is that this attribute is supposed to tell synthesis not to synthesize additional logic to implement the design's read-during-write behavior. The attribute itself seems to be attached properly, for instance I can successfully tell synthesis to generate the registers in logic instead of utilizing ram blocks.  

 

 

I would greatly appreciate hints to improve my understanding of what's going on there. 

 

 

Best regards, 

 

Maik
14 Replies
Altera_Forum
Honored Contributor II
1,992 Views

You should look at your RTL diagram to find the answer. 

If I were you, I would try the following (brainstormed suggestions): 

1) Try making only one assignment to the RAM. 

2) You read two addresses. Are both really needed at the same time? If not, you should read the RAM only once. 

3) It seems the tool believes that I_selD and I_selS1, I_selS2 could be the same. 

 

An attribute applied to logic that is inferred from code sometimes brings another attribute along with it. Please check the Analysis & Synthesis report for additional signal attributes that come with it.
Altera_Forum
Honored Contributor II
1,992 Views

 


 

 

Thanks for your kind response. 

 

Only having one assignment to RAM does not seem to do the trick. Reading from two addresses is quite necessary to fetch data from two CPU registers and I'd assume that dual port RAM can handle that, although I think that reading twice from the same address may be troublesome. If desperate I can distribute the two reads over clock cycles at the expense of performance, but I'd prefer to avoid that. 

 

The register selection signals (selS1, selS2 and selD) can indeed assume the same values. In the following version I tried to make sure that 

  • only one read is specified in the case that the two read addresses are the same  

  • it should be clear that read and write are mutually exclusive in logic  

 

 

    library IEEE;
    use IEEE.std_logic_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    library work;
    use work.constants.all;

    entity registers is
        Port(
            I_clk:     in  std_logic;
            I_en:      in  std_logic;
            I_op:      in  std_logic_vector(1 downto 0);
            I_selS1:   in  std_logic_vector(4 downto 0);
            I_selS2:   in  std_logic_vector(4 downto 0);
            I_selD:    in  std_logic_vector(4 downto 0);
            I_dataAlu: in  std_logic_vector(XLEN-1 downto 0);
            I_dataMem: in  std_logic_vector(XLEN-1 downto 0);
            O_dataS1:  out std_logic_vector(XLEN-1 downto 0);
            O_dataS2:  out std_logic_vector(XLEN-1 downto 0)
        );
    end registers;

    architecture Behavioral of registers is
        type store_t is array(1 to 31) of std_logic_vector(XLEN-1 downto 0);
        signal regs: store_t := (others => X"00000000");
        attribute ramstyle : string;
        attribute ramstyle of regs : signal is "no_rw_check"; -- why does Quartus still add pass-through logic?
    begin

        process(I_clk)
            variable data: std_logic_vector(XLEN-1 downto 0);
        begin
            if rising_edge(I_clk) and I_en = '1' then
                -- TODO: find out why synthesis sees read-during-write behavior

                if I_op = REGOP_READ then
                    -- we read, we don't write!

                    if I_selS1 = I_selS2 then
                        -- both selects address same location
                        -- avoid reading same address twice
                        if I_selS1 = R0 then
                            data := X"00000000";
                        else
                            data := regs(to_integer(unsigned(I_selS1)));
                        end if;
                        O_dataS1 <= data;
                        O_dataS2 <= data;
                    else
                        -- two different locations selected for reading
                        -- dual port memory should handle that just fine
                        if I_selS1 = R0 then
                            O_dataS1 <= X"00000000";
                        else
                            O_dataS1 <= regs(to_integer(unsigned(I_selS1)));
                        end if;

                        if I_selS2 = R0 then
                            O_dataS2 <= X"00000000";
                        else
                            O_dataS2 <= regs(to_integer(unsigned(I_selS2)));
                        end if;
                    end if;

                elsif I_op = REGOP_WRITE_ALU then
                    -- we write data from ALU (no read involved)
                    if I_selD /= R0 then
                        regs(to_integer(unsigned(I_selD))) <= I_dataAlu;
                    end if;

                elsif I_op = REGOP_WRITE_MEM then
                    -- we write data from memory (no read involved)
                    if I_selD /= R0 then
                        regs(to_integer(unsigned(I_selD))) <= I_dataMem;
                    end if;
                end if;

            end if;
        end process;

    end Behavioral;

 

This, however, does not seem to change the result regarding the pass-through logic. I still have no idea how read-during-write behavior could happen there. 

 

In the compilation log I see 

 

    READ_DURING_WRITE_MODE_MIXED_PORTS  OLD_DATA              Untyped  36
    READ_DURING_WRITE_MODE_PORT_A       NEW_DATA_NO_NBE_READ  Untyped  37
    READ_DURING_WRITE_MODE_PORT_B       NEW_DATA_NO_NBE_READ  Untyped  38

 

and I assume that the "NEW_DATA"-part is the reason for the pass-through. I don't know how this is determined, though.
Altera_Forum
Honored Contributor II
1,992 Views

You need to make sure that the I_selD and I_selS signals are different in write and read operations. 

With a case statement the code is clearer. 

If the processing is purely sequential, you can fix it with a state machine.
Altera_Forum
Honored Contributor II
1,992 Views

You can split the read and the write into two separate if statements, so that they run in parallel. In your code they are in a single statement.

Altera_Forum
Honored Contributor II
1,992 Views

As you are doing the read in a clocked process, I agree that your code shouldn't generate a memory with new-data read-during-write behaviour. But the code you posted here differs from the recommended Altera HDL styles (https://documentation.altera.com/#/00030683-aa$nt00064438), and it's hard to predict how the synthesis tool will implement it. Sometimes it can do some really crazy stuff, and different stuff with different versions. I would recommend instead making an entity that sticks as closely as possible to the recommended style and putting your own wrapper logic around it (you can put both read/write accesses in the same process and avoid using a shared variable if you want, though; it works with all the Quartus versions that I tried): 

    library ieee;
    use ieee.std_logic_1164.all;

    entity true_dual_port_ram_single_clock is
        generic (
            DATA_WIDTH : natural := 8;
            ADDR_WIDTH : natural := 6
        );
        port (
            clk    : in  std_logic;
            addr_a : in  natural range 0 to 2**ADDR_WIDTH - 1;
            addr_b : in  natural range 0 to 2**ADDR_WIDTH - 1;
            data_a : in  std_logic_vector((DATA_WIDTH-1) downto 0);
            data_b : in  std_logic_vector((DATA_WIDTH-1) downto 0);
            we_a   : in  std_logic := '1';
            we_b   : in  std_logic := '1';
            q_a    : out std_logic_vector((DATA_WIDTH -1) downto 0);
            q_b    : out std_logic_vector((DATA_WIDTH -1) downto 0)
        );
    end true_dual_port_ram_single_clock;

    architecture rtl of true_dual_port_ram_single_clock is

        -- Build a 2-D array type for the RAM
        subtype word_t is std_logic_vector((DATA_WIDTH-1) downto 0);
        type memory_t is array((2**ADDR_WIDTH - 1) downto 0) of word_t;

        -- Declare the RAM. It is a shared variable, so it must be
        -- assigned with ":=" rather than "<=".
        shared variable ram : memory_t;

    begin

        process(clk)
        begin
            if(rising_edge(clk)) then -- Port A
                if(we_a = '1') then
                    ram(addr_a) := data_a;
                    -- Read-during-write on the same port returns NEW data
                    q_a <= data_a;
                else
                    -- Read-during-write on the mixed port returns OLD data
                    q_a <= ram(addr_a);
                end if;
            end if;
        end process;

        process(clk)
        begin
            if(rising_edge(clk)) then -- Port B
                if(we_b = '1') then
                    ram(addr_b) := data_b;
                    -- Read-during-write on the same port returns NEW data
                    q_b <= data_b;
                else
                    -- Read-during-write on the mixed port returns OLD data
                    q_b <= ram(addr_b);
                end if;
            end if;
        end process;

    end rtl;

I think the key is to have only two address signals, and have your own logic to connect them to I_selS1, I_selS2 or I_selD depending on the current operation. That way you ensure Quartus will recognize your code and implement it correctly. 

 

That said, I see that the recommended HDL in fact uses new read-during-write, and they say in the text that this is what most FPGA memories natively support. If you see it still generates extra logic around the memory block, you can try and move the read operation outside the if. But stick to two address signals to avoid confusing the synthesizer IMHO.
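
The wrapper idea described in this reply (two address signals, with separate logic choosing between I_selS1, I_selS2 and I_selD) could look roughly like the sketch below. The entity name `regfile_addr_mux` and its output port names are hypothetical, and the `REGOP_*` and `R0` constants are assumed to come from the poster's `work.constants` package, which is not shown in the thread. It has not been run through Quartus.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

library work;
use work.constants.all;  -- assumed to provide R0 and the REGOP_* constants

-- Hypothetical wrapper fragment: reduces the three register selects to the
-- two address ports of a dual-port RAM such as the template above.
entity regfile_addr_mux is
    port(
        I_op:     in  std_logic_vector(1 downto 0);
        I_selS1:  in  std_logic_vector(4 downto 0);
        I_selS2:  in  std_logic_vector(4 downto 0);
        I_selD:   in  std_logic_vector(4 downto 0);
        O_addr_a: out natural range 0 to 31;
        O_addr_b: out natural range 0 to 31;
        O_we_a:   out std_logic
    );
end regfile_addr_mux;

architecture rtl of regfile_addr_mux is
    signal is_write: std_logic;
begin
    is_write <= '1' when I_op = REGOP_WRITE_ALU or I_op = REGOP_WRITE_MEM
                else '0';

    -- port A: destination address on writes, first source address otherwise
    O_addr_a <= to_integer(unsigned(I_selD)) when is_write = '1'
                else to_integer(unsigned(I_selS1));

    -- port B: only ever used for the second read
    O_addr_b <= to_integer(unsigned(I_selS2));

    -- suppress writes to register 0
    O_we_a <= '1' when is_write = '1' and I_selD /= R0 else '0';
end rtl;
```

Note that the template declares `we_b` with a default of `'1'`, so a wrapper that uses port B only for reading would need to tie `we_b` to `'0'` explicitly rather than leave it unconnected.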
Altera_Forum
Honored Contributor II
1,992 Views

Thank you both for your responses! 

 

I'm glad that the issue at hand apparently is merely caused by a coding style mismatch and not by a severe misunderstanding on my side regarding VHDL semantics. I will try to express the desired behavior in a way that conforms more to the recommended coding style. 

 

Best regards, 

 

Maik
Altera_Forum
Honored Contributor II
1,992 Views

While it doesn't match the coding style in the guidelines, Quartus is normally very good at recognising RAMs from code that doesn't quite match the guidelines. 

I would try the wrapper as suggested for now, but also raise a mySupport request to try to understand why it doesn't work with your original code, and raise an enhancement request if it turns out to be a fault or deficit in the synthesis engine.
Altera_Forum
Honored Contributor II
1,992 Views

Thanks to the forum input I was now able to construct a solution that is recognized as not needing pass-through logic: 

 

    library IEEE;
    use IEEE.std_logic_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    library work;
    use work.constants.all;

    entity registers is
        Port(
            I_clk:     in  std_logic;
            I_en:      in  std_logic;
            I_op:      in  std_logic_vector(1 downto 0);
            I_selS1:   in  std_logic_vector(4 downto 0);
            I_selS2:   in  std_logic_vector(4 downto 0);
            I_selD:    in  std_logic_vector(4 downto 0);
            I_dataAlu: in  std_logic_vector(XLEN-1 downto 0);
            I_dataMem: in  std_logic_vector(XLEN-1 downto 0);
            O_dataS1:  out std_logic_vector(XLEN-1 downto 0);
            O_dataS2:  out std_logic_vector(XLEN-1 downto 0)
        );
    end registers;

    architecture Behavioral of registers is
        type store_t is array(0 to 31) of std_logic_vector(XLEN-1 downto 0);
        signal regs: store_t := (others => X"00000000");
        attribute ramstyle : string;
        attribute ramstyle of regs : signal is "no_rw_check";
    begin

        -----------------------------------
        -- first port of dual-port RAM
        -- used for one read or one write
        -----------------------------------
        process(I_clk)
            variable write_enabled: boolean;
            variable data: std_logic_vector(XLEN-1 downto 0);
        begin
            if rising_edge(I_clk) and I_en = '1' then

                data := X"00000000";

                -- by default assume read access
                write_enabled := false;

                -- determine details of write operations
                case I_op is

                    when REGOP_WRITE_ALU =>
                        -- write to destination register
                        write_enabled := true;
                        if I_selD /= R0 then
                            data := I_dataAlu;
                        end if;

                    when REGOP_WRITE_MEM =>
                        -- write to destination register
                        write_enabled := true;
                        if I_selD /= R0 then
                            data := I_dataMem;
                        end if;

                    when others => null;

                end case;

                -- this is a pattern that Quartus RAM synthesis understands
                -- as *not* being read-during-write (with no_rw_check attribute)
                if write_enabled then
                    regs(to_integer(unsigned(I_selD))) <= data;
                else
                    O_dataS1 <= regs(to_integer(unsigned(I_selS1)));
                end if;

            end if;
        end process;

        --------------------------------
        -- second port of dual port RAM
        -- used only for one read
        --------------------------------
        process(I_clk)
        begin
            if rising_edge(I_clk) and I_en = '1' then
                if I_op = REGOP_READ then
                    O_dataS2 <= regs(to_integer(unsigned(I_selS2)));
                end if;
            end if;
        end process;

    end Behavioral;

 

Will Altera actually process service requests from private individuals? The profile structure of the support pages really has that "we like companies!" vibe ;)
Altera_Forum
Honored Contributor II
1,992 Views

 

--- Quote Start ---  

 

Will Altera actually process service requests from private individuals? The profile structure of the support pages really has that "we like companies!" vibe ;) 

--- Quote End ---  

 

 

They should, but higher paying companies usually get priority on requests. 

This is as someone who has always requested as part of a company..
Altera_Forum
Honored Contributor II
1,992 Views

Well, I can always just try my luck ;) 

 

Btw, it turns out that in my case the second process is not needed, instead I can have the two read accesses in the same process: 

 

 

    -- this is a pattern that Quartus RAM synthesis understands
    -- as *not* being read-during-write (with no_rw_check attribute)
    if write_enabled then
        regs(to_integer(unsigned(I_selD))) <= data;
    else
        O_dataS1 <= regs(to_integer(unsigned(I_selS1)));
        O_dataS2 <= regs(to_integer(unsigned(I_selS2)));
    end if;

 

What *is* important: There apparently can be only one assignment to the output signals for RAM contents. If one includes, e.g., something like 

 

    if I_selS1 = R0 then
        O_dataS1 <= X"00000000";
    end if;

 

somewhere (to ensure that reads of CPU register 0 always return zero), Quartus will include the pass-through logic, whether no_rw_check is there or not. Instead, I currently live with address 0 actually being a memory location: I initialize it with zero and never allow anything but zero to be written there.
Altera_Forum
Honored Contributor II
1,992 Views

Wonderful. 

You can put the if statement outside the case statement. Look at what changes, and you will find the structure you are after.
Altera_Forum
Honored Contributor II
1,992 Views

 

--- Quote Start ---  

Well, I can always just try my luck ;) 

 

Btw, it turns out that in my case the second process is not needed, instead I can have the two read accesses in the same process: 

 

 

    -- this is a pattern that Quartus RAM synthesis understands
    -- as *not* being read-during-write (with no_rw_check attribute)
    if write_enabled then
        regs(to_integer(unsigned(I_selD))) <= data;
    else
        O_dataS1 <= regs(to_integer(unsigned(I_selS1)));
        O_dataS2 <= regs(to_integer(unsigned(I_selS2)));
    end if;

 

What *is* important: There apparently can be only one assignment to the output signals for RAM contents. If one includes, e.g., something like 

 

    if I_selS1 = R0 then
        O_dataS1 <= X"00000000";
    end if;

 

somewhere (to ensure that reads on CPU register 0 always returns zero) Quartus will include the pass-through logic, no matter if no_rw_check is there or not. Instead I currently live with address 0 actually being a memory location, initializing it with zero and never allowing anything but zero to be written there. 

--- Quote End ---  

 

 

That would make sense, because the registers in the RAM blocks have no mux feeding them. If you code it like this, the tool has to move the register out of the RAM and into general logic. So if you infer a mux, the memory becomes an asynchronous RAM and pass-through logic is needed.
Altera_Forum
Honored Contributor II
1,992 Views

This indeed makes perfect sense and also explains why my original code is synthesized in the observed fashion. Seems that the "read-during-write" problem as implied by the synthesizer warning is not about concurrent reads and writes at all in this case ;)

Altera_Forum
Honored Contributor II
1,992 Views

Where does your R0 come from? 

If you want to ensure that it always reads 0, just create a register with an initialized value and use the keep attribute or a preserve assignment, and make sure no operation writes to that register. Or you can separate the internal and external data with an AND gate. 

 

You can use one register next to the RAM and drive the output through its sclear or sload signals. 

I don't know for sure, but the RAM should have an asynchronous clear for its input/output registers. Would that be a good solution?
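
One way to read the "separate internal and external" suggestion, combined with the earlier point that the RAM's output register has no mux in front of it, is to leave the registered RAM read untouched and force zero in ordinary logic after it. The sketch below is only an illustration under that assumption: the entity `regfile_zero_mask`, its ports, and the fixed 32-bit width are hypothetical, the read is made unconditional to follow the usual simple dual-port pattern, and whether Quartus then omits the pass-through logic has not been verified here.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical single-read-port register file. The RAM read is a plain
-- registered read; the "register 0 reads as zero" rule is applied outside
-- the memory, after its output register.
entity regfile_zero_mask is
    port(
        I_clk:  in  std_logic;
        I_we:   in  std_logic;
        I_selD: in  std_logic_vector(4 downto 0);
        I_selS: in  std_logic_vector(4 downto 0);
        I_data: in  std_logic_vector(31 downto 0);
        O_data: out std_logic_vector(31 downto 0)
    );
end regfile_zero_mask;

architecture rtl of regfile_zero_mask is
    type store_t is array(0 to 31) of std_logic_vector(31 downto 0);
    signal regs: store_t := (others => (others => '0'));
    attribute ramstyle: string;
    attribute ramstyle of regs: signal is "no_rw_check";

    signal ram_q:    std_logic_vector(31 downto 0);  -- registered RAM output
    signal sel_zero: std_logic := '0';               -- registered "address was 0" flag
begin
    process(I_clk)
    begin
        if rising_edge(I_clk) then
            if I_we = '1' then
                regs(to_integer(unsigned(I_selD))) <= I_data;
            end if;

            -- the only assignment to the RAM output register
            ram_q <= regs(to_integer(unsigned(I_selS)));

            -- remember, in a separate flip-flop, whether register 0 was selected
            if I_selS = "00000" then
                sel_zero <= '1';
            else
                sel_zero <= '0';
            end if;
        end if;
    end process;

    -- zero-forcing happens in general logic, downstream of the memory
    O_data <= (others => '0') when sel_zero = '1' else ram_q;
end rtl;
```

The trade-off is one extra flip-flop and a small amount of combinational logic per read port, in exchange for not having to keep address 0 of the memory at zero.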