Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

cache controller and read-during-write behavior

Altera_Forum
Honored Contributor II
3,411 Views

Hello all, 

 

I'm in the process of writing a simple cache controller, and I want it to exhibit write-back behavior. That is, when the processor writes to the cache, the controller first snoops the appropriate tag and flags. If the tag and valid flag indicate a hit, the write data goes straight into the cache. If the tag does not indicate a hit but the location is dirty, the controller first flushes the cache line to memory and then loads the requested cache line so as to produce a hit.

This all sounds good, but I've come across a problem I can't seem to shake. As just described, when writing to the cache the controller first needs to check the cache line's tag and flags. Obviously, the block RAMs on all of these devices are synchronous, so there is a one-cycle delay before I get the tag and flags back. This implies the controller will be only 50% efficient at best on writes (first snoop the tag/flags, then write the data).

My first solution was to read the tag/flags and write the data on the same clock cycle. Then, on the next clock cycle, if the tag/flags indicated a miss, I would rewind the operation using some skid buffers and flush/load the cache line as usual. This, however, requires the block RAM to return the old data on read-during-write operations, and I can't find any dual-port RAM option on my target device (Stratix V) that supports that setting. It always returns new data! Trying to force the RAM to return old data for read-during-write operations via inference just results in the RAM being placed in logic, which is unacceptable.

Does anyone have any suggestions? Thanks.
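To make the write path described above concrete, here is a minimal behavioral sketch in Python (not HDL). It assumes a direct-mapped cache with one word per line and write-allocate on a miss; the names `Line`, `WriteBackCache`, and `flushes` are hypothetical and exist only for illustration. It models the policy only, not the one-cycle tag latency that causes the problem.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: int = 0


class WriteBackCache:
    """Direct-mapped, one word per line, write-back (illustrative assumption)."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.lines = [Line() for _ in range(num_lines)]
        self.memory = memory  # dict mapping address -> word ("lower memory")
        self.flushes = 0      # number of dirty lines written back

    def write(self, addr, value):
        """Perform a processor write; return True if it hit in the cache."""
        index = addr % self.num_lines
        tag = addr // self.num_lines
        line = self.lines[index]
        hit = line.valid and line.tag == tag
        if not hit:
            if line.valid and line.dirty:
                # Flush the dirty victim line to memory first.
                victim_addr = line.tag * self.num_lines + index
                self.memory[victim_addr] = line.data
                self.flushes += 1
            # Load the requested line so that the write now hits.
            line.tag = tag
            line.data = self.memory.get(addr, 0)
            line.valid = True
            line.dirty = False
        # The write data goes into the cache and marks the line dirty.
        line.data = value
        line.dirty = True
        return hit
```

Memory is only touched when a dirty line is evicted, which is the point of write-back; a real controller would also have to handle multi-word lines and reads.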
5 Replies
Altera_Forum
Honored Contributor II
1,831 Views

You are probably getting a random mix of old and new values for each bit. The timings might be such that the new value always wins! 

 

I guess reads aren't a problem - you can just discard the data. 

For writes you may have to add a 'store buffer' so they can be processed asynchronously (and pipelined).
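As a rough illustration of the store buffer idea, here is a small Python behavioral sketch. It assumes a fixed-depth FIFO of pending writes that is drained into the RAM once the tag check has completed, with reads checking the buffer first so they never see stale data. `StoreBuffer`, `push`, `drain_one`, and `forward` are hypothetical names, not part of any Altera IP.

```python
from collections import deque


class StoreBuffer:
    """Fixed-depth FIFO of pending (address, data) writes."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def push(self, addr, data):
        """Accept a processor write; return False if the processor must stall."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append((addr, data))
        return True

    def drain_one(self, ram):
        """Commit the oldest pending write into ram (a dict); return its address."""
        if not self.pending:
            return None
        addr, data = self.pending.popleft()
        ram[addr] = data
        return addr

    def forward(self, addr):
        """Return the newest buffered data for addr, or None if none is pending."""
        for pending_addr, pending_data in reversed(self.pending):
            if pending_addr == addr:
                return pending_data
        return None
```

The processor only stalls when the buffer is full, so the one-cycle tag lookup is hidden as long as writes do not arrive faster than they can be drained.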
Altera_Forum
Honored Contributor II
1,831 Views

From what I can see, the only way to get the old memory value in a read-during-write transaction in Stratix V M20K blocks is to use a simple dual-port RAM (1 read address, 1 write address, 1 clock). My implementation would have been considerably easier if I were allowed a true dual-port, dual-clock RAM. Instead I'll have to mux between the processor and lower memory accessing the RAM, which should be fun for timing analysis. I still welcome any more elegant solutions.

Altera_Forum
Honored Contributor II
1,831 Views

I've just been looking at some SignalTap traces of bus cycles for M9K on Arria II. 

With 'OLD_DATA' enabled (and single clock) during a write on s1, s1 returns the old data, but s2 returns the new data. 

So to read the old data during a write you'd need to put the write address onto both the s1 and s2 address inputs. 

This might be what you've already discovered.
Altera_Forum
Honored Contributor II
1,831 Views

That is what I expected, as the following code indicates: 

 

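    -- Address conversion (concurrent assignments)
    wr_addr_int <= to_integer(unsigned(wr_addr));
    rd_addr_int <= to_integer(unsigned(rd_addr));

    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                -- Write every byte lane of the addressed word
                for i in 0 to DATA_BYTE_WIDTH-1 loop
                    ram(wr_addr_int)(i) <= data(8*(i+1)-1 downto 8*i);
                end loop;
            end if;
            -- ram is a signal, so this read sees the contents from before
            -- the write above: old data on read-during-write
            q_int <= ram(rd_addr_int);
        end if;
    end process;

    -- Flatten the byte array back into the output vector
    UNPACK : for i in 0 to DATA_BYTE_WIDTH-1 generate
        q(8*(i+1)-1 downto 8*i) <= q_int(i);
    end generate;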

 

Thank you for verifying that for me, I'm just about ready to start my verification efforts.
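For anyone following along, here is a small Python behavioral sketch of what the clocked process above does, plus the "write speculatively, rewind on a miss" idea from the first post. It is a model under stated assumptions, not a description of the M20K hardware: `OldDataRam` and `speculative_write` are hypothetical names, each `clock()` call stands for one rising edge, and the tag check result is passed in as a plain `hit` flag.

```python
class OldDataRam:
    """Simple dual-port RAM model: registered read, old data on read-during-write."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.q = 0  # registered read output

    def clock(self, rd_addr, we=False, wr_addr=0, data=0):
        """Model one rising clock edge and return the new value of q."""
        # Sample the read location before applying the write, which is
        # what the VHDL signal semantics do within a single clock edge.
        old = self.mem[rd_addr]
        if we:
            self.mem[wr_addr] = data
        self.q = old
        return self.q


def speculative_write(ram, addr, data, hit):
    """Write addr while reading it; undo the write if the tag check missed."""
    saved = ram.clock(rd_addr=addr, we=True, wr_addr=addr, data=data)
    if not hit:
        # Rewind: put the overwritten word back before flushing/loading.
        ram.clock(rd_addr=addr, we=True, wr_addr=addr, data=saved)
    return saved
```

The rewind costs an extra cycle only on a miss, which is why the old-data behavior matters: with new-data semantics the overwritten word would already be lost by the time the miss is known.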
Altera_Forum
Honored Contributor II
1,831 Views

Something else I've discovered. 

There is a 1 clock stall when a read from tightly coupled data memory immediately follows a write to the same memory block. 

Basically the write cycle can only be done when it is actually required - and the decision takes a clock. 

The read is done unconditionally, i.e. regardless of the opcode byte or the actual memory block referenced by the high-order address bits. 

 

The same delay may affect data cache operations.