cache controller and read-during-write behavior

Altera_Forum · ‎10-08-2012

Hello all,

I'm in the process of writing a simple cache controller. I want this controller to exhibit write-back behavior, i.e. when the processor writes to the cache, the controller first snoops the appropriate tag and flags. If the tag and valid flag indicate a hit, the write data goes straight into the cache; if the tag does not indicate a hit but the location is dirty, the controller first flushes the cache line to memory and then loads the requested cache line so as to produce a hit.

This all sounds good, but I've come across a problem I can't seem to shake. As just described, when writing to the cache the controller first needs to check the cache line tag/flags. Obviously, all the block RAMs of all devices are synchronous, so there will be a one-cycle delay before I get back the tag and flags. This would imply the controller will always be only 50% efficient at best (first snoop the tag/flags, then write the data).

My first solution was to read the tag/flags and write the data on the same clock cycle and then, on the next clock cycle, if the tag/flags indicated a miss, rewind the operation using some skid buffers and flush/load the cache as usual. This, however, requires that the block RAM return the old data on read-during-write operations. But I can't find any dual-port RAM options on my target device (Stratix V) that support that setting. It always returns new data! Trying to force the RAM to return old data for read-during-write operations via inference just results in the RAM being implemented in logic, which is unacceptable.

Does anyone have any suggestions? Thanks.
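For reference, the hit/miss/flush decision described in this post can be sketched as a small combinational block. This is only a minimal illustration, not the poster's actual code: the entity name `tag_check`, the port names, and the default `TAG_WIDTH` are all assumptions made for the example. The inputs are taken to be the tag and flags that come back from the tag RAM one cycle after the lookup.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: classifies a request from the tag/flags that were
-- read out of the tag RAM on the previous clock cycle.
entity tag_check is
    generic (
        TAG_WIDTH : positive := 20  -- assumed width, for illustration only
    );
    port (
        req_tag    : in  std_logic_vector(TAG_WIDTH-1 downto 0);
        line_tag   : in  std_logic_vector(TAG_WIDTH-1 downto 0);
        line_valid : in  std_logic;
        line_dirty : in  std_logic;
        hit        : out std_logic;  -- write may go straight into the cache
        need_flush : out std_logic   -- miss on a dirty line: write back first
    );
end entity tag_check;

architecture rtl of tag_check is
    signal hit_i : std_logic;
begin
    -- A hit requires a valid line whose stored tag matches the request.
    hit_i <= '1' when (line_valid = '1' and req_tag = line_tag) else '0';
    hit   <= hit_i;

    -- On a miss, a valid and dirty line must be flushed to memory before
    -- the requested line is loaded.
    need_flush <= '1' when (hit_i = '0' and line_valid = '1'
                            and line_dirty = '1') else '0';
end architecture rtl;
```

Because `line_tag`, `line_valid`, and `line_dirty` only become available one cycle after the address is presented, any logic that consumes `hit` is a cycle behind the request, which is the source of the 50% figure mentioned above.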

Altera_Forum · ‎10-09-2012

You are probably getting a random mix of old and new values for each bit. The timings might be such that the new value always wins!

I guess reads aren't a problem - you can just discard the data.

For writes you may have to add a 'store buffer' so they can be processed asynchronously (and pipelined).
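One way to read the store-buffer suggestion is a single pending-write register: the write is captured in the same cycle the tag RAM is read, and committed to the data RAM one cycle later once the hit result is known. The sketch below assumes exactly that and nothing more. The entity name `store_buffer`, all port names, and the widths are hypothetical, and `tag_hit` is assumed to be valid one cycle after `cpu_we`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-entry store buffer (illustrative sketch only).
entity store_buffer is
    generic (
        ADDR_WIDTH : positive := 10;  -- assumed
        DATA_WIDTH : positive := 32   -- assumed
    );
    port (
        clk      : in  std_logic;
        cpu_we   : in  std_logic;
        cpu_addr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        cpu_data : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        tag_hit  : in  std_logic;  -- tag compare result for the pending write
        ram_we   : out std_logic;
        ram_addr : out std_logic_vector(ADDR_WIDTH-1 downto 0);
        ram_data : out std_logic_vector(DATA_WIDTH-1 downto 0);
        miss     : out std_logic   -- pending write missed; controller must act
    );
end entity store_buffer;

architecture rtl of store_buffer is
    signal pend_valid : std_logic := '0';
    signal pend_addr  : std_logic_vector(ADDR_WIDTH-1 downto 0) := (others => '0');
    signal pend_data  : std_logic_vector(DATA_WIDTH-1 downto 0) := (others => '0');
begin
    -- Stage 1: capture the write while the tag RAM lookup is in flight.
    capture : process(clk)
    begin
        if rising_edge(clk) then
            pend_valid <= cpu_we;
            pend_addr  <= cpu_addr;
            pend_data  <= cpu_data;
        end if;
    end process capture;

    -- Stage 2: the tag result now refers to the pending write.
    -- Commit it to the data RAM only on a hit.
    ram_we   <= pend_valid and tag_hit;
    ram_addr <= pend_addr;
    ram_data <= pend_data;
    miss     <= pend_valid and not tag_hit;
end architecture rtl;
```

With this arrangement a new write can be accepted every cycle while the previous one commits. The sketch does not hold the pending entry on a miss and does not forward buffered data to a read of the same address; both would have to be handled by the surrounding controller.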

Altera_Forum · ‎10-09-2012

From what I can see the only way to get the old memory value in a read-during-write transaction in Stratix V M20K blocks is to use a simple dual port (1 read address, 1 write address, 1 clock). My implementation would have been considerably easier if I was allowed a true dual port, dual clock ram. Instead I'll have to mux between the processor and lower memory accessing the ram, which should be fun for timing analysis. I still welcome any more elegant solutions.
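The mux mentioned here, sharing the single write port of a simple dual-port RAM between the processor and lower memory, might look roughly like the following. This is an assumption-laden sketch: `wr_port_mux`, `fill_active`, and the other port names are invented for the example, and the arbitration policy (lower memory owns the port whenever `fill_active` is high) is only one possible choice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical write-port mux for a simple dual-port RAM.
entity wr_port_mux is
    generic (
        ADDR_WIDTH : positive := 10;  -- assumed
        DATA_WIDTH : positive := 32   -- assumed
    );
    port (
        fill_active : in  std_logic;  -- '1' while a line fill owns the port
        cpu_we      : in  std_logic;
        cpu_addr    : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        cpu_data    : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        mem_we      : in  std_logic;
        mem_addr    : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        mem_data    : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        ram_we      : out std_logic;
        ram_addr    : out std_logic_vector(ADDR_WIDTH-1 downto 0);
        ram_data    : out std_logic_vector(DATA_WIDTH-1 downto 0)
    );
end entity wr_port_mux;

architecture rtl of wr_port_mux is
begin
    -- Lower memory wins whenever a fill is in progress; otherwise the
    -- processor drives the RAM write port.
    ram_we   <= mem_we   when fill_active = '1' else cpu_we;
    ram_addr <= mem_addr when fill_active = '1' else cpu_addr;
    ram_data <= mem_data when fill_active = '1' else cpu_data;
end architecture rtl;
```

The read address would need a similar mux so that dirty lines can be read out during a flush. These muxes sit directly in front of the RAM inputs, which is where the timing concern raised above comes from.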

Altera_Forum · ‎10-17-2012

I've just been looking at some SignalTap traces of bus cycles for M9K on Arria II.

With 'OLD_DATA' enabled (and single clock) during a write on s1, s1 returns the old data, but s2 returns the new data.

So to read the old data during a write you'd need to put the write address onto both the s1 and s2 address inputs.

This might be what you've already discovered.

Altera_Forum · ‎10-17-2012

That is what I expected, as the following code indicates:

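wr_addr_int <= to_integer(unsigned(wr_addr));
rd_addr_int <= to_integer(unsigned(rd_addr));

process(clk)
begin
    if rising_edge(clk) then
        -- Write port: store the input word one byte at a time.
        if we = '1' then
            for i in 0 to DATA_BYTE_WIDTH-1 loop
                ram(wr_addr_int)(i) <= data(8*(i+1)-1 downto 8*i);
            end loop;
        end if;
        -- Read port: ram is a signal, so the write above has not taken
        -- effect yet and this read returns the old contents.
        q_int <= ram(rd_addr_int);
    end if;
end process;

-- Flatten the registered byte array back into the output vector.
UNPACK : for i in 0 to DATA_BYTE_WIDTH-1 generate
    q(8*(i+1)-1 downto 8*i) <= q_int(i);
end generate;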

Thank you for verifying that for me. I'm just about ready to start my verification efforts.

Altera_Forum · ‎10-23-2012

Something else I've discovered.

There is a 1 clock stall when a read from tightly coupled data memory immediately follows a write to the same memory block.

Basically the write cycle can only be done when it is actually required - and the decision takes a clock.

The read is done unconditionally, i.e. regardless of the opcode byte or the actual memory block referenced by the high-order address bits.

The same delay may affect data cache operations.
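If a cache controller needs to reproduce this kind of one-clock stall, a generic read-after-write hazard detector could be sketched as below. This is not a description of how the processor in question implements it. The entity name `raw_stall`, the port names, and `BLOCK_BITS` are hypothetical, and the sketch assumes the master holds `rd_req` and issues no new write while `stall` is high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical hazard detector: stalls a read that immediately follows
-- a write to the same memory block.
entity raw_stall is
    generic (
        BLOCK_BITS : positive := 2  -- assumed number of block-select bits
    );
    port (
        clk      : in  std_logic;
        wr_req   : in  std_logic;
        wr_block : in  std_logic_vector(BLOCK_BITS-1 downto 0);
        rd_req   : in  std_logic;
        rd_block : in  std_logic_vector(BLOCK_BITS-1 downto 0);
        stall    : out std_logic
    );
end entity raw_stall;

architecture rtl of raw_stall is
    signal wr_pending : std_logic := '0';
    signal wr_block_q : std_logic_vector(BLOCK_BITS-1 downto 0) := (others => '0');
begin
    -- Remember whether the previous cycle issued a write, and to which block.
    track : process(clk)
    begin
        if rising_edge(clk) then
            wr_pending <= wr_req;
            wr_block_q <= wr_block;
        end if;
    end process track;

    -- Stall when this cycle's read targets the block written last cycle.
    stall <= '1' when (rd_req = '1' and wr_pending = '1'
                       and rd_block = wr_block_q) else '0';
end architecture rtl;
```

Under the stated assumption, `wr_pending` clears on the cycle after the stall, so the read proceeds after exactly one lost clock.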