cache controller and read-during-write behavior

Altera_Forum

Hello all, 

 

I'm in the process of writing a simple cache controller. I want this controller to exhibit write-back behavior: when the processor writes to the cache, the controller first looks up the appropriate tag and flags. If the tag and valid flag indicate a hit, the write data goes straight into the cache; if the tag does not indicate a hit but the location is dirty, the controller first flushes the cache line to memory and then loads the requested cache line so as to produce a hit.

This all sounds good, but I've come across a problem I can't seem to shake. As just described, when writing to the cache the controller first needs to check the cache line's tag and flags. The block RAMs in these devices are synchronous, so there is a one-cycle delay before I get the tag and flags back. This implies the controller can be at best 50% efficient for writes (first read the tag/flags, then write the data).

My first solution was to read the tag/flags and write the data in the same clock cycle and then, on the next cycle, if the tag/flags indicated a miss, rewind the operation using some skid buffers and flush/load the cache as usual. This, however, requires that the block RAM return the old data on read-during-write operations, and I can't find any dual-port RAM option on my target device (Stratix V) that supports that setting. It always returns new data! Trying to force old-data read-during-write behavior through inference just places the RAM in logic, which is unacceptable.

Does anyone have any suggestions? Thanks.
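To make the read-during-write requirement concrete, here is a small behavioral model in Python (not HDL) of the speculative write-then-rewind scheme described above. Everything in it is an illustrative assumption: a direct-mapped cache, word addressing, and the class and field names are hypothetical. The `old_data_on_rdw` flag switches between a data RAM that returns the old word on a read-during-write and one that returns the new word.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Line:
    tag: Optional[int] = None            # None = invalid line
    dirty: bool = False
    words: List[int] = field(default_factory=list)


class SpeculativeWriteCache:
    """Direct-mapped write-back cache that writes speculatively (hypothetical)."""

    def __init__(self, num_lines: int, words_per_line: int,
                 old_data_on_rdw: bool) -> None:
        self.num_lines = num_lines
        self.wpl = words_per_line
        self.lines = [Line(words=[0] * words_per_line) for _ in range(num_lines)]
        self.memory: Dict[int, int] = {}   # word address -> value
        # True: the data RAM returns the OLD word on read-during-write.
        # False: it returns the NEW word (the behavior the poster observed).
        self.old_data_on_rdw = old_data_on_rdw

    def _split(self, addr: int) -> Tuple[int, int, int]:
        offset = addr % self.wpl
        index = (addr // self.wpl) % self.num_lines
        tag = addr // (self.wpl * self.num_lines)
        return tag, index, offset

    def write(self, addr: int, value: int) -> str:
        tag, index, offset = self._split(addr)
        line = self.lines[index]
        # Cycle 1: read tag/flags and the addressed word while writing it.
        seen_tag, seen_dirty = line.tag, line.dirty
        read_word = line.words[offset] if self.old_data_on_rdw else value
        line.words[offset] = value
        # Cycle 2: the tag/flags read in cycle 1 are now available.
        if seen_tag == tag:
            line.dirty = True
            return "hit"
        # Miss: rewind the speculative write from the skid buffer.
        line.words[offset] = read_word
        result = "clean miss"
        if seen_tag is not None and seen_dirty:
            base = (seen_tag * self.num_lines + index) * self.wpl
            for i, word in enumerate(line.words):   # flush the victim line
                self.memory[base + i] = word
            result = "dirty miss"
        # Fill the requested line, then replay the write so it hits.
        base = (tag * self.num_lines + index) * self.wpl
        line.words = [self.memory.get(base + i, 0) for i in range(self.wpl)]
        line.tag, line.dirty = tag, True
        line.words[offset] = value
        return result


if __name__ == "__main__":
    for old in (True, False):
        c = SpeculativeWriteCache(num_lines=4, words_per_line=2, old_data_on_rdw=old)
        c.write(0, 111)   # clean miss: line 0 now caches address 0
        c.write(8, 222)   # same index, new tag: dirty miss flushes address 0
        print(old, c.memory[0])   # True 111, then False 222
```

With old-data behavior the flushed victim keeps its value (111); with new-data behavior the rewind "restores" the word just written, so 222 is flushed to address 0. That corruption is why the scheme depends on old-data read-during-write.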
Altera_Forum

You are probably getting a random mix of old and new values for each bit. The timings might be such that the new value always wins! 

 

I guess reads aren't a problem - you can just discard the data. 

For writes you may have to add a 'store buffer' so they can be processed asynchronously (and pipelined).
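The store buffer suggested above can be sketched as a Python behavioral model. This is an assumption about how such a buffer might work, not an Altera component: writes are queued and retired later when the RAM port is free, and reads check the queue first (store-to-load forwarding) so they never see stale data. The names `StoreBuffer` and `drain_one` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class StoreBuffer:
    """Bounded FIFO of pending writes in front of a RAM (hypothetical)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.pending: Deque[Tuple[int, int]] = deque()   # (addr, value), oldest first
        self.ram: Dict[int, int] = {}

    def write(self, addr: int, value: int) -> bool:
        """Queue a write; return False (caller must stall) if the buffer is full."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append((addr, value))
        return True

    def read(self, addr: int) -> int:
        """Return the youngest queued value for addr, else the RAM contents."""
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        return self.ram.get(addr, 0)

    def drain_one(self) -> None:
        """Retire the oldest write, e.g. on a cycle when the RAM port is idle."""
        if self.pending:
            addr, value = self.pending.popleft()
            self.ram[addr] = value
```

The processor only stalls when the buffer is full; otherwise the tag check and the actual RAM write can happen in later cycles without holding up the next access.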
Altera_Forum

From what I can see, the only way to get the old memory value in a read-during-write transaction in Stratix V M20K blocks is to use a simple dual-port RAM (one read address, one write address, one clock). My implementation would have been considerably easier if I were allowed a true dual-port, dual-clock RAM. Instead I'll have to multiplex the processor and the lower-level memory onto the RAM ports, which should be fun for timing analysis. I still welcome any more elegant solutions.
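The multiplexing described above amounts to arbitrating one RAM write port between two requesters. The Python sketch below assumes a fixed-priority policy in which the fill/flush engine beats the processor; that policy, the function names, and the request format are all hypothetical choices for illustration, not the poster's design.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Write = Tuple[int, int]   # (address, data)


def arbitrate(cpu: Optional[Write], fill: Optional[Write]) -> str:
    """Fixed priority: the fill/flush engine wins, then the processor."""
    if fill is not None:
        return "fill"
    if cpu is not None:
        return "cpu"
    return "idle"


def schedule(cpu_writes: List[Write],
             fill_writes: List[Write]) -> Tuple[List[str], Dict[int, int]]:
    """Return the per-cycle owner of the single write port and the final RAM.

    A requester that loses arbitration keeps its request and retries,
    i.e. it is stalled for that cycle.
    """
    cpu_q: Deque[Write] = deque(cpu_writes)
    fill_q: Deque[Write] = deque(fill_writes)
    ram: Dict[int, int] = {}
    grants: List[str] = []
    while cpu_q or fill_q:
        owner = arbitrate(cpu_q[0] if cpu_q else None,
                          fill_q[0] if fill_q else None)
        addr, data = (fill_q if owner == "fill" else cpu_q).popleft()
        ram[addr] = data
        grants.append(owner)
    return grants, ram
```

In hardware, `arbitrate` corresponds to the select logic of the address/data mux in front of the RAM, which is the path that would need attention in timing analysis.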

Altera_Forum

I've just been looking at some SignalTap traces of bus cycles for M9K blocks on Arria II. 

With 'OLD_DATA' enabled (and single clock) during a write on s1, s1 returns the old data, but s2 returns the new data. 

So to read the old data during a write you'd need to put the write address onto both the s1 and s2 address inputs. 

This might be what you've already discovered.
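The observation above can be captured in a tiny Python model of a single-clock true dual-port RAM: s1 writes and returns the old word (same-port OLD_DATA behavior), while s2 reading the same address sees the new word (mixed-port behavior). This only models the single trace described (M9K, Arria II, OLD_DATA, single clock); other families and settings may behave differently, and the class and argument names are hypothetical.

```python
from typing import Dict, Tuple


class TrueDualPortRam:
    """Single-clock true dual-port RAM modeled on the trace described above."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}

    def clock(self, s1_addr: int, s1_we: bool, s1_data: int,
              s2_addr: int) -> Tuple[int, int]:
        """One rising edge: s1 may write, both ports read.

        Returns (q1, q2). q1 is the pre-edge word at s1_addr (old data);
        q2 is read after the write, so it sees new data on an address match.
        """
        q1 = self.mem.get(s1_addr, 0)
        if s1_we:
            self.mem[s1_addr] = s1_data
        q2 = self.mem.get(s2_addr, 0)
        return q1, q2
```

In this model the old word for the write address is available only on the read data of the port that is performing the write.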
Altera_Forum

That is what I expected, as the following code indicates: 

 

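-- Entity, generics and declarations are assumed for completeness;
-- the original post showed only the statements in the architecture body.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sdp_ram is
  generic (
    ADDR_WIDTH      : natural := 9;
    DATA_BYTE_WIDTH : natural := 4
  );
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    wr_addr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    rd_addr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    data    : in  std_logic_vector(8*DATA_BYTE_WIDTH-1 downto 0);
    q       : out std_logic_vector(8*DATA_BYTE_WIDTH-1 downto 0)
  );
end entity sdp_ram;

architecture rtl of sdp_ram is
  type word_t is array (0 to DATA_BYTE_WIDTH-1) of std_logic_vector(7 downto 0);
  type ram_t is array (0 to 2**ADDR_WIDTH-1) of word_t;
  signal ram         : ram_t;
  signal q_int       : word_t;
  signal wr_addr_int : natural range 0 to 2**ADDR_WIDTH-1;
  signal rd_addr_int : natural range 0 to 2**ADDR_WIDTH-1;
begin
  wr_addr_int <= to_integer(unsigned(wr_addr));
  rd_addr_int <= to_integer(unsigned(rd_addr));

  process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        for i in 0 to DATA_BYTE_WIDTH-1 loop
          ram(wr_addr_int)(i) <= data(8*(i+1)-1 downto 8*i);
        end loop;
      end if;
      -- ram is a signal, so this read sees the pre-edge contents:
      -- a read of wr_addr in the same cycle returns the OLD word.
      q_int <= ram(rd_addr_int);
    end if;
  end process;

  UNPACK : for i in 0 to DATA_BYTE_WIDTH-1 generate
    q(8*(i+1)-1 downto 8*i) <= q_int(i);
  end generate;
end architecture rtl;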

 

Thank you for verifying that for me; I'm just about ready to start my verification efforts.
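The old-data result of the clocked process in the code above follows from VHDL signal semantics: every right-hand side is evaluated with pre-edge values, and all signal updates commit together. The Python sketch below models that two-phase behavior for a word-wide RAM (byte packing is ignored). The `variable_semantics` switch is a contrast case, roughly what happens if the write takes effect before the read, as with a variable written earlier in the same process; the class and parameter names are hypothetical.

```python
from typing import List


class SdpRamModel:
    """Two-phase model of a single-clock simple dual-port RAM process."""

    def __init__(self, depth: int) -> None:
        self.ram: List[int] = [0] * depth
        self.q = 0

    def rising_edge(self, we: bool, wr_addr: int, data: int, rd_addr: int,
                    variable_semantics: bool = False) -> int:
        if variable_semantics:
            # Contrast case: the write takes effect before the read (new data).
            if we:
                self.ram[wr_addr] = data
            self.q = self.ram[rd_addr]
            return self.q
        # Phase 1: evaluate every right-hand side with pre-edge values.
        next_q = self.ram[rd_addr]
        pending = (wr_addr, data) if we else None
        # Phase 2: commit all scheduled signal updates together.
        if pending is not None:
            addr, value = pending
            self.ram[addr] = value
        self.q = next_q
        return self.q
```

Writing 7 to address 2 while reading address 2 returns the old 0 in the default mode and 7 in the contrast mode; the new value appears on the default-mode output one clock later.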
Altera_Forum

Something else I've discovered. 

There is a 1 clock stall when a read from tightly coupled data memory immediately follows a write to the same memory block. 

Basically the write cycle can only be done when it is actually required - and the decision takes a clock. 

The read is done unconditionally, i.e. regardless of the opcode byte or of the actual memory block referenced by the high-order address bits. 

 

The same delay may affect data cache operations.
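The stall described above can be expressed as a simple cycle-counting rule. The Python sketch below assumes one access per clock plus one extra cycle whenever a read immediately follows a write to the same memory block; the rule is taken from the post, while the function name and the operation encoding are hypothetical.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, int]   # ("rd" or "wr", memory block number)


def cycles(ops: List[Op], stall: int = 1) -> int:
    """Count clocks for a sequence of accesses under the stall rule above."""
    total = 0
    prev: Optional[Op] = None
    for kind, block in ops:
        total += 1
        # Extra cycle: the write decision for the previous access is still pending.
        if kind == "rd" and prev == ("wr", block):
            total += stall
        prev = (kind, block)
    return total
```

Under this model, placing an independent access between a write and a dependent read of the same block hides the stall: write/read to block 0 costs three clocks, while write 0, write 1, read 0 also costs three.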