Hello all,
I'm in the process of writing a simple cache controller. I want this controller to exhibit write-back behavior, i.e. when the processor writes to the cache, the controller first snoops the appropriate tag and flags. If the tag and valid flag indicate a hit, the write data goes straight into the cache. If the tag does not indicate a hit but the location is dirty, the controller first flushes the cache line to memory and then loads the requested cache line so as to produce a hit.
This all sounds good, but I've come across a problem I can't seem to shake. As just described, when writing to the cache the controller first needs to check the cache line tag/flags. The block RAMs on these devices are synchronous, so there is a one-cycle delay before I get back the tag and flags. This would imply the controller will only ever be 50% efficient at best (first snoop the tag/flags, then write the data).
My first solution was to read the tag/flags and write the data on the same clock cycle, and then on the next clock cycle, if the tag/flags indicated a miss, to rewind the operation using some skid buffers and then flush/load the cache line as usual. This, however, requires that the block RAM return the old data on read-during-write operations, and I can't find any dual-port RAM option on my target device (Stratix V) that supports that setting. It always returns new data! Trying to force the RAM to return old data for read-during-write operations via inference just results in the RAM being placed in logic, which is unacceptable.
Does anyone have any suggestions? Thanks.
You are probably getting a random mix of old and new values for each bit. The timings might be such that the new value always wins!
I guess reads aren't a problem - you can just discard the data. For writes you may have to add a 'store buffer' so they can be processed asynchronously (and pipelined).
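One way to read the store-buffer suggestion is a delayed write: the tag lookup for a write is issued in one cycle while the address and data wait in a one-entry buffer, and the data RAM write is committed in the next cycle only if the tag matched. Because the next request's tag lookup overlaps the previous request's commit, hits can proceed at one write per cycle. The following is only a minimal sketch under stated assumptions: a direct-mapped cache, a hypothetical entity `write_pipe`, and hypothetical port and generic names throughout. Dirty-flag updates, miss handling (flush and refill sequencing, stalling the processor) and forwarding from the buffer to a following read are all left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical delayed-write stage; all names and widths are assumptions.
entity write_pipe is
  generic (
    INDEX_BITS : natural := 8;
    TAG_BITS   : natural := 16;
    DATA_BITS  : natural := 32
  );
  port (
    clk        : in  std_logic;
    -- processor write request (cycle N)
    cpu_we     : in  std_logic;
    cpu_addr   : in  std_logic_vector(TAG_BITS+INDEX_BITS-1 downto 0);
    cpu_data   : in  std_logic_vector(DATA_BITS-1 downto 0);
    -- tag update from the (omitted) refill logic
    fill_we    : in  std_logic;
    fill_index : in  std_logic_vector(INDEX_BITS-1 downto 0);
    fill_tag   : in  std_logic_vector(TAG_BITS-1 downto 0);
    -- data RAM write port, driven in cycle N+1
    ram_we     : out std_logic;
    ram_index  : out std_logic_vector(INDEX_BITS-1 downto 0);
    ram_data   : out std_logic_vector(DATA_BITS-1 downto 0);
    -- high in cycle N+1 when the buffered write missed
    miss       : out std_logic
  );
end entity;

architecture rtl of write_pipe is
  -- Each tag entry is the valid bit (MSB) followed by the tag.
  type tag_ram_t is array (0 to 2**INDEX_BITS-1)
    of std_logic_vector(TAG_BITS downto 0);
  signal tag_ram   : tag_ram_t := (others => (others => '0'));
  signal tag_q     : std_logic_vector(TAG_BITS downto 0) := (others => '0');
  -- One-entry store buffer holding the write while its tag is looked up.
  signal buf_valid : std_logic := '0';
  signal buf_addr  : std_logic_vector(TAG_BITS+INDEX_BITS-1 downto 0);
  signal buf_data  : std_logic_vector(DATA_BITS-1 downto 0);
  signal hit       : std_logic;
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if fill_we = '1' then
        tag_ram(to_integer(unsigned(fill_index))) <= '1' & fill_tag;
      end if;
      -- Synchronous tag read: the result is available one cycle later.
      tag_q     <= tag_ram(to_integer(unsigned(cpu_addr(INDEX_BITS-1 downto 0))));
      -- Capture the request alongside the tag lookup.
      buf_valid <= cpu_we;
      buf_addr  <= cpu_addr;
      buf_data  <= cpu_data;
    end if;
  end process;

  -- Compare the looked-up tag with the tag bits of the buffered address.
  hit <= '1' when tag_q(TAG_BITS) = '1' and
                  tag_q(TAG_BITS-1 downto 0) =
                  buf_addr(TAG_BITS+INDEX_BITS-1 downto INDEX_BITS)
         else '0';

  -- Commit the buffered write only on a hit; otherwise flag the miss.
  ram_we    <= buf_valid and hit;
  ram_index <= buf_addr(INDEX_BITS-1 downto 0);
  ram_data  <= buf_data;
  miss      <= buf_valid and not hit;
end architecture;
```

With this arrangement the data RAM is never written speculatively, so no rewind is needed on a miss. The cost is that a read of the address still sitting in the buffer would have to be forwarded from `buf_data`, which this sketch does not attempt.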
From what I can see the only way to get the old memory value in a read-during-write transaction in Stratix V M20K blocks is to use a simple dual port (1 read address, 1 write address, 1 clock). My implementation would have been considerably easier if I was allowed a true dual port, dual clock ram. Instead I'll have to mux between the processor and lower memory accessing the ram, which should be fun for timing analysis. I still welcome any more elegant solutions.
I've just been looking at some SignalTap traces of bus cycles for M9K on Arria II.
With 'OLD_DATA' enabled (and single clock) during a write on s1, s1 returns the old data, but s2 returns the new data. So to read the old data during a write you'd need to put the write address onto both the s1 and s2 address inputs. This might be what you've already discovered.
That is what I expected, as the following code indicates:
    wr_addr_int <= to_integer(unsigned(wr_addr));
    rd_addr_int <= to_integer(unsigned(rd_addr));

    process(clk)
    begin
      if rising_edge(clk) then
        -- Write port: store the input word one byte at a time.
        if we = '1' then
          for i in 0 to DATA_BYTE_WIDTH-1 loop
            ram(wr_addr_int)(i) <= data(8*(i+1)-1 downto 8*i);
          end loop;
        end if;
        -- Read port: a signal read sees the value from before this edge,
        -- so a read-during-write to the same address returns the old data.
        q_int <= ram(rd_addr_int);
      end if;
    end process;

    -- Flatten the byte array back into the output vector.
    UNPACK : for i in 0 to DATA_BYTE_WIDTH-1 generate
      q(8*(i+1)-1 downto 8*i) <= q_int(i);
    end generate;

Thank you for verifying that for me. I'm just about ready to start my verification efforts.
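For anyone who wants to check the old-data behavior in simulation before moving to hardware, here is a minimal self-checking testbench sketch built around the same process. The declarations are assumptions, since the post does not show them: the widths, the `word_t` and `ram_t` types, the testbench name `rdw_old_data_tb` and the `done` flag are all hypothetical. It only demonstrates the simulation semantics of the code; it says nothing about what the synthesis tool will infer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rdw_old_data_tb is
end entity;

architecture sim of rdw_old_data_tb is
  -- Assumed sizes and types; the original post does not show them.
  constant DATA_BYTE_WIDTH : natural := 2;
  constant ADDR_WIDTH      : natural := 4;
  type word_t is array (0 to DATA_BYTE_WIDTH-1) of std_logic_vector(7 downto 0);
  type ram_t  is array (0 to 2**ADDR_WIDTH-1) of word_t;

  signal ram   : ram_t := (others => (others => x"00"));
  signal q_int : word_t := (others => x"00");
  signal clk   : std_logic := '0';
  signal we    : std_logic := '0';
  signal wr_addr, rd_addr : std_logic_vector(ADDR_WIDTH-1 downto 0) := (others => '0');
  signal data, q : std_logic_vector(8*DATA_BYTE_WIDTH-1 downto 0) := (others => '0');
  signal wr_addr_int, rd_addr_int : natural range 0 to 2**ADDR_WIDTH-1;
  signal done  : boolean := false;
begin
  -- 10 ns clock that stops once the checks have finished.
  clk <= not clk after 5 ns when not done else '0';

  -- RAM description taken from the post above.
  wr_addr_int <= to_integer(unsigned(wr_addr));
  rd_addr_int <= to_integer(unsigned(rd_addr));

  process(clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        for i in 0 to DATA_BYTE_WIDTH-1 loop
          ram(wr_addr_int)(i) <= data(8*(i+1)-1 downto 8*i);
        end loop;
      end if;
      q_int <= ram(rd_addr_int);
    end if;
  end process;

  UNPACK : for i in 0 to DATA_BYTE_WIDTH-1 generate
    q(8*(i+1)-1 downto 8*i) <= q_int(i);
  end generate;

  -- Stimulus: write 0x1111, then overwrite with 0x2222 while reading
  -- the same address, and check which value each read returns.
  process
  begin
    wr_addr <= x"3"; rd_addr <= x"3"; data <= x"1111"; we <= '1';
    wait until rising_edge(clk);   -- first write is stored on this edge
    data <= x"2222";
    wait until rising_edge(clk);   -- read-during-write on this edge
    we <= '0';
    wait for 1 ns;
    assert q = x"1111"
      report "read-during-write did not return old data" severity failure;
    wait until rising_edge(clk);   -- plain read one cycle later
    wait for 1 ns;
    assert q = x"2222"
      report "new data not visible on the following read" severity failure;
    done <= true;
    wait;
  end process;
end architecture;
```

If both assertions pass, the model returns the previous contents during a simultaneous read and write to the same address, and the new contents on the next read.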
Something else I've discovered.
There is a 1 clock stall when a read from tightly coupled data memory immediately follows a write to the same memory block. Basically the write cycle can only be done when it is actually required, and that decision takes a clock. The read is done unconditionally, i.e. regardless of the opcode byte or the actual memory block referenced by the high-order address bits. The same delay may affect data cache operations.