Re: how to efficiently drive async SRAM?

Altera_Forum · 09-10-2012

What's a good way to use async SRAM?

I'm working on a Nios II system for legacy hardware (Cyclone II) that has fast external asynchronous SRAM: two IS61LV51216 chips (512K×16, 8 ns) in parallel, forming a 32-bit bus. Another component will connect to this bus, too.
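With two 16-bit chips side by side, each 32-bit Avalon word maps to one SRAM location, and the four byte enables map onto the chips' active-low byte selects. The Python sketch below shows that mapping. The lane assignment (chip 0 on D[15:0], chip 1 on D[31:16]), the function name `map_access` and the pin names `lb0_n`/`ub0_n`/`lb1_n`/`ub1_n` are hypothetical, chosen only for illustration.

```python
# Hypothetical lane assignment: chip 0 drives D[15:0], chip 1 drives D[31:16].
SRAM_WORDS = 512 * 1024          # 512K locations per IS61LV51216
ADDR_MASK = SRAM_WORDS - 1       # 19 address bits, shared by both chips


def map_access(byte_addr: int, byteenable: int):
    """Map a 32-bit Avalon byte address and 4-bit byteenable to SRAM pins.

    Returns (sram_addr, lb0_n, ub0_n, lb1_n, ub1_n). The byte selects are
    active low, so 0 means "this byte lane takes part in the access".
    """
    if not 0 <= byte_addr < 4 * SRAM_WORDS:
        raise ValueError("address outside the 2 MiB SRAM window")
    if not 0 <= byteenable <= 0b1111:
        raise ValueError("byteenable must be a 4-bit value")
    sram_addr = (byte_addr >> 2) & ADDR_MASK   # one SRAM location per 32-bit word
    # Avalon is little endian: byteenable bit i covers data bits [8i+7:8i].
    lanes = [0 if (byteenable >> i) & 1 else 1 for i in range(4)]
    lb0_n, ub0_n, lb1_n, ub1_n = lanes
    return sram_addr, lb0_n, ub0_n, lb1_n, ub1_n


print(map_access(0x1FFFFC, 0b0001))  # last word, lowest byte only
```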

The SRAM has zero hold times, so it can almost be driven synchronously, except for the write pulse. Even that could be synchronous if back-to-back writes are prevented by an intermediate wait cycle. (At least, that's my theory; I have no experience with this.)
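The write-spacing theory can be written down as a small cycle model. This Python sketch assumes, as the post does, that WE# is held low for one full clock per write and must return high for one idle cycle between consecutive writes; reads are assumed to need no gap. `schedule` and `we_n_waveform` are hypothetical helper names.

```python
def schedule(ops):
    """Expand a stream of 'R'/'W' requests into per-cycle bus activity.

    Assumption (the untested theory above): a write holds WE# low for one
    full clock, and two writes may not be adjacent because WE# has to go
    high again to end the first write. Reads need no gap.
    """
    cycles = []
    for op in ops:
        if op not in ("R", "W"):
            raise ValueError(f"unknown op {op!r}")
        if op == "W" and cycles and cycles[-1] == "W":
            cycles.append("-")   # idle cycle: WE# high, ends the previous write
        cycles.append(op)
    return cycles


def we_n_waveform(cycles):
    """Active-low write enable, one sample per clock cycle."""
    return [0 if c == "W" else 1 for c in cycles]


print(" ".join(schedule(["W"] * 4)))
```

With this rule, four consecutive writes take seven cycles (`W - W - W - W`), so sustained writes run at roughly half speed, while mixed read/write traffic is not slowed down.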

My Nios runs at ~80-90 MHz, so I was expecting this to match quite well. In SOPC Builder, I'm using a tristate bridge, and behind that a "Legacy IDT71V416 SRAM with SDK". It's the closest I could find. I get the impression that async SRAMs are barely supported: just a very few RAMs that happened to be on some Altera boards, and no generic IP core?

Using that IP gives me 2(!) wait states on read; I haven't checked writes yet. Looking behind the scenes, I suspect there is no real SRAM controller at all. Instead, the generic bus timing is configured so that an SRAM can connect directly to the bus lines, however inefficiently.

Using the Component Editor, I tried to create my own tristate slave, but again I have no access to the Avalon waitrequest signal and can only specify fixed wait states up front.

Do I have to go further and write my own tristate bridge?

Please comment, and tell me if I'm on the wrong track and there is an easier way to interface SRAMs efficiently. I hope so!

Thanks,

Jörg

PS: I'm new to this and it's my first post; I hope I've hit the right section.

Altera_Forum · 09-11-2012

If you want access to the waitrequest signal, you will need to write your own component using HDL. Alternatively, using the component editor you can manually specify a fixed number of read and write wait states, which should be enough in your case.

You will probably still need at least one read wait state, though. In addition to the SRAM's 8 ns access time, there are I/O delays and routing delays inside the FPGA that can push the whole access time beyond one clock period. Some SRAMs can be used synchronously with pipelining, and in that case you can more reliably get one read or write access per clock cycle, but you will have to write your own controller in HDL.
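The reply's point can be made concrete with a simple read budget: the address has to get out of the FPGA, through the SRAM, and back into an input register before the capture edge. The Python sketch below counts the wait states this implies. Every delay figure in the example call is a hypothetical placeholder, not a Cyclone II or datasheet value, and `read_wait_states` is an illustrative name.

```python
import math


def read_wait_states(f_mhz, t_co, t_board_out, t_aa, t_board_in, t_su):
    """Wait states needed for one non-pipelined async-SRAM read.

    The address leaves an FPGA output register on a clock edge (t_co), crosses
    the board, the SRAM needs t_aa to present data, the data crosses the board
    back, and it must be at the input register t_su before the capture edge.
    All delays in ns. Returns the number of cycles beyond the first one.
    """
    period = 1000.0 / f_mhz
    path = t_co + t_board_out + t_aa + t_board_in + t_su
    cycles = math.ceil(path / period)
    return max(cycles - 1, 0)


# Placeholder delays: 4 ns t_co, 1 ns board each way, 8 ns t_AA, 2 ns t_su.
print(read_wait_states(85, 4.0, 1.0, 8.0, 1.0, 2.0))  # prints 1
```

With these placeholder numbers the 16 ns round trip exceeds the roughly 11.8 ns period at 85 MHz, so one wait state is needed even though the SRAM itself is rated at 8 ns.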

Altera_Forum · 09-11-2012

Thanks,

Being too impatient (:-P), I had already started writing my own controller as an Avalon slave. The pity is that I can't even use the tristate bridge anymore, since it doesn't seem to pass waitrequest through. That's bad for the other chip on the external bus: instead of just attaching another tristate slave component, I'll have to write that part myself, too.

As a test, I only implemented reading, passing the signals through rather directly. It *almost* works with zero wait states; there is only some disturbance on the LCD screen, which is a continuously reading bus master. :unsure:

So you're right, I need a lead-off cycle, but by pipelining the reads I will still be able to read one word per clock. I wonder if it would help to make it a bursting controller. Is a single lead-off cycle a small enough penalty to make this a net win?
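Whether a lead-off cycle pays off is simple arithmetic: with fixed wait states every word pays the full access time, while a pipelined controller pays its latency once per run of consecutive reads. A minimal Python sketch, assuming a fixed latency counted in the same units as wait states (cycles beyond the address cycle); the function names are hypothetical.

```python
def cycles_waitstate(n_words, wait_states):
    """Fixed wait states: every read occupies 1 + wait_states clock cycles."""
    return n_words * (1 + wait_states)


def cycles_pipelined(n_words, latency):
    """Pipelined reads: one address per clock, each word captured `latency`
    cycles after its address cycle. The last address goes out in cycle
    n_words - 1 and is captured in cycle n_words - 1 + latency, so a run of
    n_words reads spans n_words + latency cycles."""
    return n_words + latency if n_words > 0 else 0


for n in (1, 2, 8, 64):
    print(n, cycles_waitstate(n, 2), cycles_pipelined(n, 2))
```

For eight words, 2 wait states cost 24 cycles against 10 for a pipeline with 2 cycles of latency, and an isolated read costs the same either way. For long runs, such as a continuously reading display master, the ratio approaches 1 + wait states, i.e. about 3x for the 2-wait-state case.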

The fixed read and write wait states are suboptimal and don't give enough control. For example, reading works without a "setup time" cycle, but writing doesn't, and the two can't be configured independently.

So far, I still maintain that Altera doesn't support async SRAMs well...

Jörg

Altera_Forum · 09-12-2012

I'm surprised that you need a setup time for the write operation. Setting setup and hold to 0 and specifying separate read and write wait states should work, IIRC.

The Avalon tristate bus doesn't support bursts. You can try to support pipelined transfers instead, which should give the same performance with an SRAM.

Altera_Forum · 10-30-2012

I don't really need a setup time; that must have been an artefact, and I need to get it right some other way.

Altera_Forum · 10-30-2012

I'd like to wrap up my thread, since the internet is so full of dangling threads where the answer was silently found but never posted...

I ended up writing my own SRAM controller as an Avalon slave, skipping the tristate bridge. It is (admittedly) pretty tailored to my case, but perhaps it can serve as an example:

http://welecw2000a.svn.sourceforge.net/viewvc/welecw2000a/fpga/nios2/altera/ip/w2000_ext_ram_bus/ext_bus.v?revision=708&view=markup

The performance "trick" is overlapped pipelining: issue the next one or two reads even before the result of the first arrives. This depends on the clock rate; a given pipeline depth (and perhaps read phase) only works for the frequency it was designed for. So I have latency, but I get a data word every clock cycle, pretty much like a burst.
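The overlapped pipelining described above can be modelled cycle by cycle with a delay line standing in for the SRAM access time plus I/O delays. This Python sketch assumes a fixed, whole-cycle read latency; it is an illustration, not derived from the linked `ext_bus.v`, and `simulate_pipelined_reads` is a hypothetical name.

```python
from collections import deque


def simulate_pipelined_reads(addresses, latency, memory):
    """Cycle-by-cycle model of overlapped reads with a fixed read latency.

    One address is issued per clock. A delay line of depth `latency` stands in
    for the SRAM plus I/O and routing delays, so data for an address issued in
    cycle c is captured in cycle c + latency.
    Returns a list of (capture_cycle, address, data).
    """
    if latency < 1:
        raise ValueError("latency must be at least one cycle")
    in_flight = deque([None] * latency)   # one slot per pipeline stage
    pending = list(addresses)
    captured = []
    cycle = 0
    while pending or any(slot is not None for slot in in_flight):
        issued = pending.pop(0) if pending else None   # new address this clock
        in_flight.append(issued)
        done = in_flight.popleft()                     # address issued `latency` clocks ago
        if done is not None:
            captured.append((cycle, done, memory[done]))
        cycle += 1
    return captured


mem = [10 * i for i in range(8)]
print(simulate_pipelined_reads([3, 4, 5], 2, mem))
# prints [(2, 3, 30), (3, 4, 40), (4, 5, 50)]
```

With a latency of 2, reads issued in cycles 0, 1 and 2 are captured in cycles 2, 3 and 4: one word per clock after the initial delay.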

For writing, I have trouble shaping a suitable write pulse for every clock cycle, e.g. by combining the write signal combinatorially with the clock level to get a half-cycle pulse; that pulse is probably too short. Right now I use alternating wait states and toggle the write, which gives me half the performance. Suggestions welcome... Unfortunately I'm out of PLL resources to generate auxiliary clocks.
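The trade-off between the write-pulse options mostly comes down to how long WE# stays low compared with the SRAM's minimum write pulse width. The Python sketch below compares them under ideal timing (no skew or duty-cycle distortion). The 7 ns minimum in the example is a placeholder, so take the real figure from the IS61LV51216 datasheet; the option names and the spare-clock case are hypothetical.

```python
def write_pulse_options(f_mhz, t_pwe_min, we_low_fraction=None):
    """Compare WE# low times for a few write-pulse schemes at f_mhz.

    t_pwe_min: minimum write pulse width in ns (placeholder; use the datasheet).
    we_low_fraction: fraction of the period WE# is low when it is derived from
    a spare clock (hypothetical extra PLL output); None skips that option.
    Returns {option: (pulse_ns, writes_per_clock, meets_t_pwe)}.
    Ideal timing only: clock skew and duty-cycle distortion are ignored.
    """
    period = 1000.0 / f_mhz
    options = {
        # WE# gated with the clock level: low for half a period
        "half_cycle": (period / 2, 1.0),
        # WE# low for a full cycle, then a wait cycle with WE# high
        "alternating": (period, 0.5),
    }
    if we_low_fraction is not None:
        options["pll_output"] = (period * we_low_fraction, 1.0)
    return {name: (pulse, rate, pulse >= t_pwe_min)
            for name, (pulse, rate) in options.items()}


for name, (pulse, rate, ok) in write_pulse_options(85, 7.0, 0.75).items():
    print(f"{name:12s} {pulse:5.2f} ns  {rate:.1f} writes/clk  meets t_PWE: {ok}")
```

At 85 MHz a half-cycle pulse is only about 5.9 ns, which is why it is likely too short, while a full-cycle pulse with alternating wait states halves the write rate. A clock with a longer low phase can, in this model, keep one write per cycle.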

But overall, it's way better than the stock IP: 3 times faster on reads, and probably still faster on writes.

Jörg

Altera_Forum · 10-30-2012

You might be able to do something similar to what (apparently) happens in the Nios CPU's tightly coupled data memory.

Reads are done every clock, with the read data available one clock after the address. This happens whether or not the instruction opcode is an 'ldxxx', and regardless of which memory block is actually selected.

A write can only be done after the address decode has happened, so the address for writes is only available one clock later than the one used for reads. Writes can still happen every clock.

If a read follows a write (to the same memory block) then the read has to stall for one cycle - since the memory block was presented with the wrong address.
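The stall rule above can be captured in a few lines. This Python sketch counts cycles for a sequence of accesses under the stated behaviour: one access per clock, plus one stall when a read directly follows a write to the same block. It models the description in this post rather than verified Nios II internals, and `count_cycles` is a hypothetical name.

```python
def count_cycles(ops):
    """Count clock cycles for a sequence of (kind, block) accesses.

    kind is "R" or "W". Reads present their address speculatively every
    clock; writes present theirs one clock later, after address decode.
    So a read that directly follows a write to the same block finds the
    block busy and loses one cycle.
    """
    cycles = 0
    prev = None
    for kind, block in ops:
        if kind not in ("R", "W"):
            raise ValueError(f"unknown op {kind!r}")
        if kind == "R" and prev == ("W", block):
            cycles += 1   # stall: the block is taking the delayed write address
        cycles += 1
        prev = (kind, block)
    return cycles


print(count_cycles([("W", 0), ("R", 0), ("W", 0), ("R", 0)]))  # prints 6
```

Runs of reads or runs of writes stay at one access per clock; only alternating write/read traffic to the same block pays the extra cycle.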

Altera_Forum · 12-23-2012

I managed to shuffle my PLL resources around to free a clock. Now I can generate a signal with a higher duty cycle for my write clock, and can do one write per cycle.

But I'm struggling on a different front: I haven't really understood timing constraints, and my external timing seems unreliable. In theory there should be enough timing headroom, but in practice the design is unstable and varies from one synthesis run to the next. I've read about set_input_delay, set_output_delay, set_min_delay and set_max_delay, but am unsure how to use them. Which ones actually affect the timing, and which are just informational, telling the fitter how tight a path is?

Maybe the point here is that since the SRAM is asynchronous (has no clock), it's not really the clock-to-output delay that matters, but the timing of the outputs relative to each other, or, for a read access, the timing of the outputs relative to the data read back on the inputs. How can I design for this?
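One way to think about the read timing of an asynchronous SRAM is as a data-valid window at the FPGA input pins that slides with the pipeline. This Python sketch computes setup and hold slack for pipelined reads where a new address is launched every clock. The path decomposition, the parameter names and all numbers in the example are assumptions for illustration; t_OHA (the SRAM's output hold after an address change) and t_AA should come from the datasheet.

```python
def read_capture_slack(f_mhz, latency, phase_ns, d_max, d_min, t_su, t_h):
    """Setup and hold slack (ns) for pipelined async-SRAM reads.

    A new address is launched from an FPGA output register on every clock edge.
    d_max: slowest time from a launching edge until the data at the FPGA input
           pin is valid for the new address,  ~ t_co(max) + board + t_AA + board
    d_min: fastest time from a launching edge until the previous data at the
           input pin starts to change,         ~ t_co(min) + board + t_OHA + board
    Data for the address launched at edge k is captured at edge k + latency,
    delayed by phase_ns (e.g. a phase-shifted capture clock).
    Both returned slacks must be >= 0.
    """
    period = 1000.0 / f_mhz
    capture = latency * period + phase_ns
    # Setup: data must be valid t_su before the capture edge.
    setup_slack = capture - t_su - d_max
    # Hold: the next address, launched one period later, starts changing the
    # data at period + d_min; the capture (plus t_h) must be over by then.
    hold_slack = period + d_min - capture - t_h
    return setup_slack, hold_slack


# Hypothetical figures: d_max = 14 ns, d_min = 5 ns, t_su = 1 ns, t_h = 0 ns.
for latency, phase in ((1, 0.0), (2, 0.0), (1, 4.0)):
    s, h = read_capture_slack(85, latency, phase, 14.0, 5.0, 1.0, 0.0)
    print(f"latency={latency} phase={phase} ns: setup {s:+.2f} ns, hold {h:+.2f} ns")
```

With these example figures, a latency of one cycle fails setup, two cycles fail hold, and one cycle plus a 4 ns capture phase passes both. Latency and phase only move the window; its width is the clock period minus the spread between `d_max` and `d_min`, minus setup and hold, which is consistent with a given pipeline depth and phase only working near the frequency they were designed for. Whether this budget is best expressed to the timing analyzer with `set_input_delay`/`set_output_delay` or with `set_max_delay`/`set_min_delay` is left open by this sketch.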

Jörg