What's the burst length of SDRAM in Nios II?

Altera_Forum · ‎10-11-2004

The SDRAM Controller in Nios II is just a big black box; almost no details are supplied in the datasheet that comes with Nios II. For example, what's the burst length of each read or write to the SDRAM? Can I program it?

I am doing a speed-critical project with frequent reads and writes between the CPU and SDRAM. The overhead of communicating with SDRAM is a nagging problem, and in such a speed-critical environment, not knowing the burst length is fatal.

Can someone tell me about it?

Altera_Forum · ‎10-11-2004

I've done a bit of work trying to get performance out of the Altera SDRAM controller. I'm using the same 16MB 32 bit wide Micron chip as the Cyclone devkit board.

Using DMA, and with help from Altera, I've been able to *write* SDRAM at essentially one clock per write for lengths up to 480 writes. So emptying my external lpm_fifo takes about 485 clocks (@75MHz).

This is while running out of the same SDRAM and operating on other data simultaneously. So it's pretty optimal.

Now for the bad part :)

Any code such as my_array[i] or *my_array_ptr++ is very slow. Writing memory using these common techniques results in 5 clocks per write. I haven't really studied the compiled assembly, but this is devastating for my application.
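
To put the two figures side by side, here is a small sketch that converts "N writes in M clocks" into bytes per second. It assumes every write is a full 32-bit word (4 bytes), which matches the 32-bit wide chip mentioned above; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second moved when `words` transfers of `bytes_per_word` bytes
 * take `clocks` cycles at `clock_hz`. Integer math, rounded down.
 * Assumption: every transfer is a full word. */
uint64_t bytes_per_second(uint64_t words, uint64_t bytes_per_word,
                          uint64_t clocks, uint64_t clock_hz)
{
    assert(clocks > 0);
    return (words * bytes_per_word * clock_hz) / clocks;
}
```

With the numbers from this post, bytes_per_second(480, 4, 485, 75000000) gives roughly 297 MB/s for the DMA case, while bytes_per_second(1, 4, 5, 75000000) gives 60 MB/s for the 5-clocks-per-write case.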

One thing is, I don't see any instruction that increments the register as the read or write is performed, like on other processors I've worked with. Without this, reads and writes effectively take multiple clocks *no matter how fast* the memory is or how good the controller's burst behavior.

So really, until there is some more work at the instruction + compiler level, performance is going to be elusive. If you can do things in chunks with no need to modify each word, you can use DMA.

Anyone know any different? Or have a custom instruction for fast reads/writes?

Ken

Altera_Forum · ‎10-11-2004

The burst length in Altera's Avalon SDRAM controller is fixed at 1; it cannot be changed. Reading/writing at consecutive addresses in SUSTAINED mode will take one clock cycle per operation (or very close to it), which is as good as burst mode. If you are accessing random locations (on different SDRAM rows/banks), there will be an overhead due to opening/closing the rows/banks, which will not disappear even if you set bursts>1. In fact, in some cases (no cache hits, for example), random access performance will be even worse with bursts>1. These are common problems with any system using SDRAMs, and the performance issues you see are not particularly specific to Altera's controller (although some minor optimizations could be done, I think).
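
To illustrate the row open/close overhead described above, here is a toy cost model in C. It is only a sketch: the row size and the clock counts are hypothetical placeholders, not figures from the Altera controller or any datasheet, and it tracks a single open row rather than separate banks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters for illustration only; real values depend on the
 * SDRAM part and the controller settings. */
#define WORDS_PER_ROW   256u  /* assumed row size in 32-bit words        */
#define ROW_MISS_CLOCKS 6u    /* assumed cost of closing/opening a row   */
#define ROW_HIT_CLOCKS  1u    /* pipelined access to an already-open row */

/* Estimate the clocks needed to service a sequence of word addresses,
 * keeping one row open at a time (banks are not modeled). */
uint64_t estimate_clocks(const uint32_t *word_addrs, size_t count)
{
    assert(word_addrs != NULL || count == 0);

    uint64_t clocks = 0;
    uint32_t open_row = UINT32_MAX;  /* sentinel: no row open yet */

    for (size_t i = 0; i < count; i++) {
        uint32_t row = word_addrs[i] / WORDS_PER_ROW;
        if (row == open_row) {
            clocks += ROW_HIT_CLOCKS;   /* same row: streams at full rate */
        } else {
            clocks += ROW_MISS_CLOCKS;  /* different row: pay the overhead */
            open_row = row;
        }
    }
    return clocks;
}
```

Under these assumed numbers, a run of consecutive addresses pays the miss cost once and then one clock per word, while a sequence that changes row on every access pays the miss cost every time, regardless of burst length.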

Some things you can do:

- enable DMA

- use instruction and data caches (experiment with different sizes)

- use register variables in your code - in extreme cases, use an additional fast SRAM or on-chip RAM to store variables you access more frequently

- do application specific hardware accelerators (peripherals or custom instructions) to speed up certain parts of your design - like a simple dedicated cache for an array, as an example

Altera_Forum · ‎10-12-2004

To Kenland & Clancy:

Thank you!

I got two findings from the ModelSim simulation waveform:

* The burst length of Altera's SDRAM Controller is 1 (just as Clancy mentioned). Maybe we cannot modify it, while we can do so with the DDR SDRAM Controller.

* The SDRAM can deliver almost one word per clock when the controller sends "Read" commands consecutively (but I do not know how to instruct it to do so). In the waveform of the example from AN351, 8 words are read from SDRAM consecutively.

Just as Kenland did, I tried to use "pointer++" to get consecutively located data from SDRAM, and the simulation result was very disappointing: each word took more than 5 clock cycles to return from SDRAM (I did not calculate the exact number).

I will try DMA today.

Thank you, Kenland & Clancy!

Altera_Forum · ‎10-12-2004

You're welcome!

In your case, I don't know how DMA is going to help, because you want to read/write to SDRAM from the CPU. DMA will speed up the access when you have another memory area or device you want to transfer blocks of data from/to, without CPU intervention. The reason DMA works well with SDRAMs is that it generates low-level bus requests, which help keep the SDRAM pipeline busy, hence the one word per clock cycle stream. You should be able to achieve something similar in software, as long as you don't have other processes/interrupts kicking in and disrupting the consecutive address requests.

You said that you have an incrementing pointer for reading the SDRAM, but where are you reading the data into? Is the destination address in the SDRAM as well? In this case, the bus accesses would be read/write/read/write, hence the bad performance - probably the SDRAM controller needs to open/close the row for each access. In this case, if the data block is big and you cannot use register variables, the solution I see is adding an extra on-chip RAM peripheral and instructing the compiler to use this area for your array.
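
As an illustration of avoiding the read/write/read/write pattern, here is a sketch of a block-wise copy: a run of reads into a small staging buffer, then a run of writes out of it. This is a variation on the on-chip RAM idea above, not something tested on hardware. The function name and the buffer size are made up, and placing the buffer in on-chip RAM is toolchain-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STAGE_WORDS 64u  /* hypothetical staging-buffer size, in 32-bit words */

/* Copy `count` words from src to dst in blocks, so the memory sees a run of
 * reads followed by a run of writes instead of alternating accesses.
 * Assumption: on the target, `stage` would live in on-chip RAM. */
void staged_copy(volatile uint32_t *dst, const volatile uint32_t *src,
                 size_t count)
{
    static uint32_t stage[STAGE_WORDS];

    assert(count == 0 || (dst != NULL && src != NULL));

    while (count > 0) {
        size_t n = count < STAGE_WORDS ? count : STAGE_WORDS;

        for (size_t i = 0; i < n; i++)  /* consecutive reads  */
            stage[i] = src[i];
        for (size_t i = 0; i < n; i++)  /* consecutive writes */
            dst[i] = stage[i];

        src += n;
        dst += n;
        count -= n;
    }
}
```

Whether this actually helps depends on the cache configuration and on how the controller handles row changes, so it is worth checking in simulation first.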

Regards,

Clancy

Altera_Forum · ‎10-12-2004

You're Welcome RM!

I've been struggling to port proven code from a 55MHz Motorola Coldfire to a 75MHz NiosI/II for months. It's somewhere between 30% and 50% of the performance on the NiosI/II.

I believe the lack of a single instruction to read/write *and* increment/decrement the address all in one clock is partially to blame. On Coldfire the *pointer++ can map to something like LDW g1, (addr)+.

Super fast sram or even cache is not going to help when you need an instruction to do the load or store and another (at least) to increment.

I see in several of the .s files that typically there is a LDBIO r4, (r11) followed by a ADDI r11, 1 to increment.

What's needed is post-increment addressing, i.e. LDBIO r4, (r11)+. Pre/post-decrement would be nice too: LDBIO r4, -(r11).

Hopefully this is available at least if I code by hand.

I was wondering if I could do something with loops I know I can unroll.

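for(i=0 ; i<max_i ; i+=4)
{
    LDW r4, 0(r11)
    //do stuff with r4
    LDW r4, 4(r11)      // LDW offsets are in bytes, so consecutive words are 4 apart
    //do stuff with r4
    LDW r4, 8(r11)
    // ditto
    LDW r4, 12(r11)
    // ditto
    ADDI r11, r11, 16   // advance the pointer once per four loads
}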

Clumsy though. Any clever ideas?
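
The same unrolling idea can be written in plain C, leaving the instruction selection to the compiler. This is a minimal sketch with a made-up function name: summing stands in for "do stuff", and whether the compiler really emits fixed-offset loads with a single pointer update should be checked in the generated .s file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum `count` 32-bit words, four per loop iteration. The constant indices
 * p[0]..p[3] give the compiler the chance to use fixed load offsets and one
 * pointer update per four loads, instead of one increment per load. */
uint32_t sum_unrolled4(const uint32_t *p, size_t count)
{
    assert(p != NULL || count == 0);

    uint32_t sum = 0;
    size_t i = 0;

    /* Main loop: four words per iteration. */
    for (; i + 4 <= count; i += 4) {
        sum += p[0];
        sum += p[1];
        sum += p[2];
        sum += p[3];
        p += 4;
    }

    /* Tail: handle the remaining 0 to 3 words one at a time. */
    for (; i < count; i++)
        sum += *p++;

    return sum;
}
```

The tail loop means the length does not have to be a multiple of four, which the assembly sketch above quietly assumes.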

Ken

Altera_Forum · ‎10-12-2004

To Clancy:

Thank you for your quick reply.

In the "pointer++" case, I use SRAM as my data memory and program memory, while the pointer points somewhere in SDRAM. The data read from SDRAM is assigned to a variable which I think resides in SRAM (since the SRAM is my data and program memory).

In my project, I'll use 2 SDRAMs, and the bulk data transfers are between the FIFO and the SDRAMs or between the 2 SDRAMs; the read/write/read/write case will not occur on the same SDRAM at the same time. Maybe this method will achieve good bandwidth.

By the way, can you give some advice on the memory assignments in my project? I want to use 2 SDRAMs to store images and SRAM as the data and program memory; bulk data transfers are carried out among on-chip FIFOs and the SDRAMs through DMA(s). In this way, the SDRAMs are used only as storage, and the CPU's frequent instruction and data accesses go to SRAM without interfering with the bulk data transfers. But one thing that pains me is that the Nios II DMA does not work properly; I don't know whether it is my fault or a bug in the DMA.