Honored Contributor I

Burst transfer using the Nios II



I have a question regarding the burst transfer interface on the Nios II processor. I am using the version with the data cache and hardware multipliers. I have read in some threads that we can access data as a burst from memory/cache, but if I have a custom accelerator, can I read/write from this custom accelerator in bursts? If so, how do I do this? 

The data size is not large, so I was not planning on using the DMA; the transfer happens many times, but each transfer is around 64 bytes. 


Thanks and Regards, 

2 Replies
Honored Contributor I

I believe the only way the Nios can do burst memory cycles is during a cache line fill/writeback. 

Normal uncached Avalon memory cycles are always done synchronously, i.e. the Nios CPU stalls for the entire Avalon cycle (the 'result delay' for reads happens after this stall). 
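To make the cost concrete, here is a minimal sketch of what an uncached 64-byte transfer to an accelerator looks like from software: 16 individual word accesses, each of which is a separate Avalon cycle that stalls the CPU. The `accel_base` pointer standing in for the accelerator's Avalon-MM register window is an assumption for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* A 64-byte payload moved as uncached 32-bit accesses costs 16 full
 * bus cycles, and the CPU stalls for each one. The volatile pointer
 * stands in for the accelerator's Avalon-MM slave (address assumed). */
enum { PAYLOAD_WORDS = 64 / 4 };

void write_payload(volatile uint32_t *accel_base, const uint32_t *src)
{
    /* 16 individual word writes -- one stalled Avalon cycle each. */
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        accel_base[i] = src[i];
}
```

On real hardware the same pattern is usually written with the HAL's uncached I/O macros rather than a raw pointer, but the per-word stall behaviour is the same.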

The 'tightly coupled data' interface isn't public, so you can't use that to interface to custom logic. 

A couple of options for avoiding the Avalon bus stalls: 

1) use a tightly coupled memory block and have your logic transfer to/from the second port of that memory 

2) access the data from within the custom instruction logic - needs a little lateral thought
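Option 1 can be sketched as a simple mailbox handshake: the accelerator writes a 64-byte block into one port of the dual-port tightly coupled memory and raises a flag, and the CPU polls the flag and copies the block out with no Avalon stalls. The `tcm_mailbox` layout and names are assumptions for illustration, not an Altera API.

```c
#include <stdint.h>

/* Hypothetical handshake over a dual-port tightly coupled memory:
 * the accelerator fills buf via the second port and sets flag, then
 * the CPU drains it at full speed (no bus stall on a TCM access).
 * The struct would be placed at the TCM base address in a real system. */
enum { BLOCK_BYTES = 64 };

typedef struct {
    volatile uint32_t flag;               /* 0 = empty, 1 = full      */
    volatile uint8_t  buf[BLOCK_BYTES];   /* one 64-byte transfer     */
} tcm_mailbox;

/* CPU side: wait for a block, copy it out, hand the buffer back. */
void read_block(tcm_mailbox *mb, uint8_t *dst)
{
    while (mb->flag == 0)
        ;                                 /* spin until accelerator is done */
    for (int i = 0; i < BLOCK_BYTES; i++)
        dst[i] = mb->buf[i];
    mb->flag = 0;                         /* release buffer to accelerator */
}
```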
Honored Contributor I

DSL is correct: the only time either Nios II master performs a burst larger than 1 is when bursting is enabled and a cache line is being read or written out (assuming the data cache line size is > 4 bytes). 


If you are using Altmemphy (HPII) or UniPhy, you no longer need to use bursting to get good efficiency out of them. Those memory controllers support transaction merging: when the controller receives sequential or short-burst accesses, it attempts to merge them into a single off-chip burst. For example, to get good efficiency out of Nios II, this is what I would do: 


1) Set the SDRAM controller local burst size to 1 

2) Use 32B/line Nios II caches with bursting disabled 

3) Crank the arbitration share of the instruction and data masters connected to SDRAM to 8 


The arbitration share of 8 will make sure the cache line movements don't ping-pong back and forth for access to the SDRAM (it tries to keep each master accessing memory for 8 back-to-back accesses, which matches the line size). So if you do the same for your hardware accelerator, you can eliminate some burst complexity from your system.
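The arithmetic behind step 3 can be spelled out: a 32-byte cache line moved over a 32-bit Avalon data port takes 32 / 4 = 8 sequential word accesses, which is why the suggested arbitration share matches the line size.

```c
/* Why the arbitration share is 8: one 32-byte line over a 32-bit port
 * is 8 back-to-back word accesses, so a share of 8 lets a whole line
 * fill or writeback complete before the arbiter switches masters. */
enum { LINE_BYTES = 32, WORD_BYTES = 4 };
enum { ARB_SHARE = LINE_BYTES / WORD_BYTES };   /* 8, matching step 3 */
```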