Altera_Forum
Honored Contributor I
1,133 Views

Burst transfer from memory to Nios II

Hello, 

 

I have a hard time understanding how the Nios II handles memory accesses. 

 

In our Nios II design we want to use burst transfers to transfer data from SRAM into the caches of the processor or to on-chip memory. Does the Nios II processor use burst transfers by default or do we have to explicitly enable burst transfers? Is it recommended to use a DMA controller to transfer data from the SRAM to the on-chip memory? 

 

Regards 

Martin
13 Replies
Altera_Forum
Honored Contributor I
110 Views

You have to set 'enable burst' on the cache memory to get cache-line sized memory bursts for cache line fill/writeback. 

 

Memory cycles to tightly coupled memory and cache hits complete in 1 cycle, but the data value (for reads) can't be used in the next two instructions (cpu stalls if you try to do this). 

 

Memory cycles over the Avalon bus are synchronous on the processor (i.e. it stalls until the slave cycle finishes); the two-cycle delay then applies to reads. 

 

I'm sure it wouldn't have been that difficult to give the option of 'posting' a single write. Asynchronous reads would be slightly more difficult (due to scheduling the write into the register file).
Altera_Forum
Honored Contributor I
110 Views

When bursting is enabled in the cache settings, the following burst lengths are used: 

 

Instruction master burst length = 8 (32-bit wide beats) 

Data master burst length = 4/8 (32-bit wide beats, based on a cache line size of 16/32 bytes) 
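
The relationship between cache line size and burst length quoted above is just the line size divided by the 4-byte beat width. A minimal sketch in C, where `BEAT_BYTES` and `beats_per_line` are hypothetical names used only for illustration (they are not part of any Altera API):

```c
#include <assert.h>
#include <stdint.h>

/* Width of one 32-bit Avalon-MM beat in bytes, as stated in the post above. */
#define BEAT_BYTES 4u

/* Number of 32-bit beats needed to fill or write back one cache line.
 * Hypothetical helper for illustration; not an Altera HAL function. */
static uint32_t beats_per_line(uint32_t line_bytes)
{
    /* A cache line is assumed to be a whole number of beats. */
    assert(line_bytes % BEAT_BYTES == 0);
    return line_bytes / BEAT_BYTES;
}
```

With these assumptions, a 16-byte line gives a burst of 4 beats and a 32-byte line gives a burst of 8 beats, matching the figures above.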

 

For SRAM accesses, enabling bursting on the CPU side doesn't buy you anything since the memory doesn't support bursting. Whether using a DMA makes sense depends on the algorithm you are implementing. If your code works on small buffers of data that fit in an on-chip RAM then it might make sense to DMA the data from off-chip SRAM (I'm assuming you are using SRAM and not SSRAM) and access it as tightly coupled memory. If you are doing short quick accesses then the overhead of moving the data with a DMA is probably not worth the effort.
Altera_Forum
Honored Contributor I
110 Views

Hello, 

 

the topic of using burst transfers to the processor has popped up again. This time I have written a component which exposes a burst-capable Avalon MM Slave interface. 

Is there a cost associated with enabling the "Bursts transfers" option for the Nios II processor in Qsys? 

 

 

Regards 

Martin
Altera_Forum
Honored Contributor I
110 Views

When the master and slave are perfectly matched in terms of width, max burst count, and burst wrapping capabilities, there is no overhead in either SOPC Builder (SOPCB) or Qsys. 

 

When burst adapters must be inserted in the logic due to a mismatch, the adapters add no cycle overhead in Qsys (the ones in SOPCB do add overhead, though). 

 

So the only cost is the additional logic in any burst adaptation logic that Qsys might need to insert along with the extra logic in the masters to perform the burst. If you require burst adaptation logic it's possible that the master provides a suboptimal burst for the slave interface you have built. I don't know what your slave is so I can't say for certain.
Altera_Forum
Honored Contributor I
110 Views

My slave exposes a memory interface to the Nios II processor. The width of the data bus is 32 bits and the number of pending read transactions can be set by generics. 

 

There are also other devices attached to the data bus like SRAM and DDR2 memory (I'm developing on an Altera Embedded Systems Development Kit, Cyclone III Edition). The SRAM is not used currently but for the target platform we have designated SRAM as main memory. 

 

Does the DDR2 SDRAM Controller with ALTMEMPHY support burst transfers? There doesn't seem to be any documentation for Qsys. I've only seen a datasheet for the MegaWizard.
Altera_Forum
Honored Contributor I
110 Views

Yes, ALTMEMPHY supports bursting; it should be called "local burst size" or something along those lines in the wizard that pops up on the screen. That said, now that there is the high performance 2 (HP2) controller option, I never use bursting with the Nios II processor and DDR memories. 

 

Instead I set the local burst size to 1 (non-bursting), leave Nios II bursting disabled, and increase the arbitration share of the instruction and data masters to 8 to match my cache line size. The HP2 controller will take those sequential back-to-back transfers (the arbitration share makes sure the master has the opportunity to present 8 back-to-back transfers) and condense them into the optimal off-chip burst of 4/8 beats. This ensures that I don't have burst adapters dropped all over the place in my system and shifts all the complexity onto the memory controller instead.
Altera_Forum
Honored Contributor I
110 Views

Hello, 

 

sorry for getting back to this topic so late... 

 

Can I say as a rule of thumb that bursting is only useful when all components connected to the bus support the same burst size? Or, if I have burst-capable and non-burst-capable components on the same bus, do I need to trade off the benefits of burst transfers against the penalties of having burst adapters? 

 

I've found the setting for supporting burst transfers in the Nios II edit menu in Qsys but I haven't found the setting for the arbitration share. Can you tell me where I have to change those settings? 

 

 

Best regards 

Martin
Altera_Forum
Honored Contributor I
110 Views

Your first generalization is mostly correct: if your slave doesn't need bursting then you shouldn't enable it on the master (assuming it's optional), because it doesn't buy you anything. The only time bursting might be handy in cases like these is if you want to implement a higher quality of service for one master over other masters, since bursts lock the arbiter. That said, you can do the same sort of thing with arbitration share. Whether bursting is worth enabling really depends on your system and what your throughput needs are. Enabling bursting for non-bursting slaves doesn't increase performance, but if you are working with slaves that do need bursts to be efficient (PCIe, for example), then it's a good idea to make sure whatever accesses them uses bursting too. 

 

In fact if you connect a bursting master to a non-bursting slave in SOPC Builder your efficiency decreases since for every burst through a burst adapter there is a single cycle of overhead. So if you used a burst length of 2 on the master and a non-bursting slave, you will require 3 cycles for every burst which means your efficiency is only 67% of what it would have been with bursting disabled. Qsys doesn't have this penalty since the burst adapters are improved over the SOPC Builder version.
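
The 67% figure can be reproduced with a small calculation. The sketch below assumes, as the post describes, exactly one extra cycle per burst through an SOPC Builder burst adapter, and additionally assumes a baseline of one cycle per beat with no wait states when bursting is disabled. The function names are hypothetical.

```c
#include <assert.h>

/* Cycles for one burst of burst_len beats through an SOPC Builder burst
 * adapter into a non-bursting slave: one cycle per beat (assumed, no wait
 * states) plus the single cycle of adapter overhead described above. */
static unsigned adapted_burst_cycles(unsigned burst_len)
{
    return burst_len + 1u;
}

/* Efficiency in percent relative to the same transfers with bursting
 * disabled, rounded to the nearest integer. */
static unsigned adapted_efficiency_pct(unsigned burst_len)
{
    assert(burst_len > 0);
    unsigned cycles = adapted_burst_cycles(burst_len);
    return (100u * burst_len + cycles / 2u) / cycles;
}
```

Under these assumptions a burst length of 2 gives 67%, and longer bursts amortize the overhead better (a burst length of 8 gives about 89%).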
Altera_Forum
Honored Contributor I
110 Views

Somewhat on topic I hope: I have been unable to use SGDMA with the TSE to send Ethernet packets that are stored in onchip memory. I assumed it wasn't possible but now I wonder if our system is misconfigured and it should be possible. 

 

Do any of you SGDMA experts know about SGDMA from onchip memory when used with the TSE? 

 

Thanks, 

Bill
Altera_Forum
Honored Contributor I
110 Views

It should definitely be possible; I've done it on several designs. Check that the SGDMA masters are correctly connected to the on-chip memory and that the software is configured to put the packet data into those memories. If it doesn't work, use SignalTap to find out what the DMA is doing.

Altera_Forum
Honored Contributor I
110 Views

Thank you! I think I know what it is now - we wanted one onchip memory for code and data and want to send data stored in this onchip memory. I see we would have to create a second onchip region to do this, which is OK but not as convenient. 

 

Bill
Altera_Forum
Honored Contributor I
110 Views

Assuming you have enough room in your code on-chip memory, it should be possible to use it for the SGDMA's data storage as well. You might want to keep those as separate on-chip memories for performance reasons though (so that the Nios can access one and the SGDMA can access the other concurrently).

Altera_Forum
Honored Contributor I
110 Views

Or give the Nios 'tightly coupled' access to one memory and let the SGDMA access the other as an Avalon slave. 

 

If you are using on-chip memory for code/data the performance is better if you use tightly coupled memories. It may well mean that you don't need the instruction or data caches (except you'll need the i-cache to use the JTAG debug and most of the boot options). 

 

You do want to make sure that the code put into tightly coupled instruction memory is pure (contains no data), to avoid slow Avalon cycles to it. This probably requires you to use a non-standard linker script and may be impossible if you are trying to use the default EPCS loader. Also, the gcc4 built by Altera puts jump tables (for switch statements) into the code segment, so you'd need to use the gcc3 build (or rebuild the compiler). 

I also got worse code from gcc4!
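
To illustrate the jump-table point: a dense `switch` like the one below is the kind of code a compiler may lower to a table of addresses, while an equivalent if-chain is compared and branched in plain code with no table. This is a sketch only; the decoder functions are hypothetical, and whether a given compiler build actually emits a table here depends on its version and options. GCC also documents a `-fno-jump-tables` option; whether Altera's gcc4 build honors it for Nios II is an assumption that would need checking.

```c
#include <assert.h>

/* Hypothetical opcode decoder written as a dense switch. A compiler may
 * implement this with a jump table placed alongside the code. */
static int decode_switch(unsigned op)
{
    switch (op) {
    case 0: return 7;
    case 1: return 3;
    case 2: return 19;
    case 3: return 42;
    case 4: return 8;
    case 5: return 1;
    default: return -1;
    }
}

/* Same behavior written as an if-chain, which needs no table of addresses. */
static int decode_if_chain(unsigned op)
{
    if (op == 0) return 7;
    if (op == 1) return 3;
    if (op == 2) return 19;
    if (op == 3) return 42;
    if (op == 4) return 8;
    if (op == 5) return 1;
    return -1;
}

/* Self-check: both decoders must agree on every opcode in a small range. */
static void check_decoders_agree(void)
{
    for (unsigned op = 0; op < 16; ++op) {
        assert(decode_switch(op) == decode_if_chain(op));
    }
}
```

Inspecting the generated assembly or the linker map is the reliable way to confirm whether any table data ended up in the instruction memory.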