Re: NIOS: IOWR works fine but writing through a pointer does not

Altera_Forum · 08-15-2012

I have two problems with a generic NIOS system with data and instruction caches.

1. An Avalon MM slave is attached to the NIOS system. Writing to the slave with IOWR works. However, writing through a plain pointer (without IOWR) does not: the following code does not generate any write transaction on the slave port. Any suggestions?

int *PtrToSlave = (int *) Avalon_MM_Slave_Base;
writedata = generatedata();
*PtrToSlave = (int) writedata;

2. How can I support burst transfers to the same slave without a DMA? The NIOS Avalon data master supports burst transfers, and the caches are also used to support this.

Altera_Forum · 08-15-2012

IOWR() uses the I/O store instructions ('stwio' for 32-bit accesses) in order to bypass the data cache.

If you aren't using the MMU (likely, unless you are running something like Linux), you can also bypass the data cache by setting bit 31 of the address.
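A minimal sketch of the bit-31 trick, assuming a Nios II core without an MMU and 32-bit addresses. `UNCACHED_BIT` and `uncached_alias()` are hypothetical names for this illustration; the HAL's own `alt_remap_uncached()` (in `sys/alt_cache.h`) serves the same purpose and is worth checking first.

```c
#include <stdint.h>

/* Hypothetical constant: on a Nios II core WITHOUT an MMU, a data access
 * whose address has bit 31 set bypasses the data cache. */
#define UNCACHED_BIT 0x80000000u

/* Hypothetical helper: return the cache-bypassing alias of a 32-bit address.
 * Setting the bit is idempotent, so an already-uncached address is unchanged. */
static inline uint32_t uncached_alias(uint32_t addr)
{
    return addr | UNCACHED_BIT;
}

/* On the target (32-bit pointers) this would be used roughly as:
 *   volatile int *p = (volatile int *) uncached_alias(Avalon_MM_Slave_Base);
 *   *p = writedata;    // goes to the slave directly, no cache involved
 * 'volatile' keeps the compiler from merging or dropping the store. */
```

Making the pointer `volatile` matters independently of the cache: without it, the compiler may combine or remove stores it considers redundant.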

If all your data memory is internal to the FPGA (i.e. not SRAM/SDRAM/DDR), then you might as well connect it as 'tightly coupled data memory' and remove the data cache completely.

An Avalon slave can (IIRC) latch write data in one clock, which is as fast as the Nios can ever write data, so there is no point in supporting burst write transfers.

Avalon reads by the Nios do have at least one wait state, plus the two-clock delay before the loaded data can be used. In practice this is still likely to cost less than setting up any other form of transfer.

If you want to process large blocks of data, add an Avalon master interface to your logic block.

Altera_Forum · 08-15-2012

Many thanks for the reply.

I received some feedback for question 2 and my comments are below.

I know that IOWR() bypasses the cache. The MMU is not used in this case. The data cache was added to support burst transfers, and the Nios is connected through an Avalon master that is also configured to support burst transfers.

I would like to support burst transfers to the slave, and that is the reason for adding the cache. If that is not correct, how can I support burst transfers to the slave while the Nios acts as the master on the Avalon MM fabric?

There is an appreciable gap in cycles between two IOWR/IORD operations, which makes them much less attractive.

Could anyone please also respond to question 1?

Altera_Forum · 08-16-2012

AFAIK, if you write the data through a regular pointer (without setting bit 31 of the address) and then flush the cache line, it will be written using a burst transfer.
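A sketch of the write-then-flush approach described above, assuming the Nios II HAL's `alt_dcache_flush()` from `sys/alt_cache.h`. `write_block_then_flush()` is a hypothetical helper, and the `#else` branch is a host-only stand-in so the example compiles off-target. Whether the write-back actually goes out as a burst depends on how the data master is configured.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __nios2__
#include <sys/alt_cache.h>   /* Nios II HAL: alt_dcache_flush() */
#else
/* Host-only stand-in (assumption, not the HAL): it just records the range
 * that would have been flushed so the sketch can be compiled and checked. */
static void *last_flush_start;
static uint32_t last_flush_len;
static void alt_dcache_flush(void *start, uint32_t len)
{
    last_flush_start = start;
    last_flush_len = len;
}
#endif

/* Hypothetical helper: write n words through a normal (cached) pointer,
 * then flush the dirty lines so they are written back to the slave.
 * On the target, 'dst' would be the slave base address WITHOUT bit 31 set. */
static void write_block_then_flush(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];                             /* lands in the data cache */
    alt_dcache_flush(dst, (uint32_t)(n * sizeof *dst)); /* write back + invalidate */
}
```

On the target the call would look like `write_block_then_flush((int *) Avalon_MM_Slave_Base, buffer, count);`, where `buffer` and `count` are placeholders for your own data.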

I'm not sure you will gain that much speed, though. Does your slave really have that much latency? The CPU still needs to write the data word by word into the data cache, and this takes many cycles. If your slave has 1-2 clock cycles of latency (which, as dsl said, is the case for most Avalon slaves on a write access), you will probably not see a difference between IOWR and cached access with bursts. If you really want a fast transfer, you need to use a DMA.

Altera_Forum · 08-16-2012

OK, I'll try this to check the burst transfer. There is an appreciable gap in cycles between two writes (IOWR) or reads (IORD), even when the two statements are placed next to each other with nothing in between.

Regarding the pointer access, I have not been able to force a write or read transaction with pointers (i.e. without the I/O-instruction-based IOWR/IORD). As mentioned earlier, the pointer expression below does not generate a write transaction on the slave port.

int *PtrToSlave = (int *) Avalon_MM_Slave_Base; // Slave base address
*PtrToSlave = (int) data;

Altera_Forum · 08-16-2012

You don't see the slave transfer because, with a pointer access, the CPU only writes to its data cache. The value is written to your slave only when that cache line is flushed.
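To make the mechanism concrete, here is a deliberately tiny toy model, not the real Nios II cache: a single one-word write-back line in front of the slave, counting the write transactions the slave actually sees. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one write-back "line" holding a single word. */
typedef struct {
    uint32_t addr;        /* address currently held by the line */
    uint32_t data;        /* cached value */
    bool     dirty;       /* modified but not yet written back */
    unsigned bus_writes;  /* write transactions seen by the slave */
} toy_cache;

/* Cached store, like "*PtrToSlave = data": only the line is updated,
 * unless another dirty address has to be evicted (written back) first. */
static void cached_store(toy_cache *c, uint32_t addr, uint32_t value)
{
    if (c->dirty && c->addr != addr)
        c->bus_writes++;          /* eviction of the old dirty line */
    c->addr = addr;
    c->data = value;
    c->dirty = true;
}

/* Uncached store, like IOWR: goes straight to the bus. */
static void io_store(toy_cache *c, uint32_t addr, uint32_t value)
{
    (void)addr;
    (void)value;
    c->bus_writes++;
}

/* Flush: a dirty line is written back, so only now does the slave see it. */
static void toy_flush(toy_cache *c)
{
    if (c->dirty) {
        c->bus_writes++;
        c->dirty = false;
    }
}
```

A cached store leaves `bus_writes` at 0 until `toy_flush()` runs, while `io_store()` increments it immediately, which is the difference observed in the thread.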

Your "appreciable cycle gap" isn't due to the slave latency; it's just that the CPU requires several cycles to execute your instructions. Sure, if you use bursts all the values will be written faster, but before that, the CPU will spend about as many cycles filling the data cache as you measured with the IOWR instructions. So using bursts will not make the total execution time any shorter.
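The same argument written as a rough cost model. The symbols are introduced here for illustration only: N words to write, t_io CPU cycles per IOWR, t_st cycles per ordinary (cached) store, and T_burst the time for the write-back bursts.

```latex
\begin{aligned}
T_{\text{IOWR}}   &\approx N\,t_{\text{io}} \\
T_{\text{cached}} &\approx N\,t_{\text{st}} + T_{\text{burst}} \\
t_{\text{st}} \approx t_{\text{io}}
  \;&\Rightarrow\;
  T_{\text{cached}} \approx T_{\text{IOWR}} + T_{\text{burst}} \;\ge\; T_{\text{IOWR}}
\end{aligned}
```

The approximation t_st ≈ t_io only holds when the slave accepts each write within a cycle or two, as noted earlier in the thread; a slow slave would raise t_io and change the comparison.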