Data Cache

Altera_Forum
Honored Contributor II

I was looking at the documentation and found that address bit 31 determines whether memory is cacheable or non-cacheable. When bit 31 is low, is the memory/IO cacheable or non-cacheable? I didn't see in the documentation which state of bit 31 selects which.

 

Rick
Altera_Forum
Honored Contributor II

Bit 31 set (logic-1) bypasses cache. 

 

IORD/IOWR on any address (bit 31 set or not) also bypasses cache. 

 

Allocating a buffer with the uncached malloc provided with the HAL (alt_uncached_malloc) will also give you a buffer that bypasses the cache.
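
For what it's worth, here is a rough sketch of the uncached-malloc route. It assumes the HAL declares alt_uncached_malloc and alt_uncached_free in sys/alt_cache.h (check your BSP). The helper alloc_shared_words is a made-up name, and the #else branch only provides host stand-ins so the snippet also compiles off-target; a PC has no bit 31 bypass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef __NIOS2__
#include <sys/alt_cache.h> /* alt_uncached_malloc, alt_uncached_free */
#else
/* Host stand-ins (assumption, for illustration only): plain malloc/free. */
static volatile void *alt_uncached_malloc(size_t size) { return malloc(size); }
static void alt_uncached_free(volatile void *ptr) { free((void *)ptr); }
#endif

/* Hypothetical helper: allocate `words` zeroed 32-bit words whose accesses
 * bypass the data cache, e.g. for a buffer shared with a DMA controller. */
static volatile uint32_t *alloc_shared_words(size_t words)
{
    assert(words > 0);
    volatile uint32_t *buf =
        (volatile uint32_t *)alt_uncached_malloc(words * sizeof(uint32_t));
    if (buf == NULL)
        return NULL; /* allocation failed */
    for (size_t i = 0; i < words; i++)
        buf[i] = 0; /* each store goes straight to memory on the target */
    return buf;
}
```

Release the buffer with alt_uncached_free when you are done with it.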
Altera_Forum
Honored Contributor II

Jesse, 

 

Thanks for the reply. Just a couple more questions: 

1) Why can't you put memory above A31? I tried a couple of memory types, and it says that the current tool chain does not support a 256MB boundary. 

2) If I need to tag memory below A31 as uncached, is there a penalty if I use the alt_remap_uncached function? In other words, if I tag half of my memory as uncached, do I take a performance hit in that region? 

 

Rick
Altera_Forum
Honored Contributor II

The first 2GB of the address space is cached and the second 2GB is non-cached. These are not two separate memory spaces, so there is a total of 2GB of usable address space, mirrored into both halves. 

 

In other words, address 0x00000000 (cacheable) maps to address 0x80000000 (non-cacheable), address 0x00000001 (cacheable) maps to address 0x80000001 (non-cacheable), and so on. 

 

 

So use the first 2GB for non-peripheral accesses (local memory) and the second 2GB for peripherals. Use IORD and IOWR for peripheral accesses so that you don't need to know what's going on underneath.
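
To make the mirroring concrete, here is a small sketch of the address arithmetic in plain C. The helper names are made up for illustration and are not HAL functions; the only thing taken from the posts above is that bit 31 selects cache bypass.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31 of the data address selects cache bypass. */
#define DCACHE_BYPASS_BIT UINT32_C(0x80000000)

/* Hypothetical helpers: convert between the two aliases of one location. */
static uint32_t uncached_alias(uint32_t addr) { return addr | DCACHE_BYPASS_BIT; }
static uint32_t cached_alias(uint32_t addr)   { return addr & UINT32_C(0x7FFFFFFF); }

/* Nonzero when an access to `addr` bypasses the data cache. */
static int bypasses_cache(uint32_t addr) { return (addr & DCACHE_BYPASS_BIT) != 0; }

/* Nonzero when both addresses name the same physical location. */
static int same_location(uint32_t a, uint32_t b)
{
    return cached_alias(a) == cached_alias(b);
}

/* Usage: the mapping described in the post above. */
static void alias_demo(void)
{
    assert(uncached_alias(UINT32_C(0x00000000)) == UINT32_C(0x80000000));
    assert(same_location(UINT32_C(0x00000001), UINT32_C(0x80000001)));
    assert(!bypasses_cache(UINT32_C(0x00000001)));
}
```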
Altera_Forum
Honored Contributor II

In SOPC builder you should put all your memory and peripherals as close to 0 as possible for efficient hardware. 

 

For the purposes of bypassing dcache, your best plan is probably to use IORD/IOWR for all register accesses. 

 

For accesses to memory shared with other masters (other processors, DMA controllers, etc.), you can either flush the cache after you write to it (or before you read from it), or set bit 31 using the HAL function call (alt_remap_uncached, as mentioned in the question). 

 

If you take the second route, you can choose to access some locations within a memory with bit 31 set and other parts of the same memory with bit 31 clear. This will work correctly, and there will be no performance hit on the regions accessed with bit 31 clear. 

 

But if your code accesses the same location in the memory sometimes with bit 31 set and sometimes with it clear, then you must flush the cache in between to ensure it works correctly. 
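
Here is a rough sketch of the two routes for a buffer shared with another master. It assumes the HAL functions alt_dcache_flush, alt_remap_uncached and alt_remap_cached from sys/alt_cache.h, with the signatures as I remember them (check your BSP). The publish_* helpers are made-up names, and the #else branch only provides host stand-ins so the snippet compiles off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __NIOS2__
#include <sys/alt_cache.h> /* alt_dcache_flush, alt_remap_uncached, alt_remap_cached */
#else
/* Host stand-ins (assumption, for illustration only): no cache to manage. */
static void alt_dcache_flush(void *start, uint32_t len) { (void)start; (void)len; }
static volatile void *alt_remap_uncached(void *ptr, uint32_t len) { (void)len; return ptr; }
static void *alt_remap_cached(volatile void *ptr, uint32_t len) { (void)len; return (void *)ptr; }
#endif

/* Route 1: write through the cache, then flush so the other master
 * (a DMA controller, say) sees the data in memory. */
static void publish_with_flush(uint8_t *buf, const uint8_t *src, size_t len)
{
    assert(buf != NULL && src != NULL);
    memcpy(buf, src, len);
    alt_dcache_flush(buf, (uint32_t)len);
}

/* Route 2: write through the bit 31 alias so every store bypasses the cache. */
static void publish_uncached(uint8_t *buf, const uint8_t *src, size_t len)
{
    assert(buf != NULL && src != NULL);
    volatile uint8_t *ubuf =
        (volatile uint8_t *)alt_remap_uncached(buf, (uint32_t)len);
    for (size_t i = 0; i < len; i++)
        ubuf[i] = src[i];
    /* Hand the region back before using it through the cached address again. */
    (void)alt_remap_cached(ubuf, (uint32_t)len);
}
```

Either way, the point from the post stands: don't mix cached and uncached accesses to the same location without a flush in between.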

 

 

P.S. The 256MB limit is imposed by the call instruction, which can only jump within the same 256MB region.
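
A quick way to picture the call limit, assuming it means the call target must share the upper four address bits with the call site (256MB = 2^28 bytes). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_REGION_SHIFT 28u /* 2^28 bytes = 256MB */

/* Index of the 256MB region that contains `addr` (0 to 15). */
static uint32_t call_region(uint32_t addr) { return addr >> CALL_REGION_SHIFT; }

/* Nonzero when a call at `call_site` can reach `target`, i.e. both
 * addresses lie in the same 256MB region. */
static int call_can_reach(uint32_t call_site, uint32_t target)
{
    return call_region(call_site) == call_region(target);
}

/* Usage: code at 0x00000100 can call anything below 0x10000000. */
static void call_demo(void)
{
    assert(call_can_reach(UINT32_C(0x00000100), UINT32_C(0x0FFFFFFC)));
    assert(!call_can_reach(UINT32_C(0x0FFFFFFC), UINT32_C(0x10000000)));
}
```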
Altera_Forum
Honored Contributor II

Just a thought that BadOmen might be able to answer, 

 

Why not have the facility to specify non-zero-based memory? Then, with the addition of a write-through location-zero exception, you'd be able to detect and trap misbehaving code that accidentally accesses location zero. 

Maybe not a necessary feature on perfect code, but something useful I've found on other embedded processors such as the Fujitsu F2C series. 

 

Regards...
Altera_Forum
Honored Contributor II

I'm not too familiar with that. However, the reason the cached/non-cached access is set up this way is to allow fast bypassing. The hardware simply detects this bit and bypasses the cache (and this is the fastest possible access you can have to non-cached memory). 

 

Wombat also brought up a good point. You really want to locate your addresses near 0x00000000, simply because with any addressing logic comes mux logic. If you map the peripherals all over the address space, you may cause not only wasteful LE usage but also poor timing performance. If you want a well-optimized mapping, use the auto-assign base addresses function in SOPC Builder, which modifies the addressing to produce a tightly packed address map (with tight address packing, some of the multiplexer logic just becomes decoding based on a few MSB address bits). 

 

Another recommendation: if you don't require interrupts on some devices, make the IRQ a no connect (NC), and use up the highest-priority IRQs first (so instead of 0, 3, 15, 21, use 0, 1, 2, 3, for example).