
C2H speedup vs clearing data cache

Altera_Forum

We're planning to use C2H in several areas to improve performance, but we have a general question about how the speedup weighs against the cost of clearing the data cache.

 

For any C2H accelerator that modifies memory the Nios II processor also accesses, the data cache has to be cleared (flushed) to prevent consistency problems between the contents of the cache and the RAM. We understand the need for this.

 

The general question is how to make the tradeoff decision between just using NIOS C code and writing an accelerator. 

 

Simple example: Suppose you need to clear an array of integers to zero. Basic C would be: 

 

/* Set iCnt consecutive integers, starting at array, to zero. */
void clearIntegers(int *array, int iCnt)
{
    while (iCnt-- > 0)
        *array++ = 0;
}

 

This would seem to be perfect for C2H, but it requires clearing the data cache. We assume that the actual clearing of the cache is very fast (1 or 2 cycles), but subsequent Nios II data reads that might have been satisfied by the cache will then require a full read cycle from RAM until the data cache is re-populated.
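One way to limit the re-population cost is to flush only the address range of the array rather than the whole data cache. The sketch below assumes the Nios II HAL call `alt_dcache_flush()` from `sys/alt_cache.h`; `USE_NIOS2_HAL` is a hypothetical build flag, and `accel_clear_integers()` is a hypothetical stand-in for the C2H-accelerated function (plain C here so the sketch also compiles off-target).

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_NIOS2_HAL                 /* hypothetical build flag for on-target builds */
#include <sys/alt_cache.h>           /* alt_dcache_flush() */
#else
/* Host stand-in so the sketch compiles without the HAL; it does nothing. */
static void alt_dcache_flush(void *start, unsigned int len)
{
    (void)start;
    (void)len;
}
#endif

/* Hypothetical placeholder for the accelerated function. */
static void accel_clear_integers(int *array, int count)
{
    while (count-- > 0)
        *array++ = 0;
}

/* Flush only the cache lines covering the array, then run the accelerator. */
void clear_integers_coherent(int *array, int count)
{
    assert(array != NULL || count <= 0);
    if (count <= 0)
        return;                      /* nothing to clear, nothing to flush */

    /* Write back and invalidate just this range so unrelated data stays cached. */
    alt_dcache_flush(array, (unsigned int)((size_t)count * sizeof(int)));
    accel_clear_integers(array, count);
}
```

With a ranged flush, only later reads of the array itself miss in the cache; the rest of the working set is left alone.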

 

Is there any way to estimate what the value of iCnt needs to be for the overall result to be faster using the accelerator? 

 

Clearly, for iCnt == 0 you would be better off not invoking this function. How about for iCnt == 10, 20, or 30?
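To make the question concrete, here is a minimal break-even sketch. It assumes a simple linear cost model in which both the processor loop and the accelerator have a constant per-element cost, and everything else (the flush, accelerator start-up, and the estimated cost of later cache misses) is lumped into one fixed overhead. The struct, its field names, and every number fed into it are hypothetical and would have to be measured on the actual system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cost model; all values are cycle counts measured on the target. */
typedef struct {
    uint32_t cpu_cycles_per_word;    /* plain C loop, per element */
    uint32_t accel_cycles_per_word;  /* accelerator, per element */
    uint32_t fixed_overhead_cycles;  /* flush + start-up + estimated refill misses */
} clear_cost_model;

/*
 * Smallest element count for which the accelerator is strictly faster,
 * or UINT64_MAX if it never wins under this model.
 */
uint64_t break_even_count(const clear_cost_model *m)
{
    assert(m != NULL);
    if (m->cpu_cycles_per_word <= m->accel_cycles_per_word)
        return UINT64_MAX;           /* no per-element saving to amortize the overhead */

    uint64_t saved_per_word =
        (uint64_t)m->cpu_cycles_per_word - m->accel_cycles_per_word;

    /* Need n * saved > overhead, so n = floor(overhead / saved) + 1. */
    return (uint64_t)m->fixed_overhead_cycles / saved_per_word + 1;
}
```

For example, with made-up figures of 4 cycles per word on the processor, 1 cycle per word in the accelerator, and 200 cycles of fixed overhead, the model puts the break-even point at 67 elements. The hard part is the overhead term, since the refill cost depends on what the application reads next.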

 

We also understand that this answer will vary considerably based on the current application and the locality of data references within that application, and are just looking for any "rule of thumb" that's been developed by those with more experience in using C2H accelerators. 

 

Thanks.
2 Replies
Altera_Forum

Unfortunately, there is no rule of thumb, since this depends on the main memory (or memories), the cache size, and the cache line size.

 

How you handle this is very system-specific, as you mentioned. If the Nios II processor doesn't need to access this data much, I would allocate the data in a non-cached region of your memory space or use a tightly coupled memory (make it dual-port, with the C2H accelerator accessing the other port). If the processor needs to get at this data a lot, you might be better off letting the processor clear it, since some of the zeros may still be cached when it goes to use them for calculations.
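As a rough illustration of the non-cached option, the sketch below allocates the shared array with the Nios II HAL calls `alt_uncached_malloc()` and `alt_uncached_free()` from `sys/alt_cache.h`, so no flush is needed before an accelerator touches it. `USE_NIOS2_HAL` is a hypothetical build flag, and the three helper names are made up for this example; off-target, the sketch falls back to ordinary `malloc()` so it still compiles.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_NIOS2_HAL                 /* hypothetical build flag for on-target builds */
#include <sys/alt_cache.h>           /* alt_uncached_malloc(), alt_uncached_free() */
#else
#include <stdlib.h>
/* Host stand-ins: ordinary heap memory, only so the sketch compiles off-target. */
static volatile void *alt_uncached_malloc(size_t size) { return malloc(size); }
static void alt_uncached_free(volatile void *ptr) { free((void *)ptr); }
#endif

/* Allocate an int array that bypasses the data cache (helper name is illustrative). */
volatile int *alloc_shared_ints(size_t count)
{
    return (volatile int *)alt_uncached_malloc(count * sizeof(int));
}

/* Clear the array; on target, an accelerator could do this with no flush required. */
void clear_shared_ints(volatile int *array, size_t count)
{
    assert(array != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        array[i] = 0;
}

/* Release memory obtained from alloc_shared_ints(). */
void free_shared_ints(volatile int *array)
{
    alt_uncached_free(array);
}
```

The trade-off is the one described above: every processor access to this array goes to memory, so it only pays off when the processor rarely reads the data.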

 

The same applies if you use a DMA peripheral (your example is effectively a zero-stuffing DMA).
Altera_Forum

We sort of figured that there wouldn't be much of a rule available, given how many environment-specific variables are involved. But we also wanted to make sure that we weren't missing something important.

 

Thanks for your help.