Nios® V/II Embedded Design Suite (EDS)

IOWR/IORD latency

Altera_Forum

Hi all!

Sorry for the previous empty thread; I have no idea what happened.

I'm experiencing some strange latency when using the IOWR macros. I added a custom 8-bit slave in Qsys, and reading or writing its registers takes a huge number of cycles. I thought the issue was caused by a mistake in my peripheral, but then I tried reading on-chip memory and got the same result!

Here is the code; I'm using a performance counter:

    #include <io.h>
    #include "system.h"
    #include "altera_avalon_performance_counter.h"

    /* PRER_LO is a register offset defined elsewhere (not shown in this post). */

    int main()
    {
        PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
        PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);

        /* Section 1: a single 8-bit read */
        PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
        IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
        PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);

        /* Section 2: a single 16-bit read */
        PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 2);
        IORD_16DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
        PERF_END(PERFORMANCE_COUNTER_0_BASE, 2);

        /* Section 3: a single 32-bit read */
        PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 3);
        IORD_32DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
        PERF_END(PERFORMANCE_COUNTER_0_BASE, 3);

        perf_print_formatted_report(PERFORMANCE_COUNTER_0_BASE, 50000000, 3,
                                    "IORD_8", "IORD_16", "IORD32");
        return 0;
    }

And this is what I get:

--Performance Counter Report--
Total Time : 10 usec (532 clock-cycles)
+---------------+-----+------------+---------------+------------+
| Section       |  %  | Time (usec)| Time (clocks) |Occurrences |
+---------------+-----+------------+---------------+------------+
| IORD_8        |   9 |          1 |            51 |          1 |
+---------------+-----+------------+---------------+------------+
| IORD_16       |   9 |          1 |            50 |          1 |
+---------------+-----+------------+---------------+------------+
| IORD32        |   8 |          0 |            47 |          1 |
+---------------+-----+------------+---------------+------------+

OK, the timer itself adds some time to this: I measured about 30 clock cycles. That still leaves about 20 clock cycles per read, which seems bad. What could I be doing wrong?

I'm using Quartus 15.0 Web Edition and a Nios II Gen2 /e core.
Altera_Forum

You will probably find that the performance counters add considerable overhead. Try reading the registers 10,000 times within a single performance-counter section, then divide the measured time by 10,000 to get a result that isn't lost in the noise.
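
A minimal sketch of that averaging approach in C. To keep it runnable off-target, the cycle counter and the register read are passed in as function pointers; `average_cycles_per_read`, `sim_now`, and `sim_read` are hypothetical names, and the 30-cycle and 20-cycle figures in the simulated target are just the estimates quoted in the question, not measurements. On a real Nios II system the read hook would presumably wrap something like `IORD_32DIRECT`, and the loop would sit between `PERF_BEGIN` and `PERF_END`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks (not part of the Nios II HAL): the caller supplies a
 * free-running cycle counter and the register read to be timed. */
typedef uint64_t (*cycle_counter_fn)(void);
typedef uint32_t (*register_read_fn)(void);

/* Times `iterations` back-to-back reads inside ONE measurement window and
 * returns the average cost per read, in cycles. */
double average_cycles_per_read(register_read_fn do_read,
                               cycle_counter_fn now,
                               uint32_t iterations)
{
    assert(iterations > 0);
    volatile uint32_t sink = 0; /* keeps the reads from being optimized away */

    uint64_t start = now();
    for (uint32_t i = 0; i < iterations; ++i) {
        sink = do_read();
    }
    uint64_t end = now();

    (void)sink;
    return (double)(end - start) / (double)iterations;
}

/* Simulated target, for illustration only. Assumed costs: taking a
 * timestamp costs 30 cycles and one read costs 20 cycles, so every
 * measurement window carries 30 cycles of fixed overhead. */
static uint64_t sim_cycles = 0;

uint64_t sim_now(void)
{
    sim_cycles += 30;
    return sim_cycles;
}

uint32_t sim_read(void)
{
    sim_cycles += 20;
    return 0;
}
```

With these simulated numbers, `average_cycles_per_read(sim_read, sim_now, 1)` reports 50 cycles, while 10,000 iterations average about 20.003 cycles, because the fixed 30 cycles are spread over every read. On real hardware the loop's own increment and branch instructions are also included in the average, so the result is an upper bound on the read cost.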

Altera_Forum

The Nios II /e is a slow processor that requires at least 5 clock cycles to complete one instruction.

The overhead from the performance counter is large compared with the total measured time per IORD (about 30 of the roughly 50 cycles). I agree with Galfonz that you should run many iterations of IORD for better accuracy.
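
To put a number on "many iterations", the overhead share can be worked out directly. The sketch below is only arithmetic under the assumption that the overhead is a fixed cost per measurement window and that each read costs the same; `overhead_share` and `min_iterations` are hypothetical helper names, and the 30 and 20 cycle inputs are the estimates from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of one measurement window that is fixed counter overhead,
 * assuming `iterations` reads of `cycles_per_read` cycles each. */
double overhead_share(uint32_t overhead_cycles,
                      uint32_t cycles_per_read,
                      uint32_t iterations)
{
    assert(iterations > 0);
    double overhead = (double)overhead_cycles;
    double total = overhead + (double)cycles_per_read * (double)iterations;
    assert(total > 0.0);
    return overhead / total;
}

/* Smallest iteration count for which the overhead share drops to
 * `max_share` or below (for example 0.01 for 1 percent). */
uint32_t min_iterations(uint32_t overhead_cycles,
                        uint32_t cycles_per_read,
                        double max_share)
{
    assert(cycles_per_read > 0);
    assert(max_share > 0.0);

    uint32_t n = 1;
    while (overhead_share(overhead_cycles, cycles_per_read, n) > max_share) {
        ++n;
    }
    return n;
}
```

For a single read, `overhead_share(30, 20, 1)` is 0.6, matching the 30/50 ratio above. Under the same assumptions, `min_iterations(30, 20, 0.01)` returns 149, so even a few hundred reads per section would push the counter overhead below 1 percent.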

 

Also try looking at a simulation. That would definitely be a better way to understand the behavior of the RTL.