Altera_Forum
Honored Contributor I
814 Views

IOWR/IORD latency

Hi all! 

 

Sorry for the previous empty thread; I have no idea what happened.

I'm seeing strange latency when using the IOWR/IORD macros. I added a custom 8-bit slave in Qsys, and reading or writing its registers takes a huge number of cycles. I thought the problem was a mistake in my peripheral, but then I tried reading on-chip memory and got the same result!

Here is the code; I'm using the performance counter:

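#include "system.h"                             /* base addresses generated for the BSP */
#include "io.h"                                 /* IORD_*DIRECT macros */
#include "altera_avalon_performance_counter.h"  /* PERF_* macros */

/* PRER_LO is a byte offset defined elsewhere (not shown in this post). */

int main()
{
    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    /* Section 1: one 8-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
    IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);

    /* Section 2: one 16-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 2);
    IORD_16DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 2);

    /* Section 3: one 32-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 3);
    IORD_32DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 3);

    /* 50000000 is the clock frequency in Hz passed to the report */
    perf_print_formatted_report(PERFORMANCE_COUNTER_0_BASE, 50000000, 3,
                                "IORD_8", "IORD_16", "IORD32");
    return 0;
}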

And here is what I get:

--Performance Counter Report--
Total Time : 10 usec (532 clock-cycles)
+---------------+-----+------------+---------------+------------+
| Section       |  %  | Time (usec)| Time (clocks) |Occurrences |
+---------------+-----+------------+---------------+------------+
| IORD_8        |   9 |          1 |            51 |          1 |
+---------------+-----+------------+---------------+------------+
| IORD_16       |   9 |          1 |            50 |          1 |
+---------------+-----+------------+---------------+------------+
| IORD32        |   8 |          0 |            47 |          1 |
+---------------+-----+------------+---------------+------------+

OK, the performance counter itself adds some time to this: I measured about 30 clock cycles of overhead. That still leaves about 20 clock cycles per read, which seems bad. What could I be doing wrong?

I'm using Quartus 15.0 Web Edition and Nios II Gen2 /e.
2 Replies
Altera_Forum
Honored Contributor I
54 Views

You will probably find that the performance counters represent considerable overhead. Try reading the registers 10,000 times in one performance counter. Then divide the time by 10,000 to get a result that isn't lost in noise.
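A minimal sketch of that loop approach, not a tested implementation: it times 10,000 reads inside a single section and divides the section time by the iteration count. The `USE_NIOS2_HAL` guard, `N_READS`, the helper names, the offset 0, and the 30-cycle overhead constant (the figure the original poster measured) are assumptions for illustration. The `PERF_*` macros, `perf_get_section_time`, and `IORD_8DIRECT` come from the HAL headers used in the question.

```c
#include <stdint.h>

#define N_READS 10000u            /* assumed iteration count, as suggested above */
#define PERF_OVERHEAD_CYCLES 30u  /* assumed: overhead measured by the original poster */

/* Hypothetical helper: average cycles per access after removing the
 * one-time PERF_BEGIN/PERF_END overhead. Returns 0 on invalid input. */
static double avg_cycles_per_access(uint64_t total_cycles,
                                    uint64_t overhead_cycles,
                                    uint32_t iterations)
{
    if (iterations == 0u || total_cycles < overhead_cycles) {
        return 0.0;
    }
    return (double)(total_cycles - overhead_cycles) / (double)iterations;
}

#ifdef USE_NIOS2_HAL /* hypothetical switch: define only when building against the Nios II BSP */
#include "system.h"
#include "io.h"
#include "altera_avalon_performance_counter.h"

/* Times N_READS byte reads in one section and returns the average cost. */
static double time_iord8_loop(void)
{
    uint32_t i;

    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
    for (i = 0; i < N_READS; i++) {
        (void)IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, 0); /* offset 0 is an assumption */
    }
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);
    PERF_STOP_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    return avg_cycles_per_access(
        perf_get_section_time((void *)PERFORMANCE_COUNTER_0_BASE, 1),
        PERF_OVERHEAD_CYCLES, N_READS);
}
#endif
```

The average still includes the loop's own increment, compare, and branch instructions, so treat it as an upper bound on the read cost. Putting several reads in each iteration would shrink that share.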

Altera_Forum
Honored Contributor I
54 Views

Nios II/e is a slow processor that requires a minimum of 5 clock cycles to complete one instruction.

The performance counter overhead is large relative to the total measured time per IORD access (about 30 of the 50 cycles). I agree with Galfonz that you should time many iterations of IORD to get a more accurate figure.
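To put the numbers from this thread side by side, here is a small sketch of the arithmetic. It takes the 30-cycle overhead and the 5-cycles-per-instruction figure quoted in this thread at face value; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Cycles left after removing the measurement overhead (0 if overhead dominates). */
static uint32_t net_access_cycles(uint32_t measured, uint32_t overhead)
{
    return (measured > overhead) ? (measured - overhead) : 0u;
}

/* Whole instructions that fit in `cycles` at `cycles_per_instr` cycles each. */
static uint32_t instructions_in(uint32_t cycles, uint32_t cycles_per_instr)
{
    return (cycles_per_instr != 0u) ? (cycles / cycles_per_instr) : 0u;
}
```

For the IORD_8 row, `net_access_cycles(51, 30)` gives 21 and `instructions_in(21, 5)` gives 4, so under these assumptions the remaining time is roughly what four instructions cost on this core.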

 

Also try looking at a simulation. That would definitely be a better way to understand the behavior of the RTL.