IOWR/IORD latency

Altera_Forum · 07-08-2015

Hi all!

Sorry for the previous empty thread, I have no idea what happened.

I'm seeing strange latency when using the IOWR/IORD macros. I added a custom 8-bit slave in Qsys, and reading or writing its registers takes a huge number of cycles. I thought the problem was a mistake in my peripheral, but then I tried reading on-chip memory and got the same result!

Here is the code; I'm using a performance counter:

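#include "system.h"
#include "io.h"
#include "altera_avalon_performance_counter.h"

/* PRER_LO is a register offset defined elsewhere in the project (not shown). */

int main()
{
    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    /* Section 1: one 8-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
    IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);

    /* Section 2: one 16-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 2);
    IORD_16DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 2);

    /* Section 3: one 32-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 3);
    IORD_32DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 3);

    /* 50000000 is the clock frequency in Hz, 3 is the number of sections */
    perf_print_formatted_report(PERFORMANCE_COUNTER_0_BASE, 50000000, 3, "IORD_8", "IORD_16", "IORD32");
    return 0;
}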

And this is what I get:

--Performance Counter Report--
Total Time : 10 usec (532 clock-cycles)
+---------------+-----+------------+---------------+------------+
| Section       |  %  | Time (usec)|  Time (clocks)|Occurrences |
+---------------+-----+------------+---------------+------------+
|         IORD_8|   9 |          1 |            51 |         1  |
+---------------+-----+------------+---------------+------------+
|        IORD_16|   9 |          1 |            50 |         1  |
+---------------+-----+------------+---------------+------------+
|         IORD32|   8 |          0 |            47 |         1  |
+---------------+-----+------------+---------------+------------+

OK, the performance counter itself adds some time to this; I measured its overhead at 30 clock cycles. That still leaves about 20 clock cycles per read, which seems bad. What could I be doing wrong?

I'm using Quartus 15.0 Web Edition and a Nios II Gen2 /e core.

Altera_Forum · 07-08-2015

You will probably find that the performance counters represent considerable overhead. Try reading the registers 10,000 times in one performance counter. Then divide the time by 10,000 to get a result that isn't lost in noise.
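As a rough sketch of that suggestion: time many reads inside a single section, then divide. The helper names `cycles_per_access` and `measure_iord8` are made up for illustration, offset 0 stands in for the poster's `PRER_LO`, the 30-cycle overhead is the figure measured earlier in the thread, and `__NIOS2__` is assumed to be predefined by the Nios II GCC toolchain. The arithmetic part compiles on any host; the target part is untested here.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles per access when `iterations` accesses are timed inside ONE
 * performance-counter section. The fixed PERF_BEGIN/PERF_END cost
 * (`overhead_cycles`) is paid only once, so its share per access shrinks as
 * the iteration count grows. Loop bookkeeping (increment, compare, branch)
 * is still included in the result. */
uint32_t cycles_per_access(uint64_t section_cycles,
                           uint32_t overhead_cycles,
                           uint32_t iterations)
{
    assert(iterations > 0);
    if (section_cycles <= overhead_cycles) {
        return 0; /* nothing left once the overhead is subtracted */
    }
    return (uint32_t)((section_cycles - overhead_cycles) / iterations);
}

#ifdef __NIOS2__ /* assumption: predefined by the Nios II GCC toolchain */
#include "system.h"
#include "io.h"
#include "altera_avalon_performance_counter.h"

#define N_READS 10000u

/* Hypothetical target-side measurement: N_READS byte reads in section 1. */
uint32_t measure_iord8(void)
{
    uint32_t i;

    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
    for (i = 0; i < N_READS; i++) {
        (void)IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, 0); /* offset 0 is a placeholder */
    }
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);
    PERF_STOP_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    return cycles_per_access(
        perf_get_section_time((void *)PERFORMANCE_COUNTER_0_BASE, 1),
        30u, /* overhead measured by the original poster */
        N_READS);
}
#endif
```

With the single-read figure from the report, `cycles_per_access(51, 30, 1)` gives 21. The looped number will include the loop's own instructions, so it is an upper bound on the cost of the read itself.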

Altera_Forum · 10-02-2015

The Nios II /e is a slow core that needs at least 5 clock cycles to complete one instruction.

The performance counter overhead is large compared to the total time measured per IORD call (about 30 of the 50 cycles). I agree with Galfonz that you should time many iterations of IORD to get a more accurate figure.
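To put those numbers side by side, here is a small arithmetic sketch using only figures quoted in this thread (51, 50 and 47 measured cycles, 30 cycles of overhead, 5 cycles per instruction). The function names are invented for illustration.

```c
#include <assert.h>

/* Cycles left for the access after removing the counter overhead. */
unsigned net_cycles(unsigned measured, unsigned overhead)
{
    return measured > overhead ? measured - overhead : 0u;
}

/* Share of the measurement taken by the overhead, rounded to the nearest percent. */
unsigned overhead_percent(unsigned measured, unsigned overhead)
{
    assert(measured > 0);
    return (overhead * 100u + measured / 2u) / measured;
}

/* Upper bound on the instructions that fit in `cycles` when each
 * instruction takes at least `cycles_per_instruction` cycles. */
unsigned max_instructions(unsigned cycles, unsigned cycles_per_instruction)
{
    assert(cycles_per_instruction > 0);
    return cycles / cycles_per_instruction;
}

/*
 * Section   measured  net  overhead  max instructions at 5 cycles each
 * IORD_8       51      21    59 %      4
 * IORD_16      50      20    60 %      4
 * IORD32       47      17    64 %      3
 */
```

So the roughly 20 remaining cycles correspond to at most three or four instructions on this core, which may be no more than the address setup and the load itself.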

Also try to look at simulations. This would definitely be a better way to understand the behavior of the RTL.