Hi all!
Sorry for the previous empty thread; I have no idea what happened. I'm seeing strange latency when using the IORD/IOWR macros. I added a custom 8-bit slave in Qsys, and it takes a huge number of cycles to read/write its registers. I thought the issue was caused by a mistake in my peripheral, but then I tried reading on-chip memory and got the same result! Here is the code; I'm using a performance counter:
#include "system.h"
#include "io.h"
#include "altera_avalon_performance_counter.h"
/* PRER_LO is a register offset defined elsewhere (not shown) */
int main() {
    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);
    /* Section 1: a single 8-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
    IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);
    /* Section 2: a single 16-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 2);
    IORD_16DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 2);
    /* Section 3: a single 32-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 3);
    IORD_32DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 3);
    /* 50000000 = clock frequency in Hz, used to convert clocks to usec */
    perf_print_formatted_report((void *)PERFORMANCE_COUNTER_0_BASE, 50000000, 3,
                                "IORD_8", "IORD_16", "IORD32");
    return 0;
}
And this is what I get:
--Performance Counter Report--
Total Time: 10 usec (532 clock-cycles)
+---------------+-----+-------------+---------------+-------------+
| Section       | %   | Time (usec) | Time (clocks) | Occurrences |
+---------------+-----+-------------+---------------+-------------+
| IORD_8        | 9   | 1           | 51            | 1           |
+---------------+-----+-------------+---------------+-------------+
| IORD_16       | 9   | 1           | 50            | 1           |
+---------------+-----+-------------+---------------+-------------+
| IORD32        | 8   | 0           | 47            | 1           |
+---------------+-----+-------------+---------------+-------------+
OK, the performance counter adds some time of its own; I measured it at about 30 clock cycles. That still leaves about 20 clock cycles per read, which is bad. What could I be doing wrong? I'm using Quartus 15.0 Web Edition and Nios II Gen2 /e.
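The overhead correction described above is a plain subtraction. Below is a minimal host-side sketch of it; the helper name `net_cycles` is hypothetical, and the 30-cycle overhead is the poster's own measurement, not a figure from Intel documentation.

```c
#include <stdint.h>

/* Hypothetical helper: remove the performance counter's own cost
 * (measured separately, e.g. with an empty PERF_BEGIN/PERF_END pair)
 * from a section's raw clock count. Clamps at 0 so a noisy
 * measurement cannot wrap around the unsigned result. */
static uint32_t net_cycles(uint32_t raw_cycles, uint32_t overhead_cycles)
{
    return raw_cycles > overhead_cycles ? raw_cycles - overhead_cycles : 0u;
}
```

With an overhead of 30, the raw counts 51, 50 and 47 from the report give 21, 20 and 17 cycles, which is where the "about 20 cycles per read" estimate comes from.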
2 Replies
You will probably find that the performance counter itself adds considerable overhead. Try reading the registers 10,000 times inside a single performance counter section, then divide the time by 10,000 to get a result that isn't lost in the noise.
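The looped measurement suggested above might look like the following sketch. Assumptions: the section number, the `N_READS` count, reading offset 0 of the poster's on-chip memory, and the helpers `cycles_per_read` and `measure_iord32` are illustrative, not from the thread. The HAL part is wrapped in `#ifdef __nios2__` so it only builds with the Nios II toolchain; the arithmetic helper compiles anywhere. The per-iteration figure also includes the loop's own increment, compare and branch instructions, so it is an upper bound on the cost of the read itself.

```c
#include <stdint.h>

/* Hypothetical helper: average cost of one loop iteration when n_reads
 * back-to-back reads were timed inside a single section. With n_reads
 * in the thousands, a ~30-cycle PERF_BEGIN/PERF_END overhead adds well
 * under one cycle per read, so it is simply ignored here.
 * Returns 0 when n_reads is 0. */
static uint32_t cycles_per_read(uint64_t section_cycles, uint32_t n_reads)
{
    if (n_reads == 0)
        return 0;
    return (uint32_t)(section_cycles / n_reads);
}

#ifdef __nios2__
/* Target-only part (assumed names come from the poster's system.h). */
#include <stdio.h>
#include "system.h"
#include "io.h"
#include "altera_avalon_performance_counter.h"

#define N_READS 10000u /* illustrative iteration count */

void measure_iord32(void)
{
    alt_u32 i;
    alt_u64 t;

    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    /* One section wraps all N_READS reads, amortizing the counter cost */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE, 1);
    for (i = 0; i < N_READS; i++)
        (void)IORD_32DIRECT(ONCHIP_MEMORY2_0_BASE, 0); /* ldwio, bypasses cache */
    PERF_END(PERFORMANCE_COUNTER_0_BASE, 1);

    PERF_STOP_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    t = perf_get_section_time((void *)PERFORMANCE_COUNTER_0_BASE, 1);
    printf("%u reads: %llu cycles, ~%lu cycles per iteration\n",
           N_READS, (unsigned long long)t,
           (unsigned long)cycles_per_read(t, N_READS));
}
#endif
```

For example, `cycles_per_read(250000, 10000)` returns 25; dividing a large section this way is what keeps the fixed counter overhead from dominating the result.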
Nios II/e is a slow processor that requires a minimum of 5 clock cycles to complete one instruction.
The performance counter overhead is large compared with the total measured time per IORD (about 30 of the roughly 50 cycles). I agree with Galfonz that you should run many iterations of IORD for better accuracy. Also try looking at a simulation; it is a much better way to understand the cycle-level behavior of the RTL.
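Taking the figures in this thread at face value (about 30 cycles of counter overhead, about 50 cycles measured per section, and a minimum of 5 cycles per /e instruction), the arithmetic works out as:

$$\frac{30}{50} = 0.6 \quad \text{(about 60\% of each measured section is counter overhead)}$$

$$\frac{(50 - 30)\ \text{cycles}}{5\ \text{cycles/instruction}} = 4\ \text{instruction times}$$

So the remaining ~20 cycles per read amount to only a few minimum-length instruction times on this core.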