
Nios Instruction Execution Count

Altera_Forum

Hi everyone, 

 

I was wondering if there's any way to know how many instructions a Nios processor has executed. What I need is to start the processor and have it trigger a signal after, say, 1M instructions. Any help is appreciated.

 

Cheers

Altera_Forum

If you enable the level 4 debug module you could probably sample the off-chip trace to count the number of instructions going by.

Altera_Forum

Maybe the number of CPU cycles is accurate enough, in which case a hardware counter clocked by sys_clk can be used.

Altera_Forum

Thanks for the responses. I won't be able to use the clock cycle count, because those minor differences are exactly what I'm trying to measure; I want to benchmark the processor.

 

Unfortunately, it is not clear to me how to use the debug module, and I am targeting instruction counts on the order of 1,000,000,000. Would that be possible? Does the debug module slow the processor, or is it virtually transparent to the CPU? Thanks guys.

Altera_Forum

It ought to be possible to write a slave for the 'tightly coupled instruction memory' (that just accesses some M9K memory) and count the number of code fetches. 
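The counting logic such a slave would need can be modelled in C as a rough behavioural sketch. This is not HDL and not an Avalon interface; the names `fetch_counter`, `fetch_counter_init`, and `fetch_counter_tick` are made up for illustration. It assumes one call per observed code fetch and a 32-bit counter, which is wide enough for a threshold of 1,000,000,000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a fetch counter with a sticky trigger output. */
typedef struct {
    uint32_t count;      /* code fetches seen so far */
    uint32_t threshold;  /* fetch count at which the trigger is raised */
    bool     triggered;  /* stays set once the threshold is reached */
} fetch_counter;

static void fetch_counter_init(fetch_counter *fc, uint32_t threshold)
{
    fc->count = 0;
    fc->threshold = threshold;
    fc->triggered = false;
}

/* Call once per observed code fetch; returns the trigger state. */
static bool fetch_counter_tick(fetch_counter *fc)
{
    /* Counting stops once triggered, so the counter cannot wrap. */
    if (!fc->triggered && ++fc->count >= fc->threshold)
        fc->triggered = true;
    return fc->triggered;
}
```

Note that this counts fetches, which may differ from the number of instructions actually executed if the pipeline fetches instructions that are later discarded.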

 

Or are you really trying to benchmark the compiler? In that case, analysing the generated code might be better.

 

However, overall performance is significantly changed by memory delays, execution stalls, and mis-predicted branches. 

So any attempt to benchmark the processor should really take these into account. 

Unless, of course, you are trying to determine the number of lost cycles (i.e. comparing the clock count to the instruction count).
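Comparing the two counts comes down to simple arithmetic, sketched here in C. The `run_counts` struct and both function names are hypothetical, and how the counts are obtained is outside this sketch. It also assumes at most one instruction completes per clock.

```c
#include <stdint.h>

/* Hypothetical pair of counter readings taken over one benchmark run. */
typedef struct {
    uint64_t cycles;        /* clock cycles elapsed */
    uint64_t instructions;  /* instructions executed */
} run_counts;

/* Instructions per clock; returns 0.0 if no cycles elapsed. */
static double ipc(run_counts c)
{
    return c.cycles ? (double)c.instructions / (double)c.cycles : 0.0;
}

/* Cycles in which no instruction completed, assuming at most one
   instruction completes per clock. */
static uint64_t lost_cycles(run_counts c)
{
    return c.cycles > c.instructions ? c.cycles - c.instructions : 0;
}
```

For example, 1,000,000,000 instructions in 1,250,000,000 clocks would give an IPC of 0.8 and 250,000,000 lost cycles.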

Altera_Forum

That's great, I'll look into the slave for the tightly coupled memory, thanks a lot. Yes, that is exactly what I need to do: I need to get an IPC for Nios II.

 

Cheers, 

Kaveh

Altera_Forum

Using your own memory block would only work if you don't use an instruction cache.

Altera_Forum

The IPC is documented in the processor reference handbook; generally it is 1.

 

However: 

- The result of a non-ALU instruction (ld, mul, shift) has to go via the register file, resulting in a 2 clock delay before the value can be used. I presume there is some 'result forwarding logic' within the ALU. 

- Any Avalon MM transfers (including writes) are done synchronously and take at least 3 clocks (I haven't seen any writes taking 2 clocks). 

- I think the SDRAM interface buffers at least 2 write requests - so the first 2 random writes to SDRAM (and maybe other memory) complete in 3 cycles.

- As documented, branches are 1 clock predicted not taken, 2 clocks predicted taken, 4 clocks mispredicted. 
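As a rough illustration of the branch costs in the last point, the clocks spent on branches can be estimated from how each branch was resolved. This is only a sketch: the function and constant names are made up, and it assumes the 1 and 2 clock figures apply to correctly predicted branches.

```c
#include <stdint.h>

/* Clocks per branch, taken from the figures quoted above. */
enum {
    CLK_PRED_NOT_TAKEN = 1,  /* correctly predicted not taken (assumed) */
    CLK_PRED_TAKEN     = 2,  /* correctly predicted taken (assumed) */
    CLK_MISPREDICTED   = 4   /* mispredicted in either direction */
};

/* Estimated clocks spent on branches, given a count of each outcome. */
static uint64_t branch_clocks(uint64_t pred_not_taken,
                              uint64_t pred_taken,
                              uint64_t mispredicted)
{
    return pred_not_taken * CLK_PRED_NOT_TAKEN
         + pred_taken     * CLK_PRED_TAKEN
         + mispredicted   * CLK_MISPREDICTED;
}
```

With 10 branches predicted not taken, 5 predicted taken, and 2 mispredicted, this gives 10 + 10 + 8 = 28 clocks.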

 

For relatively small code blocks it is possible to adjust the C source to avoid almost all the pipeline stalls (usually by loading values into local variables and using 'asm volatile ("":::"memory")' to stop gcc reordering instructions). 
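A minimal sketch of that pattern, assuming gcc: the `sum4` function and the `COMPILER_BARRIER` macro name are invented for this example. All loads are issued into locals first, and the barrier stops gcc from moving memory accesses across it, so the loads are not sunk down next to the arithmetic that uses them.

```c
#include <stdint.h>

/* Empty asm with a "memory" clobber: emits no instructions, but gcc will
   not move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

/* Sum four words, issuing all four loads before any of the adds. */
static uint32_t sum4(const uint32_t *p)
{
    uint32_t a = p[0];
    uint32_t b = p[1];
    uint32_t c = p[2];
    uint32_t d = p[3];
    COMPILER_BARRIER();  /* the loads above stay above this point */
    return (a + b) + (c + d);
}
```

Whether this actually removes stalls depends on the code gcc generates, so it is worth checking the disassembly.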

 

The __builtin_expect() can be used to set the static branch prediction for conditionals (sometimes it is necessary to put an empty 'asm' statement in an otherwise empty 'else' branch). 
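For illustration, a small gcc example of both ideas. The `LIKELY` macro and `count_nonzero` function are made-up names, and the loop is only a stand-in for real code. The condition is marked as usually true, and the otherwise empty else branch holds an empty asm statement.

```c
/* Hint to gcc that the condition is expected to be true. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

/* Count the non-zero bytes in buf, treating zero bytes as the rare case. */
static unsigned count_nonzero(const unsigned char *buf, unsigned len)
{
    unsigned n = 0;
    for (unsigned i = 0; i < len; i++) {
        if (LIKELY(buf[i] != 0)) {
            n++;
        } else {
            /* Empty asm so the else branch is not discarded as empty. */
            __asm__ volatile ("");
        }
    }
    return n;
}
```

The hint does not change the result, only how gcc lays out the branch.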

 

If you are trying to squeeze out every last drop of performance, then the dynamic branch prediction will only slow things down - getting the source right is better unless a single instruction needs to predict in different directions at different times. Your Altera rep should be able to tell you how to disable the dynamic branch prediction.

Altera_Forum

Maybe I should tell you exactly what my point is in doing this. I'm writing my own processor with the Nios II ISA and wanted to compare IPCs. I just wanted to get the exact IPC from both processors running my own benchmarks, but it looks like there's no way to get the instruction count from Nios II. Thanks everyone.
