Altera_Forum
Honored Contributor I

how many cycles does each command take in nios?

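In my project, I make a loop to blink an LED:

#include "system.h"                   /* QD_PIO_0_BASE */
#include "alt_types.h"                /* alt_u32 */
#include "altera_avalon_pio_regs.h"   /* IOWR_ALTERA_AVALON_PIO_DATA */

void delay(void);

int main(void)
{
    int i = 0;

    for (i = 0; i < 100; i++)
    {
        IOWR_ALTERA_AVALON_PIO_DATA(QD_PIO_0_BASE, 0xff);  /* LEDs on */
        delay();
        IOWR_ALTERA_AVALON_PIO_DATA(QD_PIO_0_BASE, 0x0);   /* LEDs off */
        delay();
        // printf("Hello NIOS II! %d\n", i);
    }
    return 0;
}

/* Busy-wait for 100000 loop iterations. */
void delay(void)
{
    alt_u32 i = 0;

    while (i < 100000)
    {
        i++;
    }
}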
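The output waveform has a 30 ms period and my clock frequency is 100 MHz, so each delay() call lasts about 15 ms, or 1,500,000 clock cycles. With 100000 iterations, it looks like each "i++" iteration takes about 15 clock cycles. Is that right?

How can I find out how many cycles each command takes in my project?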
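
The estimate above can be checked with a little arithmetic. This is only a sketch that runs on a host PC: the function name cycles_per_iteration is made up for illustration, and it assumes the PIO writes and the call overhead are negligible next to the delay loop.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate clock cycles per delay-loop iteration from a measured
 * output period. Hypothetical helper, not part of any Altera library. */
static uint32_t cycles_per_iteration(uint32_t clk_hz,
                                     uint32_t period_us,
                                     uint32_t delays_per_period,
                                     uint32_t iterations_per_delay)
{
    assert(delays_per_period != 0 && iterations_per_delay != 0);

    /* Clock cycles in one full period of the output waveform. */
    uint64_t cycles_per_period = (uint64_t)clk_hz * period_us / 1000000u;

    /* Split across the delay() calls, then across loop iterations. */
    return (uint32_t)(cycles_per_period / delays_per_period
                                        / iterations_per_delay);
}
```

With the numbers from this post, cycles_per_iteration(100000000u, 30000u, 2u, 100000u) gives 15. Note that this is cycles per loop iteration (load, compare, branch, increment, store), not per single instruction.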
Altera_Forum
Honored Contributor I

If you are worried about how long NIOS instructions take, be aware that NIOS is a very slow processor. The free one is staggeringly slow and inefficient. If this is a concern, use one of the SoC chips with a built-in ARM processor, or write your algorithm in Verilog or VHDL. Almost any external micro will be faster than NIOS as well.

Altera_Forum
Honored Contributor I

Thanks for your reply.

I am not worried about it; I just want to know how long NIOS instructions take, since that may be helpful.

I read the objdump file, and it looks like assembly language. For the i++, it shows:

void delay(void)
alt_u32 i =0;
while(i < 100000)
  80031c: e0ffff17 ldw r3,-4(fp)
  800320: 008000b4 movhi r2,2
  800324: 10a1a7c4 addi r2,r2,-31073
  800328: 10fff92e bgeu r2,r3,800310 <__reset+0xff7f8310>
i++;
  80032c: e037883a mov sp,fp
  800330: df000017 ldw fp,0(sp)
  800334: dec00104 addi sp,sp,4
  800338: f800283a ret

 

Does each line cost one clock cycle?
Altera_Forum
Honored Contributor I

https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/hb/nios2/n2cpu_nii5v1.pdf 

See "Instruction Performance" on page 5-11, 5-19, or 5-21 depending on what core you're using. 

 

Your question was about instruction performance, but if you really just care about higher-level C function/loop execution times, AN391 is a good read: https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/an/an391.pdf The Performance Counter IP block in particular is very useful.
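
The general measurement idea (read a free-running counter before and after the code, then divide by the iteration count) might be sketched as below. Everything here is an assumption for illustration: tick_fn, work_fn and average_cycles are hypothetical names, not the Performance Counter API from AN391, and the fake_* functions are host-side stand-ins so the sketch runs without hardware. On a real system you would plug in whatever cycle counter your design provides.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*tick_fn)(void);  /* hypothetical: reads a free-running cycle counter */
typedef void (*work_fn)(void);      /* code being timed, e.g. delay() */

/* Average cycles per loop iteration for one call to work(). */
static uint32_t average_cycles(tick_fn read_ticks, work_fn work,
                               uint32_t iterations)
{
    assert(iterations != 0);

    uint32_t start = read_ticks();
    work();
    /* Unsigned subtraction stays correct across one counter wrap. */
    uint32_t elapsed = read_ticks() - start;

    return elapsed / iterations;
}

/* Host-side stand-ins: a simulated counter and a simulated delay()
 * that "consumes" 1,500,000 cycles (15 ms at 100 MHz). */
static uint32_t fake_counter;
static uint32_t fake_ticks(void) { return fake_counter; }
static void fake_delay(void) { fake_counter += 1500000u; }
```

Calling average_cycles(fake_ticks, fake_delay, 100000u) returns 15 with these stand-ins. The measurement itself adds a few cycles of overhead for the counter reads and the call, which matters less the longer the timed section is.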

 

 

Many things can be done in a single cycle. But getting the compiler to emit the best code, and constructing optimized hardware, can each become a small research project.

 

For example, if you rewrote your delay() in a form that GCC likes a little better, it looks like it would average about 3 cycles per loop iteration on an "F" core.

void delay(void)
{
    register int i = 0;
    const register int limit = 100000;

    for (i = 0; i < limit; i++) { }
}

 

And the assembly (gcc -S foo.c), where .L3 is the loop iterator increment, followed by the .L2 "blt" compare against 100000:

delay:
    addi sp, sp, -12
    stw fp, 8(sp)
    stw r17, 4(sp)
    stw r16, 0(sp)
    addi fp, sp, 8
    mov r17, zero
    movhi r16, 2
    addi r16, r16, -31072
    mov r17, zero
    br .L2
.L3:
    addi r17, r17, 1
.L2:
    blt r17, r16, .L3
    addi sp, fp, -8
    ldw fp, 8(sp)
    ldw r17, 4(sp)
    ldw r16, 0(sp)
    addi sp, sp, 12
    ret
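
One side effect worth noting: a faster loop means the same iteration count gives a shorter delay. A rough way to size the loop for a target delay is sketched below. The helper name iterations_for_delay is made up for illustration, and the cycles-per-iteration value is an input you would have to measure or estimate for your own core and compiler settings.

```c
#include <assert.h>
#include <stdint.h>

/* Number of loop iterations needed to busy-wait for target_us,
 * given an estimated cost per iteration. Hypothetical helper. */
static uint32_t iterations_for_delay(uint32_t clk_hz,
                                     uint32_t target_us,
                                     uint32_t cycles_per_iteration)
{
    assert(cycles_per_iteration != 0);

    /* Total clock cycles available in the target delay. */
    uint64_t cycles = (uint64_t)clk_hz * target_us / 1000000u;

    return (uint32_t)(cycles / cycles_per_iteration);
}
```

At 100 MHz, a 15 ms delay at 15 cycles per iteration needs 100000 iterations, matching the original loop. If the rewritten loop really does average about 3 cycles per iteration, the same 15 ms would need roughly 500000 iterations.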