Nios® V/II Embedded Design Suite (EDS)

How to use the floating-point hardware in Nios II custom instructions

Altera_Forum
Honored Contributor II
3,258 Views

Hi,

I want to compare the speed of floating-point calculations using the floating-point custom instruction against software emulation.

 

This is my code:

#include <stdio.h>
#include "altera_avalon_pio_regs.h"
#include "system.h"

int main()
{
    printf("Hello from Nios II!\n");
    float a = 3.1415926;
    float b = 1.2578;
    float c = 0;

    while (1)
    {
        /* toggle the PIO output around each operation so it can be timed externally */
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, 0xf);
        c = a * b;
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, 0x0);
        c = a - b;
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, 0xf);
        c = b - a;
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, 0x0);
    }

    return 0;
}

In the code, I toggle a PIO output to roughly estimate the calculation time.

First, I removed the floating-point hardware, generated the HDL, compiled the Quartus project, and built the software. The calculation time was 2.3 us (the system clock is 100 MHz).

Then I added the floating-point hardware and repeated these steps. The calculation time was still 2.3 us.

I guess the Nios II did not use the floating-point hardware. How can I fix this problem?

Are there any examples that show how to use it?

Thanks!
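For scale, the measured 2.3 us at a 100 MHz system clock works out to 230 clock cycles. A minimal sketch of that arithmetic in C; the function name ns_to_cycles is made up for illustration and is not part of any Altera API:

```c
#include <stdint.h>

/* Converts a measured duration into clock cycles.
 * duration_ns: measured pulse width in nanoseconds
 * clock_mhz:   system clock in MHz (100 in the post above)
 * ns_to_cycles is a hypothetical helper name. */
static uint32_t ns_to_cycles(uint32_t duration_ns, uint32_t clock_mhz)
{
    /* cycles = seconds * Hz = (ns * 1e-9) * (MHz * 1e6) = ns * MHz / 1000 */
    return (uint32_t)(((uint64_t)duration_ns * clock_mhz) / 1000u);
}
```

ns_to_cycles(2300, 100) gives 230. The measured interval also includes the PIO write itself, so the floating-point operation accounts for somewhat less than that.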
14 Replies
Altera_Forum

I checked system.h and found this:

/*
 * Custom instruction macros
 */

#define ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(n,A,B) __builtin_custom_inii(ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0_N+(n&ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0_N_MASK),(A),(B))

#define ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0_N 0xfc

#define ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0_N_MASK ((1<<2)-1)

I also tried

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(3,a,b);

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(1,a,b);

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(2,b,a);

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a);

but the result is 0.0.
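The macros above select a custom instruction by adding the low two bits of n to the base value 0xfc. A small sketch of that arithmetic, using shortened, hypothetical names (FP_CI_N, FP_CI_N_MASK, fp_ci_index) in place of the generated macro names:

```c
/* Shortened stand-ins for the generated system.h values.
 * The names are hypothetical; the values are copied from the post above. */
#define FP_CI_N      0xfc
#define FP_CI_N_MASK ((1 << 2) - 1)

/* Returns the custom instruction index selected by n, mirroring
 * ..._N + (n & ..._N_MASK) in the generated macro. */
static int fp_ci_index(int n)
{
    return FP_CI_N + (n & FP_CI_N_MASK);
}
```

fp_ci_index(0) is 252 and fp_ci_index(3) is 255, so n = 0 selects the instruction that a disassembly would show as "custom 252". Values of n above 3 wrap around because of the mask.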
Altera_Forum

I assume you are using Floating Point Hardware 1 (not 2). Check your objdump file and compare the build with the floating-point hardware (FPH) against the build without it. If the custom instructions are properly inserted, you should see "custom" assembly instructions in your objdump.

 

Floating point reference: https://www.altera.com/content/dam/altera-www/global/en_us/others/literature/ug/ug_fph2.pdf
Altera_Forum

I forgot to ask: what optimization level did you use when compiling the C code? The compiler may be trying to be smart and optimizing away the floating-point operations.
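One way to rule out the compiler folding or removing the arithmetic, offered as a sketch rather than a confirmed fix for this project: keep the operands and the result in volatile variables, so every read, multiply, and write has to happen at run time. The names va, vb, vc, and timed_multiply are made up for illustration; on the target, the PIO writes from the original code would bracket the multiply.

```c
/* volatile forces real loads and stores, so the multiply cannot be
 * computed at compile time or dropped as unused.
 * All names here are hypothetical. */
static volatile float va = 3.1415926f;
static volatile float vb = 1.2578f;
static volatile float vc;

/* Performs one run-time multiply and returns the stored result. */
static float timed_multiply(void)
{
    /* On the target, IOWR_ALTERA_AVALON_PIO_DATA(...) calls would go
     * immediately before and after this line. */
    vc = va * vb;
    return vc;
}
```

With this pattern the multiply stays in the generated code even at higher optimization levels, so a disassembly or a timing measurement reflects the real operation.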

Altera_Forum

Hi mikedsouze, thanks for replying.

1. What is objdump?

2. I did use some optimization to reduce the code size, since there is no SDRAM on my test board:

https://www.alteraforum.com/forum/attachment.php?attachmentid=11275
Altera_Forum

1. <application_name>.objdump is a file found beside your <application_name>.elf file. It is a human-readable listing generated from the ELF file, and it shows the exact code that was compiled.

To make sure that the floating-point operations get compiled, you should see something similar to the following (for the c=a*b operation):

 

Without floating point hardware, software emulation: 

c=a*b; 

2ac: d9000217 ldw r4,8(sp) 

2b0: d9400117 ldw r5,4(sp) 

2b4: 000031c0 call 31c <__mulsf3> 

2b8: d8800015 stw r2,0(sp) 

 

With floating point hardware: 

c=a*b; 

2ac: d8800217 ldw r2,8(sp) 

2b0: d8c00117 ldw r3,4(sp) 

2b4: 10c5ff32 custom 252,r2,r2,r3 

 

2. Do you know which optimization level was used, e.g. -O2 or -O0?
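The check described above amounts to looking for "custom" lines versus calls to __mulsf3 in the listing. A text search tool such as grep does this directly; the sketch below shows the same idea in C for a listing already loaded into memory. The function name count_lines_containing is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Counts the lines of a text listing that contain needle at least once.
 * Hypothetical helper; the listing is a NUL-terminated string. */
static int count_lines_containing(const char *listing, const char *needle)
{
    int count = 0;
    size_t nlen = strlen(needle);
    const char *line = listing;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Search within the current line only. */
        for (size_t i = 0; nlen > 0 && i + nlen <= len; i++) {
            if (memcmp(line + i, needle, nlen) == 0) {
                count++;
                break;
            }
        }

        line += len;
        if (*line == '\n') {
            line++;
        }
    }
    return count;
}
```

A listing built with working floating-point hardware should give a non-zero count for "custom" and zero for "__mulsf3" around the multiply; a software-emulated build shows the opposite.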
Altera_Forum

Hi mikedsouze, thanks for your help.

I checked the objdump file; it looks a bit like an assembly-language file.

With the floating-point hardware added to the Nios II, I used both

c=a*b

and

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,a,b);

Then I checked the objdump file:

c=a*b; 

1082c4: e13ffc17 ldw r4,-16(fp) 

1082c8: e17ffd17 ldw r5,-12(fp) 

1082cc: 01084140 call 108414 <__mulsf3> 

1082d0: 1007883a mov r3,r2 

1082d4: e0fffe15 stw r3,-8(fp) 

 

 

 

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); 

1083d8: e13ffd17 ldw r4,-12(fp) 

1083dc: 0108d340 call 108d34 <__fixsfsi> 

1083e0: 1021883a mov r16,r2 

1083e4: e13ffc17 ldw r4,-16(fp) 

1083e8: 0108d340 call 108d34 <__fixsfsi> 

1083ec: 8085ff32 custom 252,r2,r16,r2 

1083f0: 1009883a mov r4,r2 

1083f4: 0108dac0 call 108dac <__floatsisf> 

1083f8: 1007883a mov r3,r2 

1083fc: e0fffe15 stw r3,-8(fp) 

 

 

 

It looks like the floating-point hardware is only used when I write

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a);

but when I run it in the debugger, the result of c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); is 0.0.
Altera_Forum

The optimization level is -O0 (no optimization).

Altera_Forum

That is very weird. With the floating-point hardware inserted and the BSP regenerated, c=b*a should directly produce "custom 252". Can you try deleting the BSP and creating a new one?

I tried c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); and it gives me 0.0 as well. I am curious why there are additional __fixsfsi calls. They convert the single-precision operands to integers, which could corrupt the operation.

2f0: 00003900 call 390 <__fixsfsi> 

2f4: 9009883a mov r4,r18 

2f8: 1023883a mov r17,r2 

2fc: 00003900 call 390 <__fixsfsi> 

 

 

What version of Quartus are you using? I tried 14.1 and 15.0, and both work for me.
Altera_Forum

Okay, I think I know what the problem is: the macro in system.h calls the wrong builtin function. It is supposed to be fnff instead of inii.

#define ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(n,A,B) __builtin_custom_fnff(ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0_N+(n&ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0_N_MASK),(A),(B))

If you update system.h, it should remove the additional __fixsfsi calls. Hopefully this works.
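A host-side sketch of why the inii form plausibly yields 0.0 while the fnff form works. It assumes the hardware simply treats its two 32-bit inputs as IEEE-754 single-precision values. fp_mul_bits is a hypothetical stand-in for the custom instruction, and mul_via_inii and mul_via_fnff are made-up names that mimic the two calling conventions; none of this is Altera code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the multiply custom instruction: it sees only
 * two 32-bit register values and interprets them as IEEE-754 singles. */
static uint32_t fp_mul_bits(uint32_t ra, uint32_t rb)
{
    float a, b, r;
    uint32_t out;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&a, &ra, sizeof a);
    memcpy(&b, &rb, sizeof b);
    r = a * b;
    memcpy(&out, &r, sizeof out);
    return out;
}

/* inii-style call: operands are converted by value to int (the __fixsfsi
 * calls), and the int result is converted by value back to float (the
 * __floatsisf call). */
static float mul_via_inii(float a, float b)
{
    int32_t ia = (int32_t)a;   /* e.g. 3.1415926f becomes 3 */
    int32_t ib = (int32_t)b;   /* e.g. 1.2578f becomes 1 */
    int32_t ir = (int32_t)fp_mul_bits((uint32_t)ia, (uint32_t)ib);
    return (float)ir;
}

/* fnff-style call: the float bit patterns reach the hardware unchanged. */
static float mul_via_fnff(float a, float b)
{
    uint32_t ra, rb, rr;
    float r;

    memcpy(&ra, &a, sizeof ra);
    memcpy(&rb, &b, sizeof rb);
    rr = fp_mul_bits(ra, rb);
    memcpy(&r, &rr, sizeof r);
    return r;
}
```

In the inii path the hardware receives the bit patterns 3 and 1, which read as floats are vanishingly small values, so their product is zero and the final int-to-float conversion returns 0.0. The fnff path passes the real operands through and returns roughly 3.95 for these inputs.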
Altera_Forum

Thank you very much!

After I changed the system.h file, the objdump file became:

c=a*b; 

1082c0: e13ffd17 ldw r4,-12(fp) 

1082c4: e17ffe17 ldw r5,-8(fp) 

1082c8: 01083c00 call 1083c0 <__mulsf3> 

1082cc: 1007883a mov r3,r2 

1082d0: e0ffff15 stw r3,-4(fp) 

 

 

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); //* 

10832c: e0bffe17 ldw r2,-8(fp) 

108330: e0fffd17 ldw r3,-12(fp) 

108334: 10c5ff32 custom 252,r2,r2,r3 

108338: e0bfff15 stw r2,-4(fp) 

 

and c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); returns the right result, and its calculation time is much less than that of c=a*b.

How did you know to change inii to fnff in the system.h file?

I am new to Altera and Nios, and I want to learn more.

It seems my project still has some problems, but at least I now know how to use the floating-point hardware in Nios II custom instructions.

But I still do not understand why system.h in my project has to be changed manually, and why I cannot use c=a*b.

Maybe I can try Floating Point Hardware 2.

Thanks again!
Altera_Forum

I think the following lines are somehow missing from the public.mk file in your BSP folder:

# Hardware Floating Point Custom Instruction without Divider present.  

ALT_CFLAGS += -mcustom-fpu-cfg=60-1 

ALT_LDFLAGS += -mcustom-fpu-cfg=60-1 

 

What Quartus version are you using? I guess you could open nios2-bsp-editor in the BSP folder and add "-mcustom-fpu-cfg=60-1" to bsp_cflags_user_user_flags under the Main >> Advanced tab.

Or try creating a new BSP.

 

The Altera Nios II GCC supports the following builtin functions: https://gcc.gnu.org/onlinedocs/gcc/altera-nios-ii-built-in-functions.html 

 

I guess I am a curious person that likes to try to understand things.
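As a sketch of calling the builtin directly instead of going through the system.h macro: the wrapper below uses __builtin_custom_fnff when built for Nios II and falls back to plain C elsewhere. The name fp_mul is made up, opcode 252 is assumed to be the multiply based on the "custom 252" lines in this thread, and __nios2__ is assumed to be the compiler's predefined target macro.

```c
/* Multiplies two floats.
 * ASSUMPTIONS: opcode 252 is the single-precision multiply in this system
 * (taken from the "custom 252" disassembly in this thread), and __nios2__
 * is defined by the Nios II GCC. fp_mul is a hypothetical name. */
static float fp_mul(float a, float b)
{
#if defined(__nios2__)
    /* The opcode argument must be a compile-time constant. */
    return __builtin_custom_fnff(252, a, b);
#else
    /* Host fallback so the same source builds and runs off-target. */
    return a * b;
#endif
}
```

When the -mcustom-fpu-cfg flag is present in the build, the compiler is meant to emit the same instruction for a plain a * b, so a wrapper like this is mainly useful for experiments.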
Altera_Forum

I use Quartus 15.0.

Altera_Forum

This is what my public.mk shows:

 

# Hardware Divider present.  

# setting HARDWARE_DIVIDE is false 

ALT_CFLAGS += -mno-hw-div 

 

# Hardware Multiplier present.  

# setting HARDWARE_MULTIPLY is true 

ALT_CFLAGS += -mhw-mul 

 

# Hardware Mulx present.  

# setting HARDWARE_MULX is false 

ALT_CFLAGS += -mno-hw-mulx 

...... 

 

 

# Enable BSP generation to query if SOPC system floating point custom  

# instruction with a divider is present. If true ignores export of 'ALT_CFLAGS  

# += -mcustom-fpu-cfg=60-2' and 'ALT_LDFLAGS += -mcustom-fpu-cfg=60-2' to  

# public.mk if the custom instruction is found in the system. none  

# setting hal.make.ignore_system_derived.hardware_fp_cust_inst_divider_present is false 

 

The following lines are missing:

"# Hardware Floating Point Custom Instruction without Divider present.

ALT_CFLAGS += -mcustom-fpu-cfg=60-1

ALT_LDFLAGS += -mcustom-fpu-cfg=60-1"

In my project, under BSP Editor => Settings => Advanced => hal.make.ignore_system_derived,

I found that "hardware_fp_cust_inst_no_divider_present" was checked.

I thought that might be the reason, so I unchecked it and regenerated the BSP.

I then rebuilt the project and changed system.h again (the software generated "__builtin_custom_inii" again, so I had to change it to "__builtin_custom_fnff" manually).

Then I found this in the objdump file:

 

c=a*b; 

1082c0: e0fffd17 ldw r3,-12(fp) 

1082c4: e0bffe17 ldw r2,-8(fp) 

1082c8: 1885ff32 custom 252,r2,r3,r2 

1082cc: e0bfff15 stw r2,-4(fp) 

 

So, problem solved!

It seems that to use the floating-point hardware in Nios II custom instructions, I have to add the IP in Qsys, change the BSP setting, and change system.h.

Why didn't the user guide ("tt_floating_point_custom_instructions.pdf") tell me this?
Altera_Forum

Okay, I think the problem is the "hardware_fp_cust_inst_no_divider_present" option.

If I uncheck it, the objdump file is:

c=a*b; 

1082c4: e0fffc17 ldw r3,-16(fp) 

1082c8: e0bffd17 ldw r2,-12(fp) 

1082cc: 1885ff32 custom 252,r2,r3,r2 

1082d0: e0bffe15 stw r2,-8(fp) 

 

c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); //* 

108324: e13ffd17 ldw r4,-12(fp) 

108328: 01084000 call 108400 <__fixsfsi> 

10832c: 1021883a mov r16,r2 

108330: e13ffc17 ldw r4,-16(fp) 

108334: 01084000 call 108400 <__fixsfsi> 

108338: 8085ff32 custom 252,r2,r16,r2 

10833c: 1009883a mov r4,r2 

108340: 01084780 call 108478 <__floatsisf> 

108344: 1007883a mov r3,r2 

108348: e0fffe15 stw r3,-8(fp) 

 

At this point, system.h still uses "__builtin_custom_inii",

which makes c=ALT_CI_NIOS_CUSTOM_INSTR_FLOATING_POINT_0(0,b,a); return the wrong value, as mikedsouze said.

But that is not important, because c=a*b works.