Re: NIOS2 HW & DIV Instructions + FPU - Page 2

Altera_Forum · ‎08-12-2011

Hello, I am in charge of designing a system that does a lot of complex sin/cos/* operations, and I need them to be fast.

I am using a NIOS2-F at 100 MHz in a Cyclone IV.

I've enabled hardware multiply and divide on the NIOS2-F, and I am using 64K data and instruction caches.

My software is being compiled with these flags

CFLAGS = -Wall  -DNOCRYPT -mhw-div -mhw-mul -mcustom-fpu-cfg=60-1 -mcustom-fpu-cfg=60-2 -O3

I tried to use the custom FPU instruction in Qsys, but my performance is slower with the custom FPU instruction attached to the NIOS2-F (why?).

Is there anything else I can do? Any suggestions?

Altera_Forum · ‎08-18-2011

Presumably there is a separate compiler option that can be used to turn it back off (without doing a compiler rebuild)?

Even if that does mean you have to explicitly mark constants as 'float'.

You might also need to set another option to let the compiler do certain FP arithmetic itself - instead of generating code to do it. The issue here is that the compiler might not generate exactly the same bit-pattern as the target.

Altera_Forum · ‎08-18-2011

Aprado - your last example can be optimised away again ....

Altera_Forum · ‎08-18-2011

--- Quote Start ---

Aprado - your last example can be optimised away again ....

--- Quote End ---

Yup, however I compiled without the -O flag and took a look at the ASM, and it is calling a custom instruction to do the float operation.
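For checking the generated code with optimisation turned on, one common trick is to route the operands through volatile variables so the compiler cannot fold the arithmetic into a constant. The following is only a minimal sketch: the names `a`, `b` and `mul_runtime` are made up for illustration and are not from the code discussed in this thread.

```c
/* volatile forces a real load at run time, so the compiler
   cannot replace the multiply with a precomputed constant. */
static volatile float a = 1.5f;
static volatile float b = 2.0f;

float mul_runtime(void)
{
    float x = a;   /* run-time load of the first operand  */
    float y = b;   /* run-time load of the second operand */
    return x * y;  /* single-precision multiply that must be emitted */
}
```

Compiling this with the FPU options and inspecting the assembly should show whether the multiply became a custom instruction or a call to a soft-float library routine.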

Altera_Forum · ‎08-18-2011

Hmmm... the code Altera added to gcc to default FP constants to 'float' is truly borked.

The option is normally selectable as -f[no-]single-precision-constant. However, rather than set this when the -mcustom-fpu-cfg option is seen, it is done at the end of option processing, so it cannot be turned off from the command line.

Worse still, the generation of fp custom instructions can be enabled by a pragma - this will also force single precision constants from then on!

Seems tempting to rebuild the compiler with the assignment "flag_single_precision_constant = 1;" moved into the argument saving code (if not deleted completely).

Altera_Forum · ‎08-18-2011

That's correct, I looked around as well and couldn't find a clean way to disable having constants treated as single precision.

One way is to use the suffix 'l' for doubles (strictly speaking, 'l' makes the constant a long double) and 'f' for floats, but that can be a pain if you have a lot of constants scattered around in the code.
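To illustrate what the suffixes do, here is a small sketch in standard C. The function names are hypothetical. Each one reports the size in bytes of the type the compiler picks for `x * constant`. On a standard compiler the unsuffixed case is a double; with -fsingle-precision-constant in effect it would be expected to come out as a float instead.

```c
#include <stddef.h>

/* sizeof does not evaluate its operand; it only reports the size
   of the type the expression would have. */

/* Unsuffixed constant: double in standard C, so x is promoted. */
size_t width_unsuffixed(float x) { return sizeof(x * 0.1); }

/* 'f' suffix: the constant is a float, the product stays float. */
size_t width_f(float x) { return sizeof(x * 0.1f); }

/* 'l' suffix: the constant is a long double. */
size_t width_l(float x) { return sizeof(x * 0.1l); }
```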

The way I would probably do it would be to generate the FPU, feed it to the HDL component editor, then pass the appropriate flags to the compiler for +, -, *, / without using the 60-1 or 60-2 flags. I have never done this since I typically use "YAFPU", which I posted over on the Altera Wiki, since it has more operators in it.

Aprado, I have not used the configurable FPU before, but I think I know which one it is. If you have to pass in compiler flags for each floating point operator when using that one, then you probably don't need to worry about constants being treated as single precision values.

Altera_Forum · ‎08-18-2011

Yes, I need to pass in compiler flags for each floating point operator.

That's great news then. Is the YAFPU faster than the configurable one? I will try it.

Thanks for the help BadOmen and DSL.

Altera_Forum · ‎08-18-2011

I'm not sure which one is faster; the latency counts are visible somewhere in the Verilog, I think. Also, as a heads up, YAFPU is old as dirt and is .ptf based, so I'm not sure if it still works with the tools. One of these days I'll give it a facelift and add double precision support.

Altera_Forum · ‎08-19-2011

Unless something very obscure goes on, it is only the -mcustom-fpu-cfg option that forces single precision constants.

The C functions I found are the same horrid ones an ARM system I used many years ago ended up using - they are very slow at the best of times [1].

For an embedded system you might get away with:

- No NaN, infinity or -0

- non-IEEE rounding (maybe just truncate)
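As a rough illustration of the shortcuts listed above, here is a sketch of a single-precision multiply in C that truncates instead of rounding, flushes zero and denormal inputs and underflow to +0, and saturates overflow to the largest finite value instead of returning infinity. The function name `fmul_trunc` is made up, and the code assumes the inputs are never NaN or infinity. It is not the library routine discussed in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Simplified float multiply: no NaN, no infinity, no -0, truncation only. */
float fmul_trunc(float a, float b)
{
    uint32_t ua, ub, ur;
    memcpy(&ua, &a, sizeof ua);            /* reinterpret the bits safely */
    memcpy(&ub, &b, sizeof ub);

    uint32_t sign = (ua ^ ub) & 0x80000000u;
    int32_t ea = (int32_t)((ua >> 23) & 0xFF);
    int32_t eb = (int32_t)((ub >> 23) & 0xFF);

    if (ea == 0 || eb == 0) {              /* zero or denormal input: +0 */
        ur = 0;
    } else {
        /* Restore the implicit leading 1 of each 24-bit mantissa. */
        uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;
        uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;
        uint64_t p = ma * mb;              /* product is 47 or 48 bits wide */
        int32_t e = ea + eb - 127;         /* remove the doubled bias */
        uint32_t mant;

        if (p & ((uint64_t)1 << 47)) {     /* product in [2,4): renormalise */
            mant = (uint32_t)(p >> 24);
            e += 1;
        } else {                           /* product in [1,2) */
            mant = (uint32_t)(p >> 23);
        }                                  /* low bits dropped: truncation */

        if (e <= 0)
            ur = 0;                        /* underflow: flush to +0 */
        else if (e >= 255)
            ur = sign | 0x7F7FFFFFu;       /* overflow: largest finite value */
        else
            ur = sign | ((uint32_t)e << 23) | (mant & 0x7FFFFFu);
    }

    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

Skipping the special cases removes most of the branches a fully IEEE-compliant routine needs. Whether that is acceptable depends on the application.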

It is also a shame that Altera didn't think through the custom instruction interface a little further.

- Separate opcode for FP

- Allow an instruction to disable interrupts before the following instruction. This would allow, for example, a 64-bit result to be recovered.

[1] I spent a week or so writing them in ARM asm - not that hard.