Hi everybody. I'm working on a design using a Nios II/f that relies heavily on floating-point math. In Qsys I added the Floating Point Hardware component and connected it to the custom instruction master of the Nios II. Apparently everything works. My design is a neural network, so basically I have a multiply/add loop and an exponential loop. The first loop gets faster (about 10x); the second one doesn't change at all. I wrote a benchmark, something like:
/* Performance Start */
float a = 10;
for (i = 0; i < 1000; i++)
    a = exp(a);
/* Performance End */

and ran it with and without the following pragmas:

#pragma no_custom_fadds
#pragma no_custom_fsubs
#pragma no_custom_fmuls
#pragma no_custom_fdivs

The performance (about 6 Mcycles, if I remember correctly) doesn't change. Hardware divide is enabled both in the Floating Point Hardware module and in the Nios II/f. My doubt is this: since I'm using Qsys and not SOPC Builder, I didn't find the "Custom Instructions" tab in the Nios II module where you specify the use of the FP hardware. I assumed that in Qsys, once the module is connected, the FP hardware is always used unless disabling pragmas are declared. Maybe I was wrong here? These are my includes:
#include <stdio.h>
#include "io.h"
#include <sys/alt_alarm.h>
#include <altera_avalon_performance_counter.h>
#include "system.h"
#include <math.h>
#include <float.h>

Thanks for the help! :)
Yes, I tried expf() too. Overall performance increased, but I still don't see any difference with and without the pragmas. About the compiler options... are you talking about the BSP settings? I'm using the settings from the "hello world" template. Is there any particular option I should check?
You need to make sure the float instructions are used when the BSP is built. It might be worth looking at the symbol table for the final image: you probably don't want the software float math functions in your image. (You probably don't want the double ones either, but printf() might pull those in.) You might also decide that you only need a limited-accuracy exp() function, and write one that is faster but less accurate.
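As a rough illustration of the limited-accuracy idea, here is a minimal sketch, assuming IEEE-754 single-precision floats and that 4 to 5 significant digits are enough for a neural network. fast_expf() is a hypothetical helper, not an Altera/Intel library function. It reduces x to x = n*ln2 + r, evaluates a short polynomial for exp(r) using float multiplies and adds (the operations the FP custom instructions cover), and builds 2^n directly in the float exponent field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any vendor library): reduced-accuracy exp().
 * Range reduction: x = n*ln2 + r with |r| <= ln2/2, so exp(x) = 2^n * exp(r).
 * Apart from the float->int conversion, the work is float multiply/add. */
static float fast_expf(float x)
{
    const float LN2     = 0.69314718f;
    const float INV_LN2 = 1.44269504f;

    /* Clamp so that n stays in the normal exponent range (-126..127). */
    if (x > 88.0f)  x = 88.0f;
    if (x < -87.0f) x = -87.0f;

    /* n = x / ln2 rounded to the nearest integer (round half away from zero). */
    float t = x * INV_LN2;
    int32_t n = (int32_t)(t + (t >= 0.0f ? 0.5f : -0.5f));
    float r = x - (float)n * LN2;

    /* Degree-4 Taylor polynomial for exp(r), in Horner form. */
    float p = 1.0f + r * (1.0f + r * (0.5f + r * (1.0f / 6.0f + r * (1.0f / 24.0f))));

    /* 2^n: biased exponent (n + 127) in bits 30..23, zero mantissa. */
    uint32_t bits = (uint32_t)(n + 127) << 23;
    float scale;
    memcpy(&scale, &bits, sizeof scale);
    return p * scale;
}
```

Usage is a drop-in replacement such as a = fast_expf(a);. A higher-degree polynomial or a small table for exp(r) trades speed for accuracy.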
In the BSP Editor I searched for settings related to the FP custom instructions. Under Advanced -> HAL -> Make
I tried turning these options on and off, but with no effect. I'm not quite following you on the symbol table; where can I find it? About the exp() accuracy: since I need to compute a much more complex function (a TanSig), I'll surely build a custom instruction using a LUT. Still, I'd like to be sure I've implemented the FP hardware correctly ;)
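A LUT-based TanSig (mathematically tanh(x)) can be prototyped in C before committing it to hardware. This is a hypothetical software model, not an actual custom instruction. It stores tanh at 17 points on [0, 4] (step 0.25), interpolates linearly, uses the odd symmetry tanh(-x) = -tanh(x), and saturates to +/-1 beyond |x| = 4. With this step the worst-case absolute error is roughly 0.006 (the usual h^2/8 * max|f''| bound for linear interpolation). A finer table reduces it.

```c
/* Hypothetical software model of a LUT-based tansig (tanh).
 * TANSIG_TAB[i] = tanh(i * 0.25) for i = 0..16, i.e. x in [0, 4]. */
#define TANSIG_STEP 0.25f
#define TANSIG_N    16                 /* number of intervals in the table */

static const float TANSIG_TAB[TANSIG_N + 1] = {
    0.00000000f, 0.24491866f, 0.46211716f, 0.63514895f,
    0.76159416f, 0.84828364f, 0.90514825f, 0.94137554f,
    0.96402758f, 0.97802611f, 0.98661430f, 0.99185972f,
    0.99505475f, 0.99699763f, 0.99817790f, 0.99889444f,
    0.99932930f
};

/* NaN inputs are not handled. */
static float tansig_lut(float x)
{
    float ax = x < 0.0f ? -x : x;       /* tanh is odd: work on |x| */
    float y;

    if (ax >= TANSIG_STEP * TANSIG_N) {
        y = 1.0f;                       /* saturate beyond the table */
    } else {
        float pos  = ax * (1.0f / TANSIG_STEP); /* multiply instead of divide */
        int   i    = (int)pos;                  /* table index, 0..15 */
        float frac = pos - (float)i;            /* 0 <= frac < 1 */
        y = TANSIG_TAB[i] + frac * (TANSIG_TAB[i + 1] - TANSIG_TAB[i]);
    }
    return x < 0.0f ? -y : y;
}
```

In hardware the same structure maps to a ROM indexed by the upper bits of a fixed-point |x|, with the low bits driving the interpolation.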
You might just need to make sure that the whole BSP is recompiled. Somewhere there is a box for its compiler options (IMHO it should default to -O3, but it doesn't). I think you have to pass the custom instruction numbers for the FP instructions on the compiler command line, so they probably have to go in there. Alternatively, find the source code for expf() and compile it as part of your program.
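Besides the command-line route (GCC's Nios II options such as -mcustom-fdivs=N in the BSP/application compiler flags), GCC also provides built-ins that emit a custom instruction with an explicit opcode. Here is a minimal sketch, assuming the FP divide sits at opcode 255. That value, FDIVS_OPCODE and hw_fdivs() are hypothetical here, so confirm the number against your generated BSP files. On non-Nios II compilers it falls back to plain C division, so the same file builds on a host PC.

```c
/* Hypothetical opcode for the single-precision divide custom instruction:
 * check the value your BSP actually uses before relying on it. */
#define FDIVS_OPCODE 255

static inline float hw_fdivs(float a, float b)
{
#if defined(__nios2__)
    /* GCC Nios II built-in: float result, opcode n, two float operands.
     * Emits the custom instruction regardless of -mcustom-fdivs. */
    return __builtin_custom_fnff(FDIVS_OPCODE, a, b);
#else
    /* Host fallback: ordinary single-precision division. */
    return a / b;
#endif
}
```

Calling hw_fdivs() in a hot loop and then disassembling the .elf is a quick way to check whether the custom instruction really ends up in the code.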
I have the same trouble with divide in the Floating Point Hardware, and I don't know why either. :cry: Hardware divide is enabled both in the Floating Point Hardware module and in the Nios II/f. I'm sure divide is available in the Floating Point Hardware, because I compared the total logic elements with division turned on and off in the Floating Point Hardware: on, 9724 LEs; off, 5041 LEs. I'm also sure the Floating Point Hardware is used, because it works for float multiplication. Has anyone resolved this? Can you help me?
1) Check that you are doing 'float' division, not 'double' division (which would call __divdf3).
2) Check that the relevant file(s) have been compiled to use the custom instruction rather than calling __divsf3.
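A common way to end up in point 1) without noticing is an unsuffixed constant. In C, 3.0 is a double, so a float operand is promoted and the division is done in double precision. The 'f' suffix keeps the expression in single precision. This is a minimal sketch; third_double() and third_float() are illustrative names only.

```c
/* 3.0 is a double constant: x is promoted and the division is done in
 * double precision (a software routine such as __divdf3 on a core without
 * double-precision hardware), then the result is converted back to float. */
static float third_double(float x)
{
    return x / 3.0;
}

/* 3.0f is a float constant: the division stays single precision, which is
 * what the FP custom instruction (or __divsf3 when it is not used) handles. */
static float third_float(float x)
{
    return x / 3.0f;
}
```

GCC's -Wdouble-promotion warning helps find the first pattern, and the same reasoning applies to calling exp() instead of expf().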
I'm a beginner with custom instructions, and the only document I have on them is the Nios II Custom Instruction User Guide. It's not enough. Do you have any other documents and/or tutorials on them?
I've not seen any 'useful' docs... The only ones I found just explained how to tick the boxes to enable the FP instructions. I've written a couple of combinatorial custom instructions, and what I realised is that the Nios doesn't really have an instruction 'decoder', just a great big mux that selects the required result. The A and B register values are read for every instruction, and even tightly coupled data memory is read every clock; the value is discarded unless the instruction is a memory read for the required address range. So a combinatorial instruction creates a result value every clock that can be based on all 32 bits of the instruction word and the 32-bit values read from the register file.
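To make the 'big mux' description concrete, here is a purely conceptual C model. It is not the Nios II RTL, and a real custom instruction would be written in HDL. Every unit computes its result from the two register values and the instruction word on every "clock", and the selector only decides which result is kept. custom_comb(), datapath_clock(), the result_sel enum and the instruction-field decoding are all hypothetical.

```c
#include <stdint.h>

/* Which unit's result the (hypothetical) result mux passes through. */
typedef enum { SEL_ADD, SEL_AND, SEL_CUSTOM } result_sel;

/* Hypothetical combinatorial custom instruction: no state, its output is a
 * pure function of the instruction word and the two register values. */
static uint32_t custom_comb(uint32_t insn, uint32_t dataa, uint32_t datab)
{
    uint32_t n = insn & 0xFFu;          /* hypothetical 8-bit field */
    return (dataa ^ datab) + n;
}

/* One "clock": all units evaluate in parallel, the mux keeps one result. */
static uint32_t datapath_clock(result_sel sel, uint32_t insn,
                               uint32_t dataa, uint32_t datab)
{
    uint32_t r_add    = dataa + datab;
    uint32_t r_and    = dataa & datab;
    uint32_t r_custom = custom_comb(insn, dataa, datab);

    switch (sel) {
    case SEL_ADD:    return r_add;
    case SEL_AND:    return r_and;
    case SEL_CUSTOM: return r_custom;
    }
    return 0;                           /* unreachable for valid selectors */
}
```

The point of the model is that r_custom is computed whether or not the custom instruction is being executed, which is why a purely combinatorial instruction costs no extra cycles, only logic.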