A question about floating point timing

Altera_Forum · 06-02-2011

I am trying to find the time cost of floating-point calculations.

I followed the tutorial "Using Nios II Floating-Point Custom Instructions" and tested it on the DE2 board, but my results differ from the tutorial's. The tutorial shows 14 clock cycles for each custom instruction.

The tutorial says the results vary with the hardware settings.

Does anyone know where I can find the time cost of the floating-point custom instructions on the DE2?

Thank you.

The result I got is about 47 clock cycles per operation:

--- Quote Start ---

--Performance Counter Report--
Total Time: 0.0869525 seconds (8695248 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section       |  %  | Time (sec)|  Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|FP CI ADD      |0.541|    0.00047|          47000|       1000|
+---------------+-----+-----------+---------------+-----------+
|FP SW ADD      |   54|    0.04697|        4696646|       1000|
+---------------+-----+-----------+---------------+-----------+

--Performance Counter Report--
Total Time: 0.0939769 seconds (9397685 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section       |  %  | Time (sec)|  Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|FP CI SUBTRACT |  0.5|    0.00047|          47000|       1000|
+---------------+-----+-----------+---------------+-----------+
|FP SW SUBTRACT | 48.5|    0.04560|        4559580|       1000|
+---------------+-----+-----------+---------------+-----------+

--Performance Counter Report--
Total Time: 0.252263 seconds (25226350 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section       |  %  | Time (sec)|  Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|FP CI MULTIPLY |0.194|    0.00049|          49000|       1000|
+---------------+-----+-----------+---------------+-----------+
|FP SW MULTIPLY | 79.9|    0.20154|       20153700|       1000|
+---------------+-----+-----------+---------------+-----------+

--Performance Counter Report--
Total Time: 0.109719 seconds (10971917 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section       |  %  | Time (sec)|  Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|FP CI DIVIDE   |0.656|    0.00072|          72000|       1000|
+---------------+-----+-----------+---------------+-----------+
|FP SW DIVIDE   | 49.7|    0.05450|        5449516|       1000|
+---------------+-----+-----------+---------------+-----------+

--- Quote End ---

The hardware setting is attached at bottom.

Altera_Forum · 06-03-2011

I don't know what the numbers should be for a DE2 board, but I'll explain how the custom instructions work and how other factors play a role.

The floating-point custom instructions perform +, -, *, and optionally /. Any time the Nios II processor executes a custom instruction, it can potentially become a blocking operation. So if a custom instruction takes 6 clock cycles to complete, the processor pipeline stalls for 6 clock cycles waiting for its result.

I forget how the tutorial software is written, but I assume it performs a series of floating-point operations over an array of data inside a loop (one floating-point operation per iteration). If all the data is cached this should be fairly quick; if not, memory access times will add to the cost measured for each floating-point operation. The compiler's optimization level will also play a significant role.

Since I don't know much about the system you are running the FPU tutorial on, my hunch is that there is a lot of latency between the processor and the memory that stores the code. If the main memory is in a different clock domain, I would expect the inefficiencies you are seeing.