Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Performance of FPU emulation

steven_jennings
Beginner
552 Views
I am looking at using the IXP465 in a project where I would ideally have wanted an FPU. Could anyone point me in the direction of some benchmarks for the performance of the XScale using the fixed point libraries compared to using a real FPU? I.e., how much CPU time/performance can I expect to lose?
Thanks

Steve
0 Kudos
4 Replies
Vladimir_Dudnik
Employee
552 Views
Hi,
I don't see emulation in the question:

"fixed point" compared with "real FPU"

Or do you mean something else? By "fixed point", do you mean an implementation of the FP operations rather than an implementation of the algorithm?

Regards,
Vladimir

0 Kudos
steven_jennings
Beginner
552 Views
Hi Vladimir,
Sorry about the very badly phrased question.
There are two parts to it:

1) I would like to know how processor intensive FPU emulation is on the IXP465.

2) I would also like to know how much performance I lose when I take an algorithm which currently uses floating point math, and then re-write this algorithm to use fixed point math.

For this comparison I assume that the floating point algorithm is run on a processor comparable to the 266MHz XScale core, but with an FPU, and that the fixed point algorithm is run on the 266MHz XScale core (either by means of hand coding, or by using the Intel fixed point libraries).

Phew!

Steve
0 Kudos
Vladimir_Dudnik
Employee
552 Views
Hi, here is an answer from our expert:
We can compare optimized IXP code based on fixed point calculations with the same function optimized for the P4 processor and based on SSE float calculations.

The example is a one-point FIR function with 32f coefficients and 16s data (for IXP the coefficients are converted to the Q15 fixed point format). Performance numbers are in cpMAC, the number of processor clocks per one multiply and accumulate:

order   sx      s2      px(P4)  w7(P4)
2       137.7   98.0    67.0    40.0
8       33.5    28.9    22.0    22.0
32      13.1    7.0     9.5     5.9
128     8.3     3.5     6.4     2.5

Where:
sx: fixed point C code for IXP
s2: fixed point asm code for IXP
px: FPU code generated by C compiler for P4
w7: SSE2 asm code for P4
order: number of taps
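
As a rough illustration of what "coefficients converted to Q15" means for a one-point FIR like the one measured above, here is a minimal Python sketch. It is not the IPP implementation: the function names (float_to_q15, fir_one_q15, fir_one_float) and the sample values are made up for illustration. It only shows the arithmetic of one output point, with one multiply-accumulate per tap.

```python
def float_to_q15(x):
    """Convert a float in [-1, 1) to a Q15 integer, rounding and saturating."""
    v = int(round(x * 32768))
    return max(-32768, min(32767, v))


def fir_one_q15(samples, coeffs_q15):
    """One FIR output point: 16-bit integer samples, Q15 coefficients."""
    acc = 0
    for s, c in zip(samples, coeffs_q15):
        acc += s * c                  # one multiply-accumulate (MAC) per tap
    acc = (acc + (1 << 14)) >> 15     # round, then scale back from Q15
    return max(-32768, min(32767, acc))  # saturate to the 16-bit range


def fir_one_float(samples, coeffs):
    """Floating point reference for the same output point."""
    return sum(s * c for s, c in zip(samples, coeffs))


# Hypothetical 4-tap moving average, chosen so the Q15 conversion is exact.
coeffs = [0.25, 0.25, 0.25, 0.25]
samples = [1000, 2000, 3000, 4000]

coeffs_q15 = [float_to_q15(c) for c in coeffs]
y_fixed = fir_one_q15(samples, coeffs_q15)
y_float = fir_one_float(samples, coeffs)
print(y_fixed, y_float)
```

With coefficients that are not exactly representable in Q15, the fixed point result will differ slightly from the float one, so the rewrite costs some precision as well as development effort.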

As anyone can see, there is no big difference in cpMAC performance, but if we take into account CPU frequency

A few words about emulation: if we use float emulation on IXP, the performance degradation can be on the order of a hundred times.
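
To relate the cpMAC figures in the table to wall-clock time, here is a small Python sketch. It assumes the 266 MHz XScale clock mentioned earlier in the thread and that one output point costs "order" MACs (one per tap). The helper names are hypothetical. The P4 clock is not given in this thread, so only the IXP columns are converted.

```python
CLOCK_HZ = 266e6  # assumption: 266 MHz XScale core, as quoted in the question

# cpMAC values for the IXP columns of the table: order -> (sx, s2)
ixp_cpmac = {
    2: (137.7, 98.0),
    8: (33.5, 28.9),
    32: (13.1, 7.0),
    128: (8.3, 3.5),
}


def clocks_per_output(order, cpmac):
    """Clocks for one FIR output point, assuming one MAC per tap."""
    return order * cpmac


def microseconds_per_output(order, cpmac, clock_hz=CLOCK_HZ):
    """Time for one FIR output point at the given clock frequency."""
    return clocks_per_output(order, cpmac) / clock_hz * 1e6


for order, (sx, s2) in ixp_cpmac.items():
    print(order,
          round(microseconds_per_output(order, sx), 3),
          round(microseconds_per_output(order, s2), 3))
```

The same function can be used for the P4 columns once the actual P4 clock frequency is known.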

Regards,
Vladimir

0 Kudos
steven_jennings
Beginner
552 Views
Fantastic! Thanks Vladimir,
I shall need to sit down and digest this information.
Steve
0 Kudos