Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Intel SIMD float precision

pilarsor
Beginner
Hi all,
I'm playing around with SIMD optimizations using intrinsics, and I'm currently running into a precision problem when calculating with large floating point numbers.
I use a __m128 data type and add enough 1's in each of the 4 float lanes that the sum (a single float) should come to 1e+009.
in code:
-----------
__m128 sum4 = _mm_set_ps1( 0.0f );          // 4 packed partial sums, all 0.0f
for ( int i = 0; i < 250000000; ++i )       // 2.5e+008 iterations
{
    __m128 tmp4 = _mm_set_ps1( 1.0f );
    sum4 = _mm_add_ps( sum4, tmp4 );        // add 1.0f to each of the 4 lanes
}
float* res = (float*)&sum4;
float sum = res[0] + res[1] + res[2] + res[3];  // 4 * 2.5e+008 = 1e+009 ones
-----------
It seems the sum stops growing at 6.71089e+007 instead of reaching 1e+009. When I reduce the loop limit to 2.5e+005, the sum correctly comes out as 1e+006.
Is there that much precision loss when computing with __m128 packed data?
Can this precision problem be solved somehow?
Or might something be wrong in this code snippet?
Thank you very much in advance!
Kind regards,
Oliver
7 Replies
TimP
Honored Contributor III
Yes, if you repeatedly add 1 to a float, you can expect it to "saturate" at 2/FLT_EPSILON. This is in accordance with the definition of FLT_EPSILON in "float.h".
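For illustration, a minimal scalar sketch of this saturation (hypothetical test program, assuming IEEE 754 single-precision arithmetic; the volatile forces each addition to round to float):
-----------
#include <float.h>
#include <stdio.h>

int main()
{
    volatile float sum = 0.0f;      /* volatile: round to single precision every step */
    for ( long i = 0; i < 250000000L; ++i )
        sum += 1.0f;                /* stops registering once sum reaches 2/FLT_EPSILON */
    printf( "sum           = %g\n", sum );                  /* 1.67772e+07 */
    printf( "2/FLT_EPSILON = %g\n", 2.0f / FLT_EPSILON );   /* 1.67772e+07, i.e. 2^24 */
    return 0;
}
-----------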
jim_dempsey
Beginner
The 4-byte floats have 23 bits of mantissa and an implied 1 in the 24th bit. Therefore the largest whole number (incrementing from 0) you can represent has 24 bits of binary 1's, or 2^24-1 = 16777215.
As you go beyond this, the 1.0 you attempt to add is beyond the precision represented by the whole numbers. Single precision is good for a tad over 7 decimal places.
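A tiny sketch of that limit (hypothetical test, assuming IEEE 754 single precision):
-----------
#include <stdio.h>

int main()
{
    float f = 16777215.0f;      /* 2^24 - 1: all 24 significand bits set */
    float g = f + 1.0f;         /* 16777216.0f: still exact (an even number) */
    float h = g + 1.0f;         /* 16777217 is not representable; rounds back to 16777216.0f */
    printf( "%.1f %.1f\n", g, h );
    return 0;
}
-----------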
Jim Dempsey
TimP
Honored Contributor III
That would be the largest odd whole number.
pilarsor
Beginner
Hi,
Thank you for your valuable answers so far, but one thing I don't quite understand...
Isn't there an 8 bit exponent in this float format, like in the IEEE 754 standard?
Like: 23 bit mantissa, 1 bit sign, 8 bit exponent
So there should be an approximate value of (+/-)1.xxx * 2^(exponent-127), which is in the range of 10^-38 to 10^+38.
Why does the largest reachable number come out as 2^mantissa and not go up to the IEEE exponent range... is there a reason for the SIMD float type to do it differently?
Oliver
TimP
Honored Contributor III
Yes, the xmm register format is standard IEEE 754, with no extended-precision magic. If you use the compiler's facility to vectorize a sum reduction, batching into 8 partial sums, you should see a threshold 8 times as large for this roundoff error.
Let me spell out the implications of the hints given in previous postings. Your sum proceeds with exact results until you reach the largest possible odd whole number, and one beyond. The next addition incurs roundoff error. As there are no more odd whole numbers, it rounds to an even value. The next result after that is made exact by rounding down. Beyond that, the addition of 1 is only a quarter of the way to the next representable number, so it has to be ignored.
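A sketch of that batching idea (a hypothetical rewrite of the loop from the first post, with the 8 partial sums held in two __m128 registers):
-----------
#include <xmmintrin.h>

float sum_ones( int n )                     /* hypothetical helper: adds 1.0f n times */
{
    __m128 sumA = _mm_setzero_ps();         /* partial sums 0..3 */
    __m128 sumB = _mm_setzero_ps();         /* partial sums 4..7 */
    const __m128 one4 = _mm_set_ps1( 1.0f );
    for ( int i = 0; i < n; i += 8 )        /* 8 additions per iteration */
    {
        sumA = _mm_add_ps( sumA, one4 );
        sumB = _mm_add_ps( sumB, one4 );
    }
    /* exact counting now extends to 8 * 2^24 = 1.34e+008,
       versus 2^24 for a single scalar sum */
    __m128 sum4 = _mm_add_ps( sumA, sumB );
    float res[4];
    _mm_storeu_ps( res, sum4 );
    return res[0] + res[1] + res[2] + res[3];
}
-----------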
pilarsor
Beginner
I assume the largest odd whole number is 1.6777215e+007 per float element inside the __m128 packed structure.
I made an additional test that adds the value 1.125 to the sum 1000000 times.
The non-SIMD version, using standard float types, gives a result of: 1.42383e+006
The SIMD version, using __m128 types, gives a result of: 1.42616e+006
This is an error of 0.00233e+006 = 2.33e+003, which I think is quite a lot...
So it seems it's not working at full precision even while below the largest representable whole number... but where might the error come from?
thx.
Oliver
jim_dempsey
Beginner

Yes, it is the largest odd whole number. However, note that if you have a larger even number, e.g. 16777216*2, then adding 1.0 would be adding a number beneath the precision of the floating-point representation.
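For example (hypothetical numbers, assuming IEEE 754 single precision):
-----------
#include <stdio.h>

int main()
{
    float big = 33554432.0f;    /* 16777216*2 = 2^25: float spacing here is 4.0 */
    float a = big + 1.0f;       /* the 1.0 is beneath the spacing; rounds back to 33554432.0f */
    float b = big + 4.0f;       /* a full step does register: 33554436.0f */
    printf( "%.1f %.1f\n", a, b );
    return 0;
}
-----------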

The epsilon mentioned in the other post refers to the fractional bits maintained beyond the stored precision during the internal FPU computation; those excess bits are kept up until the point where the value gets stored, and then they are lost.
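A small check of how FLT_EPSILON ties into this (hypothetical test; the volatile store forces rounding to single precision):
-----------
#include <float.h>
#include <stdio.h>

int main()
{
    /* FLT_EPSILON is the gap between 1.0f and the next larger float,
       so 2/FLT_EPSILON = 2^24 is where a step of 1.0f drops below one ulp */
    volatile float x = 1.0f + FLT_EPSILON / 2.0f;   /* rounds back to 1.0f */
    printf( "FLT_EPSILON = %g\n", FLT_EPSILON );    /* 1.19209e-07 */
    printf( "x == 1.0f ?  %d\n", x == 1.0f );       /* 1 */
    return 0;
}
-----------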

Jim Dempsey
