Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Intel SIMD float precision

pilarsor
Beginner
Hi all,
I'm playing around with SIMD optimizations using intrinsics, and I'm currently running into a precision problem when calculating with large floating point numbers.
I use a __m128 data type and add enough 1's in each of the 4 float lanes that the sum (a single float) should come to 1e+009.
in code:
-----------
__m128 sum4 = _mm_set_ps1( 0.0f );          // 4 packed partial sums, all 0.0f
for ( int i = 0; i < 250000000; ++i )       // 2.5e+008 iterations
{
    __m128 tmp4 = _mm_set_ps1( 1.0f );
    sum4 = _mm_add_ps( sum4, tmp4 );        // add 1.0f to each of the 4 lanes
}
float* res = (float*)&sum4;
float sum = res[0] + res[1] + res[2] + res[3];  // 4 * 2.5e+008 = 1e+009 ones
-----------
It seems the sum stops growing at 6.71089e+007 instead of reaching 1e+009. When I reduce the loop limit to 2.5e+005, the sum correctly comes out as 1e+006.
Is there that much precision loss when computing with __m128 packed data?
Can this precision problem be solved somehow?
Or might something be wrong in this code snippet?
Thank you very much in advance!
Kind regards,
Oliver
7 Replies
TimP
Honored Contributor III
Yes, if you repeatedly add 1 to a float, you can expect it to "saturate" at 2/FLT_EPSILON. This is in accordance with the definition of FLT_EPSILON in "float.h".
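For illustration, a minimal scalar sketch of this saturation (hypothetical test program, assuming IEEE 754 single-precision arithmetic; the volatile forces each addition to round to float):
-----------
#include <float.h>
#include <stdio.h>

int main()
{
    volatile float sum = 0.0f;      /* volatile: round to single precision every step */
    for ( long i = 0; i < 250000000L; ++i )
        sum += 1.0f;                /* stops registering once sum reaches 2/FLT_EPSILON */
    printf( "sum           = %g\n", sum );                  /* 1.67772e+07 */
    printf( "2/FLT_EPSILON = %g\n", 2.0f / FLT_EPSILON );   /* 1.67772e+07, i.e. 2^24 */
    return 0;
}
-----------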
jim_dempsey
Beginner
The 4-byte floats have 23 bits of mantissa and an implied 1 in the 24th bit. Therefore the largest whole number (incrementing from 0) you can represent has 24 bits of binary 1's, or 2^24-1 = 16777215.
As you go beyond this, the 1.0 you attempt to add is beyond the precision represented by the whole numbers. Single precision is good for a tad over 7 decimal places.
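A tiny sketch of that limit (hypothetical test, assuming IEEE 754 single precision):
-----------
#include <stdio.h>

int main()
{
    float f = 16777215.0f;      /* 2^24 - 1: all 24 significand bits set */
    float g = f + 1.0f;         /* 16777216.0f: still exact (an even number) */
    float h = g + 1.0f;         /* 16777217 is not representable; rounds back to 16777216.0f */
    printf( "%.1f %.1f\n", g, h );
    return 0;
}
-----------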
Jim Dempsey
TimP
Honored Contributor III
That would be the largest odd whole number.
pilarsor
Beginner
Hi,
Thank you for your valuable answers so far, but one thing I don't quite understand...
Isn't there an 8 bit exponent in this float format, like in the IEEE 754 standard?
Like: 23 bit mantissa, 1 bit sign, 8 bit exponent
So there should be an approximate value of (+/-)1.xxx * 2^(exponent-127), which is in the range of 10^-38 to 10^+38.
Why does the largest reachable number come out as 2^mantissa and not go up to the IEEE exponent range... is there a reason for the SIMD float type to do it differently?
Oliver
TimP
Honored Contributor III
Yes, the xmm register format is standard IEEE 754, with no extended-precision magic. If you use the compiler's facility to vectorize a sum reduction, batching into 8 partial sums, you should see a threshold 8 times as large for this roundoff error.
Let me spell out the implications of the hints given in previous postings. Your sum proceeds with exact results until you reach the largest possible odd whole number, and one beyond. The next addition incurs roundoff error. As there are no more odd whole numbers, it rounds to an even value. The next result after that is made exact by rounding down. Beyond that, the addition of 1 is only a quarter of the way to the next representable number, so it has to be ignored.
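A sketch of that batching idea (a hypothetical rewrite of the loop from the first post, with the 8 partial sums held in two __m128 registers):
-----------
#include <xmmintrin.h>

float sum_ones( int n )                     /* hypothetical helper: adds 1.0f n times */
{
    __m128 sumA = _mm_setzero_ps();         /* partial sums 0..3 */
    __m128 sumB = _mm_setzero_ps();         /* partial sums 4..7 */
    const __m128 one4 = _mm_set_ps1( 1.0f );
    for ( int i = 0; i < n; i += 8 )        /* 8 additions per iteration */
    {
        sumA = _mm_add_ps( sumA, one4 );
        sumB = _mm_add_ps( sumB, one4 );
    }
    /* exact counting now extends to 8 * 2^24 = 1.34e+008,
       versus 2^24 for a single scalar sum */
    __m128 sum4 = _mm_add_ps( sumA, sumB );
    float res[4];
    _mm_storeu_ps( res, sum4 );
    return res[0] + res[1] + res[2] + res[3];
}
-----------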
pilarsor
Beginner
I assume the largest odd whole number is 1.6777215e+007 per float element inside the __m128 packed structure.
I made an additional test that adds the value 1.125 to the sum 1000000 times.
The non-SIMD version, using standard float types, gives a result of: 1.42383e+006
The SIMD version, using __m128 types, gives a result of: 1.42616e+006
This is an error of 0.00233e+006 = 2.33e+003, which I think is quite a lot...
So it seems it's not working at full precision even while below the largest representable whole number... but where might the error come from?
thx.
Oliver
jim_dempsey
Beginner

Yes, it is the largest odd whole number. However, note that if you have a larger even number, e.g. 16777216*2, then adding 1.0 would be adding a number beneath the precision of the floating-point representation.
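For example (hypothetical numbers, assuming IEEE 754 single precision):
-----------
#include <stdio.h>

int main()
{
    float big = 33554432.0f;    /* 16777216*2 = 2^25: float spacing here is 4.0 */
    float a = big + 1.0f;       /* the 1.0 is beneath the spacing; rounds back to 33554432.0f */
    float b = big + 4.0f;       /* a full step does register: 33554436.0f */
    printf( "%.1f %.1f\n", a, b );
    return 0;
}
-----------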

The epsilon mentioned in the other post refers to the fractional bits maintained beyond the stored precision during the internal FPU computation; those excess bits are kept up until the point where the value gets stored, and then they are lost.
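A small check of how FLT_EPSILON ties into this (hypothetical test; the volatile store forces rounding to single precision):
-----------
#include <float.h>
#include <stdio.h>

int main()
{
    /* FLT_EPSILON is the gap between 1.0f and the next larger float,
       so 2/FLT_EPSILON = 2^24 is where a step of 1.0f drops below one ulp */
    volatile float x = 1.0f + FLT_EPSILON / 2.0f;   /* rounds back to 1.0f */
    printf( "FLT_EPSILON = %g\n", FLT_EPSILON );    /* 1.19209e-07 */
    printf( "x == 1.0f ?  %d\n", x == 1.0f );       /* 1 */
    return 0;
}
-----------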

Jim Dempsey
