Eswar_Reddy_K_
Beginner
210 Views

precision issues


Hello Sir,
I have a precision issue with the code below. If I do the calculation for the same input on my calculator, I get -13421772.8,
whereas the compiled code gives -13421773.0, and this is a considerable difference for us.
The variable in question is 'tmp'.
Please help us resolve this.
Thanks in advance.

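#include <xmmintrin.h>  // SSE: __m128, _mm_mul_ps, rounding-mode macros
#include <emmintrin.h>  // SSE2: _mm_cvtps_epi32, _mm_castsi128_ps
#include <smmintrin.h>  // SSE4.1: _mm_insert_ps

void convert(__m128 &vrz /*inout*/, int art)
{
    unsigned int _rounding_mode;
    if(1)
    {
        // Save the current rounding mode, then switch to truncation.
        _rounding_mode = _MM_GET_ROUNDING_MODE();
        _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
    }
    __m128 tmp, scale_vr;
    const float scale = (float)((unsigned int)1<<(31-(art)));
    scale_vr = _mm_set1_ps(scale);
    tmp = _mm_mul_ps(vrz, scale_vr);
    // Copy element 1 of the converted integers into element 1 of vrz.
    vrz = _mm_insert_ps(vrz, _mm_castsi128_ps(_mm_cvtps_epi32(tmp)) , ((1)<<6) | ((1)<<4));
    if(1)
    {
        // Restore the caller's rounding mode.
        _MM_SET_ROUNDING_MODE(_rounding_mode);
    }
}

int main()
{
    float a = (float)-0.8;
    __m128 vrz;
    vrz = _mm_set1_ps(a);
    convert(vrz, 7);
    return 0;
}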

Thanks,
Eswar Reddy K


15 Replies
SergeyKostrov
Valued Contributor II
210 Views
I'll take a look at the issue. Could you provide some additional technical details, like:
- OS version? 32-bit or 64-bit?
- Compiler version and a complete set of command line options?
Eswar_Reddy_K_
Beginner
210 Views
64-bit OS, Visual Studio 2010 (32-bit mode). I am running from VS 2010 in debug mode. These are my compiler options: Disabled, AVX2, Default, false, Precise, Default
Eswar_Reddy_K_
Beginner
210 Views

Sorry, display problem. Below are my compiler options:

WarningLevel: Level3
Optimization: Disabled
UseProcessorExtensions: AVX2
BasicRuntimeChecks: Default
AdditionalOptions: /fp:precise
FlushDenormalResultsToZero: false
FloatingPointModel: Precise
FloatingPointExpressionEvaluation: Default

jimdempseyatthecove
Black Belt
210 Views

Eswar,

At issue here may be:

float a = (float)-0.8;

Here the conversion that produces a does not use the same rounding mode (round toward zero) as the rest of your code. As a quick test, compile as a Debug build. After a has been set, open a Memory window and examine "&a", viewed as unsigned 1-byte integers. You should see "205 204 76 191". Subtract 1 from the 205 to undo the round up. Had this byte been zero, 0-1 would produce 255 with the borrow propagating to the next byte (i.e. subtract 1 from the next byte). There will be some cases where the exponent needs to be adjusted, but that is not necessary for this experiment.

Once the value of a has been adjusted, continue and check the result.

For a formal fix, you will have to be careful as to how you preset your parameters that contain fractional values that cannot be precisely represented in binary. 0.1 is one such fraction as is 0.8.

Jim Dempsey

SergeyKostrov
Valued Contributor II
210 Views
Sorry for the delay with my investigation.

Hi Eswar & Jim,

I just completed tests and reproduced the problem in several configurations: Debug and Release, 32-bit and 64-bit, with Intel C++ compilers ( versions 12.x & 13.x ) and Microsoft C++ compilers ( VS 2005 & VS 2008 ), with rounding and without rounding, with Floating Point Model set to Precise ( /fp:precise ), Fast ( /fp:fast ) or Strict ( /fp:strict ). In essence, no matter which configuration ( or settings ) is selected, _mm_mul_ps ( actually, the MULPS instruction ) rounds the results (!).

I've created my own test-case and debugged it. Here are some details:

Note: 16777216 = 2^24
Correct Result ( True ): 16777216 * 0.8 = 13421772.8 - everything is correct / _mm_mul_ps is Not used
Incorrect Result: 16777216 * 0.8 = 13421773.0 - something is wrong / _mm_mul_ps is used / rounding is done by MULPS instruction

I will spend some additional time on this during the week. However, I would consider a workaround, since I really do not expect Intel to release a microcode patch for the MULPS instruction unless we understand what is wrong.
SergeyKostrov
Valued Contributor II
210 Views
...
[ Debug ]
Test-Case 1 ( 16777216 * -0.8 )
Expected Values  : -13421772.800000 -13421772.800000 -13421772.800000 -13421772.800000
Calculated Values: -13421773.000000 -13421773.000000 -13421773.000000 -13421773.000000
Test-Case 2 ( 16777216 * 0.8 )
Expected Values  : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: 13421773.000000 13421773.000000 13421773.000000 13421773.000000
...
[ Release ]
Test-Case 1 ( 16777216 * -0.8 )
Expected Values  : -13421772.800000 -13421772.800000 -13421772.800000 -13421772.800000
Calculated Values: -13421773.000000 -13421773.000000 -13421773.000000 -13421773.000000
Test-Case 2 ( 16777216 * 0.8 )
Expected Values  : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: 13421773.000000 13421773.000000 13421773.000000 13421773.000000
...
Note: Intrinsic function _mm_mul_ps is used for Calculated Values.
Eswar_Reddy_K_
Beginner
210 Views

Thanks Sergey & Jim !

I have observed the same behaviour irrespective of the configuration.

SergeyKostrov
Valued Contributor II
210 Views
Eswar, your results are Absolutely correct and there is Nothing wrong. Also, I've done another set of tests and here are results without rounding issues:
...
Test-Case 5
Expected Values  : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: -13421772.800000 -13421772.800000 -13421772.800000 -13421772.800000
Test-Case 6
Expected Values  : 13421772.800000 13421772.800000 13421772.800000 13421772.800000
Calculated Values: 13421772.800000 13421772.800000 13421772.800000 13421772.800000
...
Eswar_Reddy_K_
Beginner
210 Views

Sergey Kostrov,

The results look OK for test cases 5 & 6.

Could you please provide the compiler options, and any other options, used for test cases 5 & 6?

Thanks,

Eswar Reddy K

SergeyKostrov
Valued Contributor II
211 Views
The issue you've experienced is Not related to any C++ compiler or command line options, etc. It is related to limitations of Single-Precision arithmetic. In order to improve the precision of your calculations, a change to Double-Precision arithmetic needs to be made. Try these simple tests:

16777216.0f + 1.0f = 16777216.0f - !!! - It is Not 16777217.0 due to limitation of Single-Precision arithmetic
16777216.0f + 2.0f = 16777218.0f
16777216.0f + 3.0f = 16777220.0f - !!! - It is Not 16777219.0 due to limitation of Single-Precision arithmetic


Bernard
Black Belt
210 Views

Actually, rounding is probably done by a micro-operation control signal (mulps decoded into the corresponding uop). It would be interesting to know what triggers the rounding mode in the SIMD FPU (perhaps some control bit that is set when mulps is decoded).

Eswar_Reddy_K_
Beginner
210 Views

Thank you!

SergeyKostrov
Valued Contributor II
210 Views
>>...Actually rounding is probably done by micro-operation control signal (mulps decoded into corresponding uop). It is interesting
>>what triggers the execution of rounding mode(some control bit being set when mulps is decoded)by SIMD FPU...

Take into account that there are only 24 bits to hold a mantissa value, which is not enough to represent 13421772.8 exactly. The IEEE 754 Standard describes all of this, so take a look at it. The closest Single-Precision representation of 13421772.8 is 13421773.0. In binary form both numbers look like:

13421772.8 = 13421773.0 = 0x4B4CCCCD = 0 10010110 10011001100110011001101

Note 1: The 1st digit is the Sign ( 0 is for positive ), followed by the Exponent, followed by the Mantissa.
Note 2: Use a Debugger to verify it.
SergeyKostrov
Valued Contributor II
210 Views
Eswar,

Since you will need to do some processing using Double-Precision arithmetic, take a look at a collection of very useful threads related to that subject:

Forum topic: Support of 'long double' floating point data type on Intel CPUs ( A collection of threads )
Web-link: http://software.intel.com/en-us/node/375459
Eswar_Reddy_K_
Beginner
210 Views

Thanks for the insight.
