

Bevin_H_

Beginner

08-09-2016 03:53 PM

floating point errors

Hi,

We have code that computes a sum of squares with MKL's cblas_sgemm and cblas_dgemm and found large differences between the values. We have replicated the issue with a simple program that computes a sum of squares on the same data, and also reproduced the differences with the GCC compiler. We are running ICC on Linux on Intel Xeons.

The code is:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdarg.h>

int main(int argc, const char * argv[])
{
    float weight;
    double dsumwts = 0;
    float fsumwts = 0;
    double diffsum = 0;
    double dwt;
    int count = 0;
    FILE *datafile;

    if ((datafile = fopen(argv[1], "r")) == NULL) {
        printf("Cannot fopen file %s\n", argv[1]);
        exit(1);
    }
    float a = 1.0;
    while (fscanf(datafile, "%g", &weight) != EOF) {
        //weight = ((float)rand()/(float)(RAND_MAX)) * a;
        dwt = weight;
        dwt = sqrt(dwt);
        weight = sqrtf(weight);
        dsumwts = dsumwts + (dwt * dwt);
        fsumwts = fsumwts + (weight * weight);
        diffsum = diffsum + (dwt * dwt - weight * weight);
        printf("Error Record %5d %12.7f %12.7f %12.7f %12.7f\n", count + 1,
               dsumwts - fsumwts, dwt * dwt, weight * weight,
               (dwt * dwt - weight * weight));
        count++;
    }
    printf("Double Sum = %12.5f\n", dsumwts);
    printf("Float Sum = %12.5f\n", fsumwts);
    printf("Diff Sum = %12.5f\n", diffsum);
    fclose(datafile);
    return 0;
}
```

When we run the code we get:

```
tail float.txt
Error Record 43653   52.4184895    1.2222650    1.2222650    0.0000000
Error Record 43654   52.4220045    1.2222650    1.2222650    0.0000000
Error Record 43655   52.4255195    1.2222650    1.2222650    0.0000000
Error Record 43656   52.4290345    1.2222650    1.2222650    0.0000000
Error Record 43657   52.4325495    1.2222650    1.2222650    0.0000000
Error Record 43658   52.4325495    1.0000000    1.0000000    0.0000000
Error Record 43659   52.4325495    1.0000000    1.0000000    0.0000000
Double Sum =  91187.02630
Float Sum  =  91134.59375
Diff Sum   =      0.00000
```

The sum of the differences between the float and double values is zero, but the difference between the two sums is approximately 52.

When we uncomment the random-weight line and rerun, we get:

```
tail float.txt
Error Record 43653    0.0709642    0.6583248    0.6583249   -0.0000001
Error Record 43654    0.0716534    0.3483455    0.3483455   -0.0000000
Error Record 43655    0.0719731    0.9007103    0.9007103   -0.0000000
Error Record 43656    0.0717455    0.0622724    0.0622724    0.0000000
Error Record 43657    0.0718161    0.4180393    0.4180393   -0.0000000
Error Record 43658    0.0724126    0.8658309    0.8658308    0.0000001
Error Record 43659    0.0725415    0.1895820    0.1895820    0.0000000
Double Sum =  21757.27762
Float Sum  =  21757.20508
Diff Sum   =      0.00001
```

which is what we would expect. We have also done a random sort on the data and still get large differences. The data contains approximately 40,000 floating-point numbers ranging from 0.5 to 19, with lots of repeats of 1 or 2 values. The data is attached.

Any thoughts on what could be causing these differences?

Thanks

Bevin

10 Replies


Yuan_C_Intel

Employee

08-10-2016 03:59 AM

Hi, Bevin

I can reproduce the differences you reported.

I have expanded the print precision to %20.10f so that we can see how the difference grows from the very beginning.

```
count+1  dsumwts-fsumwts  dwt*dwt       weight*weight  (dwt*dwt-weight*weight)  dsumwts        fsumwts
Error Record 1  -0.0000000000  6.4950919151  6.4950919151  -0.0000000000   6.4950919151   6.4950919151
Error Record 2   0.0000004768  8.6349782944  8.6349782944   0.0000000000  15.1300702095  15.1300697327
Error Record 3   0.0000014305  8.0962800980  8.0962800980   0.0000000000  23.2263503075  23.2263488770
```

Output with random input:

```
Error Record 1   0.0000000000  0.8401877284  0.8401877284   0.0000000000   0.8401877284   0.8401877284
Error Record 2   0.0000000298  0.3943829238  0.3943829238   0.0000000000   1.2345706522   1.2345706224
Error Record 3  -0.0000000298  0.7830992341  0.7830992341   0.0000000000   2.0176698864   2.0176699162
```

I think this relates to the different number of significant digits in float and double. A float carries only about 7~8 significant decimal digits; everything beyond that is rounded away. So you can see the difference starting at 0.000000* in the second record, which should be expected. That difference accumulates through sqrt/sqrtf, multiply, and sum over more than 40,000 iterations and grows large. The random inputs are smaller numbers (less than 1), which saves at least 1 significant decimal digit compared to the data from the input file. Also notice that the values of dwt*dwt - weight*weight for the random input are almost all zero.

You may refer to my previous article on understanding floating-point values. Though it's based on Fortran, the problem is similar:

Hope this helps.

Thanks.


>>...
>>printf("Double Sum = %12.5f\n",dsumwts);
>>printf("Float Sum = %12.5f\n",fsumwts);
>>printf("Diff Sum = %12.5f\n",diffsum);
Results of single-precision data type ( SP / float ) calculations will never be the same as double-precision data type ( DP / double ) calculations.
When matrices are big, or even worse huge, the product of two matrices can be very different ( SP vs. DP ) because of rounding errors! I could give an example of multiplying two very small matrices ( ~10x10 !!! ) where you would still see how these rounding errors affect the final result.
Take a look at the IEEE 754 Standard for more details, and don't forget that SP arithmetic uses 24 bits for the mantissa while DP arithmetic uses 53 bits, which is why accuracy is always better with DP calculations.
Another thing: printf-like functions also round when they output results.

SKost

Valued Contributor II

08-10-2016 09:48 AM


Take a look at these two test cases for SP and DP Data types.
```cpp
...
// Sub-Test 6.1 - Limitations of IEEE754 Standard for SP ( 24-bit ) arithmetics
{
    RTfloat fV0 = ( 16777216.0f + 0.0f ); // 16777216.0
    RTfloat fV1 = ( 16777216.0f + 1.0f ); // 16777216.0 !!! ( expected 16777217.0 )
    RTfloat fV2 = ( 16777216.0f + 2.0f ); // 16777218.0
    RTfloat fV3 = ( 16777216.0f + 3.0f ); // 16777220.0 !!! ( expected 16777219.0 )
    RTfloat fV4 = ( 16777216.0f + 4.0f ); // 16777220.0

    CrtPrintf( RTU("Limitations of IEEE754 Standard for SP ( 24-bit ) arithmetics:\n") );
    CrtPrintf( RTU("\t( 16777216.0f + 0.0f )=%.1f\n"), fV0 );
    CrtPrintf( RTU("\t( 16777216.0f + 1.0f )=%.1f - expected 16777217.0\n"), fV1 );
    CrtPrintf( RTU("\t( 16777216.0f + 2.0f )=%.1f\n"), fV2 );
    CrtPrintf( RTU("\t( 16777216.0f + 3.0f )=%.1f - expected 16777219.0\n"), fV3 );
    CrtPrintf( RTU("\t( 16777216.0f + 4.0f )=%.1f\n"), fV4 );
}
...
```

SKost

Valued Contributor II

08-10-2016 11:13 AM


```cpp
...
// Sub-Test 6.3 - Limitations of IEEE754 Standard for DP ( 53-bit ) arithmetics
{
    RTdouble dV0 = ( 9007199254740992.0L + 0.0L ); // 9007199254740992.0
    RTdouble dV1 = ( 9007199254740992.0L + 1.0L ); // 9007199254740992.0 !!! ( expected 9007199254740993.0 )
    RTdouble dV2 = ( 9007199254740992.0L + 2.0L ); // 9007199254740994.0
    RTdouble dV3 = ( 9007199254740992.0L + 3.0L ); // 9007199254740996.0 !!! ( expected 9007199254740995.0 )
    RTdouble dV4 = ( 9007199254740992.0L + 4.0L ); // 9007199254740996.0

    CrtPrintf( RTU("Limitations of IEEE754 Standard for DP ( 53-bit ) arithmetics:\n") );
    CrtPrintf( RTU("\t( 9007199254740992.0L + 0.0L )=%.1f\n"), dV0 );
    CrtPrintf( RTU("\t( 9007199254740992.0L + 1.0L )=%.1f - expected 9007199254740993.0\n"), dV1 );
    CrtPrintf( RTU("\t( 9007199254740992.0L + 2.0L )=%.1f\n"), dV2 );
    CrtPrintf( RTU("\t( 9007199254740992.0L + 3.0L )=%.1f - expected 9007199254740995.0\n"), dV3 );
    CrtPrintf( RTU("\t( 9007199254740992.0L + 4.0L )=%.1f\n"), dV4 );
}
...
```

SKost

Valued Contributor II

08-10-2016 11:14 AM


Bevin_H_

Beginner

08-10-2016 07:36 PM

Hi,

Thanks for the input.

I agree that the SP result will never equal the DP result; however, I would expect them to be "close". I have explored our situation further, and the following graph shows the weird behaviour: after approximately 24K records the error in the sum increases by about 0.0036 per addition, which is greater than ordinary floating-point rounding should produce. Note that the additions are all of the same number when the slope dramatically increases. If we perturb that number by a small random value, the discrepancy disappears. I may be wrong, but this looks like a computational bug.

Cheers

Bevin


**[ 1 ]**
>>...program that uses a sum of squares on the same data and also **reproduced the differences** with the GCC compiler
If the results from **GCC** and **ICC** are the same, and this is what you've just mentioned, then it looks like a correct result.
**[ 2 ]**
>>...however I would expect them to "close"...
Once again, the **IEEE 754** Standard clearly defines the differences between the SP and DP floating-point ( FP ) data types. What does "close" mean? Did you calculate absolute or relative errors?
**[ 3 ]**
If your software is mission critical, for example related to health care ( X-ray or MRI imaging ), aerospace, or defense, and 100% compliance with **ISO 9001** or some **MIL** standards is required, then all important calculations **must be done** with the DP FP data type.
**[ 4 ]**
>>...I may be wrong but this looks like a computational bug...
In cases like yours, software engineers usually call these **IEEE 754** limitations "rounding errors", but that is not quite the right expression; they should be called "quantization errors". It is simply **not** possible to represent some numbers **exactly** when a limited number of bits is used for the mantissa ( 24 bits for the SP and 53 bits for the DP FP data type ).
Try the test on a different platform, like Windows or Unix.
Also, take into account that the **Intel FPU** may be initialized by default, which is what all C++ compilers do for us, or some custom initialization may be done with the CRT function **_control87**.

SKost

Valued Contributor II

08-10-2016 09:50 PM


>>...The following **graph** shows the weird behaviour...
It doesn't make sense to me because it is not clear what the Y axis represents.
PS:
If your team has a person with an MS or PhD degree in mathematics, then all investigations and final decisions should be made by that person.

SKost

Valued Contributor II

08-10-2016 10:07 PM


>>...
>>Double Sum = **91187.02630**
>>Float Sum = **91134.59375**
>>...
Absolute Error: **AbsErr** = ( SP-value - DP-value ) = 91134.59375 - 91187.02630 = **-52.43255**
Relative Error: **RelErr** = ( SP-value - DP-value ) / DP-value = ( 91134.59375 - 91187.02630 ) / 91187.02630 = **-0.000575000108321**
Percentage Error: **PctErr** = **RelErr** * 100 = -0.000575000108321 * 100 = **-0.057500010832100%**
**Note 1**: In an **ISO 9001**-regulated project, a **-0.0575%** PctErr would satisfy even the harsh requirements of **X-ray** imaging software.
**Note 2**: The **DP** value is treated as the **true** value when calculating the **absolute** and **relative** errors.

SKost

Valued Contributor II

08-10-2016 10:29 PM


Let's say the SP data type must be used. In that case the accuracy of the computations can be improved as follows:
- Make sure the **/fp:precise** Intel C++ compiler option is used;
- Declare all intermediate variables as **double** ( expect some performance decrease ). All intermediate results will then be stored in double precision and the number of roundings will be reduced;
- Use an **FMA**-like technique ( **FMA** - Fused Multiply-Add ). **FMA** instructions are faster and compute ( A * B ) + C with one rounding instead of the two roundings needed without **FMA**.

SKost

Valued Contributor II

08-27-2016 05:00 PM


Here is a 2013 post that describes a method for improving the accuracy of computations:
**Mixing of Floating-Point Types ( MFPT ) when performing calculations. Does it improve accuracy?**
https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring...

SKost

Valued Contributor II

08-27-2016 09:14 PM
