Intel® oneAPI Math Kernel Library

New Floating Point Math Error in Subtraction for Intel Core Processors?


To reproduce, type the following into an Excel spreadsheet:

=(6.377-6.376)*1000


I get 0.999999999999446 when I display enough decimal places. That is three significant digits in error, far too many for rounding error.
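
The same arithmetic can be reproduced outside Excel. This is a minimal Python sketch, assuming the spreadsheet computes (6.377 - 6.376) * 1000 as the values discussed later in the thread indicate. Python floats are IEEE 754 doubles on mainstream platforms, and Excel displays at most 15 significant digits, so the sketch formats the result the same way.

```python
# Assumption: the spreadsheet formula is (6.377 - 6.376) * 1000.
x = (6.377 - 6.376) * 1000

# Show 15 significant digits, the most Excel will display.
shown = f"{x:.15g}"
print(shown)   # 0.999999999999446
print(x == 1)  # False
```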


Using Google, I get a true rounding error (0.99999999999).


I understand rounding error, but look at the number of significant digits affected, and this occurs in a subtraction. Second, there are only four significant digits to begin with, so there should not be any rounding error at all.


The error exists before multiplying by 1000; the multiplication is done just for convenience.
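
A quick Python check (an illustration, not Excel itself) confirms that the discrepancy is already present in the raw difference, before any multiplication. Printing 17 significant digits shows the full value of the resulting double.

```python
d = 6.377 - 6.376        # no multiplication yet

print(d == 0.001)        # False
print(f"{d:.17g}")       # 0.00099999999999944578
print(f"{0.001:.17g}")   # 0.001
```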


This error occurs on the latest generation of Intel processors (i7-8550U) as well as on my older i7-4770.


I tracked this error down from a "1 > 1" problem: when the number was used in a logical test, the test failed.
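
A common way to avoid this kind of "1 > 1" surprise is to compare with a tolerance, or to round to a fixed number of decimals before comparing. The sketch below is in Python rather than Excel. The tolerance of 1e-9 and the 9 decimal places are arbitrary illustrative choices; a real application should pick values that suit its data.

```python
import math

x = (6.377 - 6.376) * 1000

# A plain comparison sees the tiny shortfall below 1.
print(1 > x)                               # True
# A relative-tolerance comparison treats x as equal to 1.
print(math.isclose(x, 1.0, rel_tol=1e-9))  # True
# Rounding to a fixed number of decimals before comparing also works here.
print(round(x, 9) == 1.0)                  # True
```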

Honored Contributor III

Real numbers are represented and processed in base 2, not 10. Many of your expectations and arguments are incorrect, partly because of this.
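
The base-2 point can be checked directly with Python's standard fractions module. Fraction(6.377) recovers the exact value of the stored double, which is an integer divided by a power of two. Fraction("6.377") is exactly 6377/1000. Because 1000 has a factor of 5, 6.377 has no finite binary expansion, so even a "four significant digit" number must be rounded when stored.

```python
from fractions import Fraction

stored = Fraction(6.377)           # exact value of the double nearest 6.377
exact_decimal = Fraction("6.377")  # exactly 6377/1000

print(stored == exact_decimal)      # False
print(exact_decimal.denominator)    # 1000
print(stored.denominator == 2**50)  # True: a power-of-two denominator
```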

Double precision reals (64 bits total, 52 of which store the mantissa, plus an implicit leading bit) can hold the equivalent of about 15 decimal digits. If you subtract two numbers whose first three digits are the same (6.37 here), those digits cancel, which leaves about 12 meaningful digits. Thus, you should not attach any significance to the thirteenth and later digits of the result.
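
The digit counting above can be made concrete in Python. sys.float_info reports the double format's precision. The relative error of the difference against 0.001 then shows roughly how many correct significant digits survive the cancellation. The reference value 0.001 is itself a double here, but its own error (about 2e-20) is negligible next to the error being measured.

```python
import math
import sys

print(sys.float_info.mant_dig)  # 53: 52 stored bits plus the implicit leading 1
print(sys.float_info.dig)       # 15: decimal digits that survive a decimal-to-double round trip

d = 6.377 - 6.376
# Relative error of the difference, measured against 0.001.
rel_err = abs(d - 0.001) / 0.001
print(-math.log10(rel_err))     # between 12 and 13 correct significant digits
```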

Learning about such things is best done by working with pencil and paper. You can find and read articles that describe the loss of precision caused by taking the difference of two nearly equal numbers, often called catastrophic cancellation.

Consider how this computation would actually be done on a processor using IEEE 754-conforming binary floating point arithmetic, versus basic decimal arithmetic. For example:

6.377 is likely stored internally as 4019820c49ba5e35 in IEEE hex double precision format, or roughly 6.3769999999999998e+00, before the computation begins. Note that this value is not identical to the original 6.377, because of roundoff error going from decimal to the internal IEEE double precision format.

Likewise, 6.376 is likely stored internally as 4019810624dd2f1b in IEEE hex double precision format, or roughly 6.3760000000000003e+00, before the computation begins. Note that this value is not identical to the original 6.376, because of roundoff error going from decimal to the internal IEEE double precision format.
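
The stored encodings quoted above can be checked with Python's standard struct module. In this sketch, ieee_hex is a hypothetical helper that packs a float as a big-endian 64-bit IEEE double and prints its bytes in hex. It assumes the platform's float is an IEEE 754 double, which holds on mainstream hardware.

```python
import struct

def ieee_hex(x: float) -> str:
    """Hypothetical helper: big-endian hex of the 64-bit IEEE 754 encoding of x."""
    return struct.pack(">d", x).hex()

for value in (6.377, 6.376):
    print(ieee_hex(value), f"{value:.17g}")
# 4019820c49ba5e35 6.3769999999999998
# 4019810624dd2f1b 6.3760000000000003
```

The stored 6.377 lies slightly below the decimal value and the stored 6.376 slightly above it. Both errors therefore push the difference below 0.001.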

The difference between these two "internal" IEEE double precision values is 3f50624dd2f1a000 in IEEE hex double precision or 9.9999999999944578e-04 when printed.
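
Because 6.376/2 <= 6.377 <= 2 * 6.376, the Sterbenz lemma guarantees that the IEEE subtraction of these two doubles is exact. The whole discrepancy therefore comes from converting the decimal inputs, not from the subtraction. The Python sketch below checks this with exact rational arithmetic (fractions.Fraction). ieee_hex is again a hypothetical helper built on the standard struct module.

```python
import struct
from fractions import Fraction

def ieee_hex(x: float) -> str:
    """Hypothetical helper: big-endian hex of the IEEE 754 double encoding."""
    return struct.pack(">d", x).hex()

a, b = 6.377, 6.376
d = a - b

# Exact rational subtraction of the stored operands matches the float result,
# so the subtraction step itself introduced no rounding error.
print(Fraction(a) - Fraction(b) == Fraction(d))  # True
print(ieee_hex(d))      # 3f50624dd2f1a000
print(ieee_hex(0.001))  # 3f50624dd2f1a9fc, the double nearest 0.001
```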

Thank you.

My apologies for not checking this in Hex (or binary) first. I knew it was a double precision floating point calculation, and that the numbers would be converted to binary then subtracted, but did not understand how it could be so far off.

Logically, the last decimal place would be off because each number is converted to binary, so I understood that the last decimal digits would be mostly worthless. I also understood that the next digit could be affected a bit. However, I did not see how the result could be off by a full three "significant" digits (as was correctly pointed out, they are not actually significant).

I am staring at it in hex and can still hardly believe such a large error could result, but at least I know not to blame the hardware or software. So during (or just after) the subtraction, additional "significant" digits are created in the mantissa by left-shifting it to match the new exponent. Since the left shift fills the least significant bits with zeros, we end up with 3f50624dd2f1a000 instead of the more mathematically correct 0x3f50624dd2f1a9fc.
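
The normalization shift described above can be traced in Python with integer arithmetic on the raw bits. In this sketch, bits and significand are hypothetical helpers built on the standard struct module, and significand is only valid for normal (non-subnormal) doubles. Both operands share the same exponent, so their 53-bit significands subtract directly. The exact difference has only 41 significant bits, so restoring a 53-bit significand appends 12 zero bits. Those zeros are exact for the stored operands rather than lost information.

```python
import struct

def bits(x: float) -> int:
    """Hypothetical helper: the 64-bit pattern of an IEEE 754 double."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def significand(x: float) -> int:
    """Hypothetical helper: 52 stored fraction bits plus the implicit leading 1."""
    return (bits(x) & ((1 << 52) - 1)) | (1 << 52)

a, b = 6.377, 6.376
# Same exponent (0x401) for both, so the significands subtract directly.
diff = significand(a) - significand(b)
print(hex(diff), diff.bit_length())  # 0x10624dd2f1a 41

# Renormalizing to 53 significant bits shifts left by 53 - 41 = 12 bits.
shift = 53 - diff.bit_length()
print(shift)                         # 12
print(hex(bits(a - b) & 0xFFF))      # 0x0: the 12 new low bits are zeros
```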

The one thing I might have done differently is to place a binary one in the most significant of the newly added binary digits (or in all but that one) to avoid some of the rounding error. This would complicate the operation but would reduce rounding error on average. If you assume that a newly created digit is, on average, closest to correct when set to half the base (5 for decimal, 1 for binary, 8 for hex), then for multiple digits you can set the first new digit to half the base and fill the rest with zeros (unless the base is odd or the first digit was filled with a zero). In my case, 7FF and 800 are both much closer to 9FC than 000 is. That is something for computer scientists and mathematicians to contemplate; changing the algorithm now seems like a bad idea.
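
The comparison in this single case can be checked in Python. The "fill with 0x800" variant below is purely hypothetical and is not what IEEE 754 hardware does. IEEE subtraction returns the correctly rounded difference of the stored operands, which here is exact, so a filled result would no longer equal that difference. from_bits is a hypothetical helper built on the standard struct module.

```python
import struct

def from_bits(n: int) -> float:
    """Hypothetical helper: interpret a 64-bit pattern as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", n))[0]

ieee_result = 0x3F50624DD2F1A000  # what the subtraction actually returns
nearest = 0x3F50624DD2F1A9FC      # the double nearest 0.001

# Hypothetical alternative: set the 12 new low bits to 0x800 instead of zeros.
filled = ieee_result | 0x800

# Distance from the double nearest 0.001, in units of the last place.
print(nearest - ieee_result)  # 2556 (0x9FC)
print(nearest - filled)       # 508 (0x1FC)
```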