Daniel_D
Beginner
161 Views

Documentation for IEEE 754

I'm developing a financial application and want to use IEEE 754-2008 types to make sure that calculations are correct and free of binary rounding problems. Unfortunately, I could not find much documentation or many samples showing how to use BFP numbers. I also found a header for DFP (dfp754.h), but it does not seem to work with C++, and there is no documentation at all ;(

How can I get more information on how to use these IEEE 754 libraries in my C++ application?

Daniel

SergeyKostrov
Valued Contributor II

Hi Daniel,

Regarding the IEEE 754 specification and FP-precision-related APIs, I would definitely recommend looking at:

- en.wikipedia.org/wiki/Single_precision
- www.binaryconvert.com and www.binaryconvert.com/convert_float.html
- MSDN: CRT functions that control the precision of the FPU (Floating-Point Unit): _control87, _controlfp, _control87_2
- and, of course, the "float.h" header file

Best regards,
Sergey
Daniel_D
Beginner

Thanks for the links, Sergey. I already know most of them. My point here is NOT to use float or double, since they cannot calculate 3.05 + 0.05 without problems. If I use IEEE 754-2008 decimal values, the computer should make this calculation without problems! So I'm looking for a good and useful implementation of this specification. Intel provides one, but the documentation is not very rich. They have something like _Decimal32 (64/128) (my guess is that Decimal32 types are NOT available for C++! Only for pure C apps) and also bid64, but again, the documentation does not really show how to use them.

SergeyKostrov
Valued Contributor II

>>...3.05 + 0.05 without problems...

Could you provide more details regarding the problems with your 3.05 + 0.05 test case? What was wrong?

With the float data type (single precision) and the FPU set to 24-bit precision, a loss of accuracy is expected once a value needs more than 24 significant bits, e.g. for integers greater than 2^24 = 16777216. Here is an example:

16968000 (base 10) => 0 10010111 00000010111010010100000 (base 2, IEEE 754)
16968001 (base 10) => 0 10010111 00000010111010010100000 (base 2, IEEE 754)
16968002 (base 10) => 0 10010111 00000010111010010100001 (base 2, IEEE 754)
16968003 (base 10) => 0 10010111 00000010111010010100010 (base 2, IEEE 754)
16968004 (base 10) => 0 10010111 00000010111010010100010 (base 2, IEEE 754)
16968005 (base 10) => 0 10010111 00000010111010010100010 (base 2, IEEE 754)
16968006 (base 10) => 0 10010111 00000010111010010100011 (base 2, IEEE 754)
16968007 (base 10) => 0 10010111 00000010111010010100100 (base 2, IEEE 754)
16968008 (base 10) => 0 10010111 00000010111010010100100 (base 2, IEEE 754)
16968009 (base 10) => 0 10010111 00000010111010010100100 (base 2, IEEE 754)
16968010 (base 10) => 0 10010111 00000010111010010100101 (base 2, IEEE 754)

Can you see that three different numbers have the same binary representation in IEEE 754 format? If I need better precision, I use the double or long double data types.

I understand that you want to use an external library to do all FP-based calculations. Would you be able to upload docs, headers and libs for what you have?

>>...Decimal32 types are NOT available for C++!..

Why?..

Judith_W_Intel
Employee


>>...Decimal32 types are NOT available for C++!..

>> Why?..

Decimal32 and the related types implement Decimal Floating Point as specified by ISO/IEC TR 24732, which was a technical report from the C standards committee. The C++ standards committee has not issued a technical report on Decimal Floating Point, so there is no description of how it should be implemented in C++.

Daniel_D
Beginner
161 Views

Thanks, Sergey, for your explanation. But my point is NOT the number of decimal places; it is to get correct results. Please see this sample:

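#include <cstdio>
int main() {
    double l_dbFirst = 3.05;
    double l_dbSecond = 0.05;
    double l_dbSum = l_dbFirst + l_dbSecond;
    bool l_fIsCorrect = (l_dbSum == 3.1);   // false: l_dbSum is 3.0999999999999996
    std::printf("%.17g %d\n", l_dbSum, l_fIsCorrect);
}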

Do you expect l_fIsCorrect to be TRUE (non-zero), or l_dbSum to be 3.1? In fact, l_fIsCorrect will be 0 and l_dbSum will be something like 3.0999999999999996. That is my point in using IEEE 754 decimal numbers; the issue is not the count of decimal places. Even using long double here will not make any difference in the result.

I think using IEEE 754 decimal types will solve this problem. On the Intel website I read that there is some support for this data type. Please see this:




This was my reason for trying the compiler, but there is not much documentation for IEEE 754 binary and decimal FP ;(

Thank you for your help anyway.

Daniel

SergeyKostrov
Valued Contributor II

I agree with the 3.099999999 example.

But this is a common problem, and it raises the question: can I trust the data?

The concept of an Epsilon partially resolves it. Here are some consolidated results of my investigation into how different libraries and compilers declare an Epsilon:

...

Epsilon for Floats - smallest such that 1.0+FLT_EPSILON != 1.0
Epsilon for Doubles - smallest such that 1.0+DBL_EPSILON != 1.0
Epsilon for Long Doubles - smallest such that 1.0+LDBL_EPSILON != 1.0

// Intel IPL - doesn't specify DBL, FLT or LDBL
#define IPL_EPS 1.0E-12

// Intel IPP - nothing for LDBL
#define IPP_EPS_32F 1.192092890e-07f
#define IPP_EPS_64F 2.2204460492503131e-016

// STL - uses the default DBL_EPSILON, FLT_EPSILON and
// LDBL_EPSILON values defined by the C/C++ compiler

// OpenGL - nothing!

// NVIDIA SDK
#define GLH_REAL float   // no fractions, and nothing for DBL and LDBL
#define GLH_EPSILON GLH_REAL(10e-6)

// Microsoft C++ compiler - Desktop
#define DBL_EPSILON 2.2204460492503131e-016
#define FLT_EPSILON 1.192092896e-07F
#define LDBL_EPSILON DBL_EPSILON

// Microsoft C++ compiler - Mobile
#define DBL_EPSILON 2.2204460492503131e-016
#define FLT_EPSILON 1.192092896e-07F
#define LDBL_EPSILON DBL_EPSILON

// Borland C++ v5.x.x compiler
#define DBL_EPSILON 2.2204460492503131E-16
#define FLT_EPSILON 1.19209290E-07F
#define LDBL_EPSILON 1.084202172485504434e-019L

// Turbo C++ v3.x.x compiler
#define DBL_EPSILON 2.2204460492503131E-16
#define FLT_EPSILON 1.19209290E-07F
#define LDBL_EPSILON 1.084202172485504E-19

// Turbo C++ v1.x.x compiler
#define DBL_EPSILON 2.2204460492503131E-16
#define FLT_EPSILON 1.19209290E-07F
#define LDBL_EPSILON 1.084202172485504E-19

// MinGW v3.4.x compiler - uses the "magic" __DBL_EPSILON__,
// __FLT_EPSILON__ and __LDBL_EPSILON__

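This can be verified with a simple piece of code:
#include <stdio.h>
int main( void )
{
printf( "%.48f\n", ( float )__FLT_EPSILON__ );         // float is promoted to double
printf( "%.48f\n", ( double )__DBL_EPSILON__ );
printf( "%.48Lf\n", ( long double )__LDBL_EPSILON__ ); // long double needs %Lf; passing it to %f is undefined behavior, which likely explains the all-zero line below (MinGW's msvcrt-based printf may still misprint 80-bit long double)
return 0;
}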

Output is:
0.000000119209289550781250000000000000000000000000 - close to Microsoft's values
0.000000000000000222044604925031310000000000000000 - exact match with everybody
0.000000000000000000000000000000000000000000000000 - Oops!

Telnov__Alex
Beginner

Daniel:

For scientific and financial applications where high performance is an overriding requirement, I am afraid there will never be a way around the fact that floating-point arithmetic is imprecise and non-associative. One has to absorb this fact and learn to live with it... the same way as a student of quantum physics needs to spend at least a semester absorbing the fact that various physical quantities and objects are not infinitely divisible.

For safe conversions from double to int (or to dollars and cents), I suggest:

#define ONEPLUS (1.0 + 10.0*DBL_EPSILON)   // parentheses required: without them a*b*ONEPLUS expands to a*b*1.0 + 10.0*DBL_EPSILON

Then,

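double a, b;   // double, not float: a 10*DBL_EPSILON nudge is far below float's rounding error
...
int n = a*b*ONEPLUS;   // nudges results like 28.999999999999996 up before truncating to int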
SergeyKostrov
Valued Contributor II

Take a look at the enclosed Intel W_bigmul example. Good luck!
Daniel_D
Beginner

Thanks, all, for your help. I will take a look at the sample, but I still wonder about the promised IEEE 754 support of the compiler.

Daniel