Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Intel® IEEE 754-2008 Binary Floating-Point Conformance Library

James_B_11
Beginner
1,771 Views

I have a question about the compliance of the binary floating point library (libbfp754). In the documentation, a "Note" exists for all of the Quiet Computational Operations (abs, copy, copysign, negate) that states the following:

"When the input is a signaling NaN, two different outcomes are allowed by the standard. The operation could either signal invalid exception with quieted signaling NaN as output, or deliver signaling NaN as output without signaling any exception."

But I believe this is in contradiction with section 5.5.1 of the IEEE 754-2008 standard (not reprintable here), which explicitly states that these functions will only ever affect the sign bit and will not throw exceptions.

I have only seen the bfp754 library perform correctly (passing through a signaling NaN and not signaling an exception) on the two different systems I have compiled and run my code on, but I need to know whether the other behavior (returning a quiet NaN and signaling an exception) is actually allowed by the library, which could make my code non-compliant on certain machines.

0 Kudos
23 Replies
James_B_11
Beginner
1,559 Views

Also, I believe I've found a bug in the library, where calling __binary32_from_hexstring("0x0.000001P-126") should return a single precision representation of the minimum subnormal number 1.40129846e-45, but instead returns 0 (0x0).

Another bug I've found is that the __binary*_to_hexstring(*) functions are undefined. Using them and compiling causes the following errors:

error: identifier "__binary32_to_hexstring" is undefined
error: argument of type "float" is incompatible with parameter of type "char *"

Any assistance you could provide regarding these issues with the IEEE754-2008 library is greatly appreciated.

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
>>Another bug I've found is that the __binary*_to_hexstring(*) functions are undefined. Using them and compiling
>>causes the following errors:
>>
>>error: identifier "__binary32_to_hexstring" is undefined
>>error: argument of type "float" is incompatible with parameter of type "char *"

I just verified and here are all references to a binary32_to_hexstring string:

[ bfp754.h ]
...
BFP754_EXTERN_C __BFP754_RETURN_TYPE_VOID ___binary32_to_hexstring (
    __BFP754_RESULT_POINTER_TYPE_STRING
    __BFP754_ARG_TYPE_BINARY32
    __BFP754_STATUS_ARGS_TYPE
);
...

[ bfp754_functionnames.h ]
...
#define ___binary32_to_hexstring \
        __binary32_to_hexstring
...

[ libbfp754.lib ]
...
Name of the function ( ___binary32_to_hexstring ) was found in the library
...

Could you provide your test case, please? Thanks in advance.
0 Kudos
James_B_11
Beginner
1,559 Views

My attached file contains calls to all the functions I'm having issues with. I compile it with the line below.

icpc -fp-model source -fp-model except -g -O0 -std=c++0x -D__GXX_EXPERIMENTAL_CXX0X__ test.c -o cogeTest -lbfp754

The function definitions you've given are correct, and I included a call to one in my code. Those definitions, however, contradict the documentation at http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm, which says that the string output is a return value and not an argument.

Is this the correct location for the most recent documentation?

Now that I know that the strings are arguments to the functions, my issues are as follows.

(1) Providing the correct HEX string for minimal subnormal number does not produce the binary 32-bit minimal subnormal number.
(2) The documentation for the Quiet Computational Operations does not match the IEEE standard, but so far the behavior of those functions does.
(3) The documentation for the *_to_*string functions is incorrect.

Could you please point me to the correct documentation and address the functional bug in (1)?

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views

I don't see your example code. Could you upload it again?

0 Kudos
James_B_11
Beginner
1,559 Views

Sorry, here it is.

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
Thanks and I have the test case now. I'll also take a look at online docs.
0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
I'm verifying on a Windows platform and I get a different compilation error:

...
..\Composer XE 2011 SP1\compiler\include\bfp754_types.h(45): error: fenv_access cannot be enabled except in precise, source, double, and extended modes
#pragma fenv_access(on)
^
compilation aborted for Test4.cpp (code 2)
...

My question is: did you get a compilation error or a linker error?
0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
Please verify; it looks like I finally reproduced a linker error:

...
..\Test>icl.exe /fp:precise Test4.cpp
Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.7.371 Build 20120928
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

Test4.cpp
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.

-out:Test4.exe
Test4.obj
Test4.obj : error LNK2019: unresolved external symbol ___binary32_to_hexstring referenced in function _main
Test4.obj : error LNK2019: unresolved external symbol ___binary32_from_hexstring referenced in function _main
Test4.exe : fatal error LNK1120: 2 unresolved externals
...

and it happens because libbfp754.lib is not specified.
0 Kudos
James_B_11
Beginner
1,559 Views

These are compilation errors that will show up if you uncomment the part of the sample code I sent that is commented out. I'm also compiling this on Linux using the compilation command I gave above. The error you get seems like it's related to the "-fp-model source" flag. If you uncomment the code in the sample I included, the following list of errors occurs:

-bash-4.1$ icpc -fp-model source -fp-model except -g -O0 -std=c++0x -D__GXX_EXPERIMENTAL_CXX0X__ -c test.c -o test.o -lbfp754
test.c(27): error: argument of type "float" is incompatible with parameter of type "char *"
hexStr32 = __binary32_to_hexstring(rand32);
^

test.c(27): error #165: too few arguments in function call
hexStr32 = __binary32_to_hexstring(rand32);
^

test.c(27): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr32 = __binary32_to_hexstring(rand32);
^

test.c(30): error: argument of type "float" is incompatible with parameter of type "char *"
hexStr32 = __binary32_to_hexstring(min32SubNorm);
^

test.c(30): error #165: too few arguments in function call
hexStr32 = __binary32_to_hexstring(min32SubNorm);
^

test.c(30): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr32 = __binary32_to_hexstring(min32SubNorm);
^

test.c(35): error: a value of type "const char *" cannot be used to initialize an entity of type "char"
char min64SubNormHex = "0x0.0000000000001P-1022";
^

test.c(45): error: expression must have pointer-to-object type
hexStr64 = &(min64SubNormHex[0]);
^

test.c(49): error: argument of type "float" is incompatible with parameter of type "char *"
hexStr64 = __binary64_to_hexstring(rand64);
^

test.c(49): error #165: too few arguments in function call
hexStr64 = __binary64_to_hexstring(rand64);
^

test.c(49): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr64 = __binary64_to_hexstring(rand64);
^

test.c(52): error: argument of type "double" is incompatible with parameter of type "char *"
hexStr64 = __binary64_to_hexstring(min64SubNorm);
^

test.c(52): error #165: too few arguments in function call
hexStr64 = __binary64_to_hexstring(min64SubNorm);
^

test.c(52): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr64 = __binary64_to_hexstring(min64SubNorm);

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
>>...and it happens because libbfp754.lib is Not specified.

If the library is properly specified, and found by the linker, then everything works. Take a look:

...
..\Test>icl.exe /fp:precise Test4.cpp libbfp754.lib
Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.7.371 Build 20120928
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

Test4.cpp
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.

-out:Test4.exe
Test4.obj
libbfp754.lib

As you can see, there are no errors, and the output is as follows:

..\Test>Test4.exe
1225945344.000000 --> 0x1.2449c4p030
-0x1.24E2BBP19 --> -599829.875000
0x0.000001P-126 --> 0.000000

Is that what you expected to see?
0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
>>...These are compilation errors that will show up if you uncomment the part of the sample code I sent that is commented out... I'll take a look at it as well.
0 Kudos
James_B_11
Beginner
1,559 Views

Yes, that is the correct output for the part of the sample that is not commented out. That part uses the correct syntax for the *_to_hexstring functions, which is not the syntax specified in the documentation. If you uncomment the rest of the sample, which uses the syntax specified by the documentation, the errors will occur. That shows issue #3 from my prior post.

The third line of output, "0x0.000001P-126 --> 0.000000", shows issue #1 from my prior post. The string "0x0.000001P-126" should produce the minimum positive subnormal floating point number, which is "1.40129846e-45", but it instead produces 0.0.

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
Finally reproduced these compilation errors:

...
..\Test>icl.exe /fp:precise Test4.cpp libbfp754.lib
Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.7.371 Build 20120928
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

Test4.cpp
Test4.cpp(27): error: argument of type "float" is incompatible with parameter of type "char *"
hexStr32 = __binary32_to_hexstring(rand32);
^

Test4.cpp(27): error #165: too few arguments in function call
hexStr32 = __binary32_to_hexstring(rand32);
^

Test4.cpp(27): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr32 = __binary32_to_hexstring(rand32);
^

Test4.cpp(30): error: argument of type "float" is incompatible with parameter of type "char *"
hexStr32 = __binary32_to_hexstring(min32SubNorm);
^

Test4.cpp(30): error #165: too few arguments in function call
hexStr32 = __binary32_to_hexstring(min32SubNorm);
^

Test4.cpp(30): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr32 = __binary32_to_hexstring(min32SubNorm);
^

Test4.cpp(35): error: a value of type "const char *" cannot be used to initialize an entity of type "char"
char min64SubNormHex = "0x0.0000000000001P-1022";
^

Test4.cpp(45): error: expression must have pointer-to-object type
hexStr64 = &(min64SubNormHex[0]);
^

Test4.cpp(49): error: argument of type "float" is incompatible with parameter of type "char *"
hexStr64 = __binary64_to_hexstring(rand64);
^

Test4.cpp(49): error #165: too few arguments in function call
hexStr64 = __binary64_to_hexstring(rand64);
^

Test4.cpp(49): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr64 = __binary64_to_hexstring(rand64);
^

Test4.cpp(52): error: argument of type "double" is incompatible with parameter of type "char *"
hexStr64 = __binary64_to_hexstring(min64SubNorm);
^

Test4.cpp(52): error #165: too few arguments in function call
hexStr64 = __binary64_to_hexstring(min64SubNorm);
^

Test4.cpp(52): error: a value of type "void" cannot be assigned to an entity of type "char *"
hexStr64 = __binary64_to_hexstring(min64SubNorm);
^

compilation aborted for Test4.cpp (code 2)

Note: Something is wrong ( on my side! ) with the except option and, as you can see, I used the precise option instead. Please don't pay attention to that at the moment. This is simply for information.
0 Kudos
James_B_11
Beginner
1,559 Views

Here is an additional informative test, now that the _to_hexstring functions are working. The following lines of code

float min32SubNorm = 1.40129846e-45;
__binary32_to_hexstring(test,min32SubNorm);
printf("%e --> %s\n", min32SubNorm, test);

will produce the following output ...

1.401298e-45 --> 0x0.000002p-126

This indicates that the Intel FP library interprets the minimum subnormal floating point value as being represented with a significand of 2, instead of 1.

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
I just verified it and you're right. Here is the output:

...
1225945344.000000 --> 0x1.2449c4p030
-0x1.24E2BBP19 --> -599829.875000
0x0.000001P-126 --> 0.000000
1.401298e-045 --> 0x0.000002p-126
...

James, you mentioned a web page with a description of these functions. Could you specify the topic ( more details, please )? When I follow the link, it opens the page Using the OpenMP* Libraries.
0 Kudos
James_B_11
Beginner
1,559 Views

The link I use is 
http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm
 
It is the online documentation for Composer XE 2013, the chapter on the C++ compiler, and the sub-chapter of documentation referring to the Intel IEEE 754 Floating Point Conformance Library.

0 Kudos
James_B_11
Beginner
1,559 Views

OK, I've further characterized the problem I've been having with the Intel FP library _to_hexstring and _from_hexstring functions. My files are attached to this post, and are compiled with the command line below.
icpc -fp-model source -fp-model except -g -O0 -std=c++0x -D__GXX_EXPERIMENTAL_CXX0X__ test.c -o cogeTest -lbfp754 

The output of running the code is below

Min Pos Normal:
1.175494e-38 (0x00800000) --> 0x1.000000p-126
0x1.000000P-126 --> 1.175494e-38 (0x00800000)

Min Neg Normal:
-1.175494e-38 (0x80800000) --> -0x1.000000p-126
-0x1.000000P-126 --> -1.175494e-38 (0x80800000)

Max Pos Normal:
3.402823e+38 (0x7F7FFFFF) --> 0x1.fffffep127
0x1.7FFFFFP127 --> 2.552118e+38 (0x7F400000)
0x1.FFFFFEP127 --> 3.402823e+38 (0x7F7FFFFF)

Max Neg Normal:
-3.402823e+38 (0xFF7FFFFF) --> -0x1.fffffep127
-0x1.7FFFFFP127 --> -2.552118e+38 (0xFF400000)
-0x1.FFFFFEP127 --> -3.402823e+38 (0xFF7FFFFF)

Min Pos SubNormal:
1.401298e-45 (0x00000001) --> 0x0.000002p-126
0x0.000001P-127 --> 0.000000e+00 (0x00000000)
0x0.000002P-126 --> 1.401298e-45 (0x00000001)

Min Neg SubNormal:
-1.401298e-45 (0x80000001) --> -0x0.000002p-126
-0x0.000001P-127 --> -0.000000e+00 (0x80000000)
-0x0.000002P-126 --> -1.401298e-45 (0x80000001)

Max Pos SubNormal:
1.175494e-38 (0x007FFFFF) --> 0x0.fffffep-126
0x0.7FFFFFP-127 --> 2.938736e-39 (0x00200000)
0x0.FFFFFEP-126 --> 1.175494e-38 (0x007FFFFF)

Max Neg SubNormal:
-1.175494e-38 (0x807FFFFF) --> -0x0.fffffep-126
-0x0.7FFFFFP-127 --> -2.938736e-39 (0x80200000)
-0x0.FFFFFEP-126 --> -1.175494e-38 (0x807FFFFF)

There are a few issues ...

Issue #1: Shifting of hexadecimal bits in floating point significand
The first issue is related to how the {hexSignificand} hexadecimal digits are interpreted when converting between binary floating point formats and hexadecimal character strings. Referencing section 5.12.3 of the IEEE754-2008 standard, the {hexSignificand} string of hexadecimal digits is interpreted as a character sequence described by the ISO C99 standard. The significand part of the hexadecimal character sequence is up to 6 hexadecimal characters representing the 23-bit significand, where (from the IEEE754 standard) "the first (leftmost) character is the most significant".

As you can see from the output of all of the test cases (most obviously in the test cases for Min/Max Normal numbers), the Intel FP library produces and interprets hexadecimal character sequences in which the first (leftmost) bit is the most significant. It is confusing, and in my view bad practice, for the 23 bits to be represented in hex as (0bXXXXXXXXXXXXXXXXXXXXXXX << 1), and this is not how a 23-bit number would be represented in hex according to the C99 standard that the IEEE754 standard refers to for this type of formatting. This is apparent when looking at the hexadecimal representations of the floating point numbers I've included in the code output: a significand of all 1's, printed as hex, is 0x7FFFFF, not 0xFFFFFE. So this, as far as I can tell, conflicts with the standard.

This problem exists for the single precision functions only, because in double precision the significand is 52 bits and can be cleanly packed into 13 hex digits.

Issue #2: Incorrect exponent biasing/debiasing for subnormal numbers
This issue is demonstrated in the code output pertaining to the SubNormal numbers. SubNormal numbers, as specified in the IEEE754-2008 standard, have a biased exponent value of 0x0. The bias for binary single precision is 127, so that the unbiased exponent of a SubNormal number is -127. However, the Intel FP library expects and interprets the unbiased exponent for SubNormal numbers to be -126. This problem, like the first, does not exist for double precision.

These issues could be related. Using 24-bits to represent a 23-bit significand could cause the exponent to not be handled correctly, but both issues (and the code I included) demonstrate how the *_hexstring functions do not meet the IEEE754-2008 standard.

0 Kudos
SergeyKostrov
Valued Contributor II
1,559 Views
>>Min Pos Normal:
>>1.175494e-38 ( 0x00800000 ) --> 0x1.000000p-126
>>0x1.000000P-126 --> 1.175494e-38 ( 0x00800000 )
>>...

I used www.binaryconvert.com for verification and it gives different numbers:

1.175494e-38 -> 0x007FFFFD = 00000000 01111111 11111111 11111101

and

0x007FFFFD -> 1.1754939304327482105236152601E-38 -> Most accurate representation = 1.1754939304327482105236152601E-38
0x00800000 -> 1.17549435082228750796873653722E-38 -> Most accurate representation = 1.17549435082228750796873653722E-38

Please take a look.
0 Kudos
James_B_11
Beginner
1,344 Views

The numbers printed to the screen don't show the entire precision of the numbers being used in the code, which is a result of the default precision used in printf. The attached file displays more precision.

1.17549435e-38 is the min normal used
1.17549421e-38 is the max subnormal used
both yield the results I've shown on binaryconvert.com as well 


0 Kudos