I am using gcc 5.3 with libimf.a (I have icc but can't use it for other reasons) and find that the exp() function returns different values for certain inputs when the same build is run on Sandy Bridge and Haswell, both running CentOS 6. The library comes with Parallel Studio XE, both 2016 and 2017. For example,
const double v = 3.3990833195703927;
printf("exp(v) = %.20e\n", exp(v));
prints different values on the two architectures. The code is compiled with "gcc main.cc .../libimf.a .../libirc.a". Is there a way to use the library with gcc such that the results are architecture-independent?
You appear to be asking about the special math function entry points associated with the icc imf arch-consistency option (-fimf-arch-consistency). Since you are asking about the big CPUs, this forum is not among the most suitable.
I think you intended to print in hexadecimal format
printf( "exp(v) = %.20e, %a\n", r, r ); // decimal and C99 hex-float; passing a double to %llX is undefined behavior
I also agree with your statement that the difference may be attributable to the conversion of the double to the text string printed. Should the Hex formats show the same number, then this would indicate that the issue involves the conversion of the double to the text string.
Using the utility at http://www.binaryconvert.com/convert_double.html, it is clear that these two values differ by one least significant bit.
This is not at all surprising, since Sandy Bridge must use separate Add and Multiply operations, while Haswell can use the more accurate FMA instruction.
The GNU C library does not aim to provide exactly the same results on all platforms. The goals of the project and error estimates for many of the available functions are at https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html. On that page, the first goal of the project is to "provide a result that is within a few ULP of the mathematically correct value of the function". A difference of one ULP across platforms is consistent with this definition.
If the goal is to get exactly the same bits on all platforms, it may be possible to trick the library into using the Sandy Bridge code path on the Haswell platform. This will eliminate the FMA instructions and might provide identical results. I don't know how to do this with the GNU compilers and libraries, but there is probably enough information at https://lwn.net/Articles/691932/ to get started on understanding how this works and how it might be controlled.
The gcc options -mno-fma -march=native are frequently used to avoid introducing FMA along with AVX2, but they will not influence the library math function code. That is one of the features of the Intel arch-consistency math function entry points. There is of course no documentation or assured support for calling those entry points by name. Also, there appears to be no documentation about when the AVX2 math functions may be the more accurate ones.
The AVX2 math functions that are able to use FMA operations should probably be assumed to be more accurate than the corresponding implementations that require rounding between the multiply and add operations. (There will always be point-wise counter-examples, but unless the implementation is bad, any reasonable norm on a distribution of results should show the FMA-based results to be more accurate on "average".)
Unfortunately, emulating the increased accuracy of the Fused Multiply-Add is quite expensive on machines that only support separate operations, so if you want bit-wise reproducibility, you almost certainly need to try to reproduce the non-FMA results.
Bitwise reproducibility will also require identical ordering of operations, which for some algorithms means that all results must be computed with the same vector length on all platforms. A vector length of 1 is a convenient value, but a vector length of 2 doubles is probably supported by all of the platforms of interest.