jkwi · ‎01-14-2014

We have a large code, Fortran/CC, on Linux. What options can be used to maximize the computational consistency between CPUs? Basically, we compile and run on two different machines, one with AMD and one with Intel CPUs, and get slightly different answers in some fields; most are identical. We suspect this is due to rounding/thresholding (if >0.5 do A, else do B). Are there optimized math libraries used for Intel CPUs? Are there any compile options we could use to force maximum compatibility?

We compile now with -O0 on both. Using 14.0.1.

Would different versions of CentOS result in differences due to any changes in GCC installed on the machine?

Thanks.

Martyn_C_Intel · ‎01-14-2014

For compiled code, the simplest way is to build with the options -fp-model precise -fimf-arch-consistency=true. This should work even if you choose different optimization levels and instruction sets for the different processors, such as -xavx for the Intel processor and -msse3 for the AMD processor.

See the article attached at http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/ for (much) more detail.

As Tim says, -fimf-arch-consistency=true alone may be sufficient if you use the same optimization switches on each processor, including targeting the same instruction set. However, in exceptional circumstances, results could vary even between consecutive runs on the same processor; see also http://software.intel.com/en-us/articles/run-to-run-reproducibility-of-floating-point-calculations-for-applications-on-intel-xeon for details. That's one of the reasons why Tim suggested using -align array32byte in addition. This switch is Fortran only, however.

TimP · ‎01-14-2014

If you don't wish to stick with -O0, choose a single architecture switch which is compatible with all your CPUs, e.g. -msse3 (not -axhost....)

-prec-div -prec-sqrt will then avoid rounding differences due to the (less accurate on AMD64, more accurate on AMD32) divide and sqrt approximations.

-fimf-arch-consistency=true will choose math library calls which should not vary with CPU architecture (and should not make mistakes with untested AMD variants). This would be needed regardless of other options.

-align array32byte may help, particularly with MKL. If you use MKL, look up the consistency options.

gcc defaults are very different between 32-bit and 64-bit mode; it will be difficult to satisfy your request if you actually use gcc in different modes and don't set consistent options. On the same architecture, CentOS will make large changes in libraries only between major versions, e.g. CentOS 5.x to CentOS 6.x. CentOS 5.x doesn't support AVX but the sse3 suggestion above would avoid problems there.

TimP · ‎01-14-2014

-fp-model precise, as Martyn suggested, includes setting of -prec-div -prec-sqrt, as well as disabling some optimizations which may introduce numerical differences depending on alignment.

jkwi · ‎01-15-2014

Thanks for all the suggestions, at least I have some things to play with now.

Compiler options to maximize consistency of results between Intel/AMD CPUs