Optimization flags, floating points and portability (Pentium D to Xeon)

fbisetti · ‎05-09-2007

Hi all

I'm writing to seek some insight on a peculiar floating point problem that I stumbled across when porting a code (w/ some old F77 sections) from a Pentium D to a Xeon (Quad-core E5345) based machine.

All that follows assumes ifort v9.1.039 on the Pentium D and ifort v9.1.041 on the Xeon.

Also, uname -a returns the following:

+ Pentium D system: Linux box3 2.6.15-gentoo-r1 #1 SMP PREEMPT Sun Oct 1 23:04:59 PDT 2006 i686 Intel Pentium D CPU 3.20GHz GenuineIntel GNU/Linux

+ Xeon: Linux s84 2.6.18-3-amd64 #1 SMP Mon Dec 4 17:04:37 CET 2006 x86_64 GNU/Linux

The "problem".

In short, the code (a Poisson solver) works fine and is stable during the time integration of the algorithm on the Pentium D with the flags "-O3 -fpe0 -fpp -warn all -std95", while it produces numerical oscillations, which eventually lead to NaNs and FPEs, on the Xeon.

If instead the code is compiled with "-O0" or "-O1" on the Xeon, the problem disappears. As the optimization is stepped up to "-O2" or "-O3" (even using "-mp" to try to control the floating point accuracy), the code becomes unstable and blows up.

Is this a problem in the code? Is it a problem connected to the floating point precision treatment when upgrading from -O1 to -O2 on the Xeon (as opposed to a Pentium D)? Or both?

I guess a good question would be: which extra optimizations enabled between -O1 and -O2 might trigger floating-point problems? Do these optimizations differ between a Pentium D and a Xeon? Why does the problem occur on the Xeon and not on the Pentium D?

I'm trying to learn a lesson, so any insight or pointers would be welcomed.

Thanks,
Fabrizio

Steven_L_Intel1 · ‎05-10-2007

Unless you are using the -x or -ax options, you get the same compiled code running on Pentium D as you would on the newer Xeon. It is not possible to give a list of "additional optimizations", but it is often the case that any time the instruction sequence changes you have the possibility for low-bit differences in floating point computations.

The best thing would be to determine what computation is giving a different result between the two systems. This can be difficult in some cases, I know.

TimP · ‎05-10-2007

If you are looking for numerical consistency, with ifort 9.1 you would normally use options like -xW -fp-model precise. ifort doesn't pick any options according to the machine you compile on. You didn't give many details, such as whether you are using the 64-bit ifort on one machine and 32-bit on the other.

Did you try any of the following?

+ Copy the working build over to the other machine and run it.

+ Using the same mode (32- or 64-bit), build a separate .o file for each subroutine, then link and run enough combinations to find out where the trouble occurs.

fbisetti · ‎05-10-2007

Tim,

Thanks for the reply. I'll experiment with your suggestions and with combinations of object files, and try to pinpoint the troublesome one.

In the meantime, here is what ifort -V reports:

+ Xeon: Intel Fortran Compiler for Intel EM64T-based applications, Version 9.1

+ Pentium D: Intel Fortran Compiler for 32-bit applications, Version 9.1

I'll post a followup if I figure out more on the actual cause of the problem.

Cheers,
Fabrizio

TimP · ‎05-10-2007

Fabrizio,

In line with what Steve said, your 32-bit compilation without -xW is not using SSE and will evaluate all expressions in double precision. Your 64-bit compilation will not do any extra precision promotion of intermediate results; the default is SSE like -xW but with no vectorization. You could make the two more alike by setting '-xW -fp-model precise' for both builds, for more speed without extra precision, or by setting both to -mp -long-double -noftz, for less speed with extra precision.

If you have time to spend on it, you might be able to isolate where your application gains from extra precision by building sets of .o files with both option sets, linking and running combinations. If it depends on double precision evaluation of expressions with single precision operands, you should write in the double precision promotion as required in your source to avoid undependable results.