Different results with -O0 -openmp or -O3 -openmp

Jack_S_ · ‎04-12-2014

Hi all,

I developed a FORTRAN (F90) code (it's a large model), which I compile with the following flags:

ifort -g -O0 -openmp -openmp_report -threads -ipo

When running this code with the above flags, the serial and parallel (OpenMP) results agree to 15 digits after the decimal point. I have also checked with Intel Inspector 2013, and I do not have any data races in any of the subroutines.

However, when I change the optimization flag to -O2 or -O3, I get a small error which grows with time (it's a simulation which integrates in time), starting at the 15th digit and moving toward more significant digits. I would like to prevent this from happening. The results with either -O2 or -O3 are different (up to the fifth digit after the decimal point).

Can anyone advise how I can, in general, improve my code (or which other compilation flags I could use) so that it runs with the same precision (double precision) as with the -O0 flag?

Thanks in advance,

Jack.

Izaak_Beekman · ‎04-12-2014

There are a few things which can manifest themselves as a "divergence" of the solution from one run to the next. I must note that, IMO, you shouldn't lose undue sleep over this, so long as the results are accurate to a sufficient number of significant digits.

One of the most common sources of differences from run to run is the fact that numerical addition of floating point numbers is not associative, unlike symbolic/mathematical addition. [(a + b) + c /= a + (b + c) in floating point arithmetic.] This will manifest itself in parallel codes, usually where some sort of reduction sum is occurring, because different MPI ranks or OpenMP threads will reach the reduction at different times, so the order in which the elements of the set are added together to form the sum will differ from run to run, causing small discrepancies due to roundoff error. One can avoid this by enforcing a consistent summation order; however, you will then no longer be able to amortize communication overhead by using the data as it becomes ready, and doing so introduces a synchronization barrier.
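To make the grouping effect concrete, here is a small sketch in Python (not taken from the model in question; Python floats are IEEE double precision, the same as Fortran's double precision on typical hardware). The names `partials` and `combine` are hypothetical: `partials` stands in for per-thread partial sums, and `combine` adds them in whatever order the threads happen to arrive.

```python
import math

# Grouping matters: the same three numbers, two different roundings.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False: the two groupings round differently

# Hypothetical per-thread partial sums, chosen so the effect is large.
partials = [1e16, 1.0, -1e16, 1.0]

def combine(values, order):
    """Add the entries of `values` in the given index order."""
    total = 0.0
    for k in order:
        total += values[k]
    return total

# Two arrival orders of the same four contributions.
in_order = combine(partials, (0, 1, 2, 3))   # the first 1.0 is lost to rounding
reordered = combine(partials, (0, 2, 1, 3))  # the large terms cancel first
print(in_order, reordered)

# math.fsum tracks the rounding error and returns the correctly rounded sum.
print(math.fsum(partials))
```

In a real model the discrepancies usually start near the last digit rather than being this dramatic, but the mechanism is the same, and a time integration can amplify them step by step.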

A more thorough discussion of this topic, as well as pertinent compiler flags can be found here: http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler

One final note: You mention that the error grows as you integrate. Does it grow without bound and blow up your solution? If so, have you performed a stability analysis to ensure that you're using a viable numerical integration scheme for your model? Richardson's method was used as the time integration scheme for the first numerical weather studies, and decently correct results were calculated; it was only later that it was discovered that the spatial and temporal schemes he used were unstable! I'm sure you know all of this, but I thought I'd mention it anyway.

jimdempseyatthecove · ‎04-13-2014

Try adding option -fltconsistency.

Note that the above option may only need to be applied to the convergence routine where you suspect the error (difference) is produced.

You might experiment with the other -fp-model options too.

Jim Dempsey

TimP · ‎04-13-2014

It could be as simple as -assume protect_parens. That's one of the options set by -standard-semantics.

The options such as -fp-model source are meant to eliminate optimizations where results vary slightly with data alignment, as well as observing parentheses and restoring IEEE gradual underflow and IEEE divide and sqrt.

Also, if results are affected by bugs such as race conditions, those effects may change with optimization.

Jack_S_ · ‎04-13-2014

Hi guys,

Zak, Jim and Tim,

What worked for me is using OpenMP with optimization (i.e., -O3 -openmp) together with -fp-model strict.

It keeps the full double precision (15 digits after the decimal point).

Thanks very much for your comments!

Best regards,

Jacob.