
Intel Community > Software Development Tools (Compilers, Debuggers, Profilers & Analyzers) > Intel® Fortran Compiler



A__King

New Contributor I


10-20-2019
11:02 AM


The identity of simulation results in debug vs release mode

I am running a simulation using a Fortran codebase that is claimed to be entirely Fortran-standard compliant and fully deterministic (it does use random numbers, but the seed is initialized to a fixed value for reproducibility). When the code is compiled in debug mode (no optimization) and in release mode (almost all optimization flags on), there are occasionally tiny differences in the simulation results, roughly once every few hundred steps. These differences are mostly in the last significant digits of the output real numbers (all real numbers in the simulation are declared as real64). It does not seem to matter whether the output is printed to 4 or 8 significant digits: the difference appears in the last printed digit either way, so the issue seems to be related to the rounding of real numbers at the time of I/O.
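A minimal Python sketch of why a difference in the last bit of a real64 value shows up only occasionally, and only in the last printed digit. It assumes Python's `float`, which is an IEEE 754 binary64 value like Fortran's real64; `math.nextafter` needs Python 3.9 or later, and the helper `printed` and the sample values are hypothetical. Two results one ulp apart print identically unless they happen to straddle a decimal rounding boundary, which suggests the printed difference reflects a difference already present in the binary values rather than one created by the I/O itself.

```python
import math

def printed(x: float, digits: int) -> str:
    """Hypothetical helper: format x to a given number of significant digits,
    roughly as a formatted output statement would."""
    return f"{x:.{digits}g}"

# Two hypothetical pairs of results, each pair differing by exactly one ulp,
# as a debug build and a release build might produce.
far = 0.1234567                        # not near a 2-digit rounding boundary
far_other = math.nextafter(far, 1.0)

near = 0.125                           # exactly representable; a tie at 2 digits
near_other = math.nextafter(near, 1.0)

print(printed(far, 2), printed(far_other, 2))    # 0.12 0.12
print(printed(near, 2), printed(near_other, 2))  # 0.12 0.13
```

The same holds at 4 or 8 digits: the one-ulp gap becomes visible only when it straddles a rounding boundary at that digit count, which fits the observation that the differences appear only every so often.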

My broader question: in general, is it reasonable to expect exactly the same output from a deterministic simulation written in standard-compliant Fortran and compiled with ifort at different optimization levels? Or does any difference between release and debug builds indicate some non-compliance with the Fortran standard or a hidden bug in the code?

These are the release mode ifort flags used:

`/fast /O3 /Qip /Qipo /Qunroll /Qunroll-aggressive`

and these are the debug mode ifort flags used:

`/debug:full /Zi /CB /Od /Qinit:snan,arrays /warn:all /gen-interfaces /traceback /check:all /check:bounds /fpe-all:0 /Qdiag-error-limit:10 /Qtrapuv`

4 Replies


IanH

Black Belt


10-21-2019
04:00 AM


It is not reasonable to expect exactly reproducible results.

Because floating-point arithmetic rounds after every operation (so that, for example, addition is not associative), the order of execution matters, and optimization may change that order.

From the standard's perspective, within a statement a processor may replace an expression with any mathematically equivalent expression (subject to honoring parentheses), but evaluating a mathematically equivalent expression in floating-point arithmetic may not give the same result.
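A quick illustration in Python, whose `float` is an IEEE 754 binary64 value (the same format as real64). The two expressions below are mathematically equivalent but group the additions differently, and they round to different results. The values are arbitrary examples chosen because the effect is well known for them.

```python
# Floating-point addition is not associative: the same three binary64 values
# summed with two mathematically equivalent groupings give different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

A compiler that is allowed to regroup such a sum (for example, when parentheses do not force the order) can therefore legitimately change the last bits of the result.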

Specifically with your compile options, /fast enables floating-point optimizations that trade precision for increased speed.

You could have a hidden bug in your code too!


jimdempseyatthecove

Black Belt


10-21-2019
05:36 AM


In addition to IanH's comments, consider that in a Debug build (/Od) the generated code uses scalar operations even for sections of code that could be vectorized, whereas in a Release build (/O3) those sections will be vectorized. Consider a CPU with AVX2 summing an array of real64 values. In scalar mode the operation is:

sum = A(1) + A(2) + ... + A(N) ! left to right, one after the other

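In vector mode (an AVX2 register holds four real64 values, so four partial sums are kept; assume N is a multiple of 4):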

sum_lane_0 = A(1) + A(5) + ... + A(N-3) ! this line and the following 3 lines occur in the same instruction(s)

sum_lane_1 = A(2) + A(6) + ... + A(N-2)

sum_lane_2 = A(3) + A(7) + ... + A(N-1)

sum_lane_3 = A(4) + A(8) + ... + A(N)

sum = (sum_lane_0 + sum_lane_1) + (sum_lane_2 + sum_lane_3) ! horizontal sum of lanes

Each lane may incur round-off errors at different points in the array than the scalar summation of the same array does.

In other words, while the summation is reproducible across multiple runs of the same build, the results are not necessarily the same between builds.
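The scalar and 4-lane summation orders described above can be emulated in a short Python sketch. It is an illustration under stated assumptions, not what ifort literally generates: Python's `float` stands in for real64, the function names `scalar_sum` and `four_lane_sum` are hypothetical, the lane count of 4 matches AVX2 with real64, and the array is contrived so that small terms are lost in the scalar order.

```python
import math

def scalar_sum(a):
    """Left-to-right summation, like an unvectorized (/Od) loop."""
    total = 0.0
    for x in a:
        total += x
    return total

def four_lane_sum(a):
    """Emulates a 4-lane vectorized sum: lane k accumulates a[k], a[k+4], ...,
    then the lanes are combined pairwise. Assumes len(a) is a multiple of 4."""
    if len(a) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i in range(0, len(a), 4):
        for k in range(4):
            lanes[k] += a[i + k]
    # Horizontal sum: (sum_lane_0 + sum_lane_1) + (sum_lane_2 + sum_lane_3)
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])

eps = 2.0 ** -53  # half the spacing of binary64 values just above 1.0
a = [1.0, eps, eps, eps, 0.0, eps, eps, eps]

print(scalar_sum(a))     # 1.0 (each 1.0 + eps rounds back to 1.0)
print(four_lane_sum(a))  # 1.0000000000000007
print(math.fsum(a))      # 1.0000000000000007, correctly rounded reference
```

For this contrived input the lane order happens to match the correctly rounded `math.fsum` result, but neither order is more accurate in general; the point is only that the two orders can disagree in the last bits.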

Additionally, the precision of intrinsic functions such as sqrt, sin, and cos varies depending on the optimization options selected.

Jim Dempsey


FortranFan

Valued Contributor III


10-21-2019
06:35 AM


@A. King,

See Dr Fortran's paper **Improving Numerical Reproducibility in C/C++/Fortran**

Its message is generally useful to keep in mind: "The Three Objectives: Accuracy, Reproducibility, Performance. Pick two."


A__King

New Contributor I


11-13-2019
01:52 AM


Thank you all for your great responses, and thanks to FortranFan for the slides, which were very helpful. For future reference, I am also including a response from Tim, who for some reason could not post on the new forum and had to respond via direct message:

The forum is not permitting me to log in normally. Even if reproducibility is not your goal, /fast is a dangerous option that I would never use. For reproducibility, you would turn off the optimizations known to cause variation in results by setting /fp:source (/fp:precise is the same for ifort). The one part of /fast that you would want for performance is /QxHost. That said, your results are so close that any optimization could make the difference.