Ver 19 -On Optimization Alters Numerical Results

Key__Samuel · ‎02-26-2019

Intel Fortran Compiler Results Difference???

I have an explicit transient dynamics Fortran90/95 program Fma-3D with
204,506 lines of code (count includes comment lines).

The program has 151+ verification/regression test cases.

PROBLEM: Quite different simualtion results from two different
INTEL FORTRAN compilers based on exactly the same FORTRAN code.

======================================================
CASE: Good Results.
======================================================
With Intel's --

Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running
on Intel(R) 64, Version 18.0.3.210 Build 20180410 Copyright (C)
1985-2018 Intel Corporation. All rights reserved.

Problem VP80 gets the results it always has, namely,

GOOD Output results for problem VP80
------------------------------------------------------
FMA-3D Verification Log

Date Of Execution: Wed, Feb 20, 2019 10:27:38 AM

Problems Requested: short

    #PASS# Equals Success
    *FAIL* Equals Failure

80 Beam with a 90 Degree Spiral Twist (Paired 4-Node Plates, KG Form.)
   File: reference/vp80.cro
   Line: 963
   Value: 1.0588E-02
   File: vp80.cro
   Line: 2009
   Value: 1.0588E-02
   #PASS#"

------------------------------------------------------
GOOD Computed Results 'snap-shoot.'
------------------------------------------------------
    Time T = 2.00010E-02   Time Step =   19464
0   NP    X-DISPLACE   Y-DISPLACE   Z-DISPLACE   X-VELOCITY   Y-VELOCITY   Z-VELOCITY X-ACCELERATE Y-ACCELERATE Z-ACCELERATE     NP
    37    3.5090E-03 -6.4205E-04   1.0588E-02 -2.5293E-01   6.5876E-02 -6.6740E-01 -1.3692E+03 -1.1300E+03   1.6509E+03      37
    38    3.5090E-03 -6.0086E-06   1.0588E-02 -2.5376E-01   9.3550E-04 -6.6778E-01 -1.3620E+03 -4.1982E+00   1.6682E+03      38
    39    3.5088E-03   6.3003E-04   1.0589E-02 -2.5510E-01 -6.4011E-02 -6.6756E-01 -1.3304E+03   1.1218E+03   1.6535E+03      39
------------------------------------------------------
======================================================

======================================================
CASE: Bad Results.
======================================================
However, with Intel's new Whiz-Bang

Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running
on Intel(R) 64, Version 19.0.2.190 Build 20190117 Copyright (C)
1985-2019 Intel Corporation. All rights reserved.

Problem VP80 gets results it never has before, namely,

BAD Output results for problem VP80
------------------------------------------------------
FMA-3D Verification Log

Date Of Execution: Wed, Feb 20, 2019 11:35:07 AM

Problems Requested: short

    #PASS# Equals Success
    *FAIL* Equals Failure

80 Beam with a 90 Degree Spiral Twist (Paired 4-Node Plates, KG Form.)
   File: reference/vp80.cro
   Line: 963
   Value: 1.0588E-02
   *FAIL*

------------------------------------------------------
BAD Computed Results 'snap-shoot.'
------------------------------------------------------
    Time T = 2.00010E-02   Time Step =   19474
0   NP    X-DISPLACE   Y-DISPLACE   Z-DISPLACE   X-VELOCITY   Y-VELOCITY   Z-VELOCITY X-ACCELERATE Y-ACCELERATE Z-ACCELERATE     NP
    37    3.1320E-01 -1.2398E-01   9.2295E-01   5.5841E+03 -1.7370E+03   3.2168E+03   2.4400E+08 -1.1449E+08   7.6017E+07      37
    38    4.4459E-01 -9.2594E-02   9.3223E-01   1.5940E+03 -4.6621E+01   1.5206E+03   4.9748E+07 -1.9775E+07   1.6839E+07      38
    39    5.8434E-01 -5.9188E-02   9.4708E-01 -3.9777E+03   1.4123E+02 -8.1344E+02 -2.9865E+08 -2.0920E+07 -1.2048E+08      39
------------------------------------------------------
======================================================

Notes:

(1) These two calculations are based on the exact same Fortran Coding!!
(2) These two calculations are based on the exact same input file!
(3) These two results evaluations use   the exact same awk-script!
(4) Start by comparing the Uz displacement column; header "Z-DISPLACE"
(5) Compile line for likely routine where difference occurs ==>
    ifort -nologo -fpp -D_NETLIB -D_CVFNT -compile-only -QxHost -O3 -inline:speed -align:dcommons -check:bounds   -extend_source -traceback   zzz.f -object:platq.o
(6) For compiler Version 19.0.2.190 Build 20190117
(7) The ratio wrong:right = 88:1
    Optimization -O1 produces correct numerical results;
    Optimization -O2 produces bad     numerical results;
    Optimization -O3 produces bad     numerical results.

JAlexiou · ‎02-27-2019

Just a comment on verification of dynamic simulations. Such systems are inherently chaotic, which means that small variations (example roundoff errors) quickly propagate into gross differences after several simulation steps. To expect the results of two compilations to be exactly the same is unrealistic. The question becomes to what extent are you allowing the differences to be ok and when do shall they legitimately fail the tests.

Are there any significant differences in the calculation of the state vector derivative between compilations? Do you test your integration scheme on a simpler model with known analytical results in order to asses the drift error? How does that compare between IVF 18 and IVF 19?

Steve_Lionel · ‎02-27-2019

Pleas read Improving Numerical Reproducibility in C/C++/Fortran for additional insights.

mecej4 · ‎02-27-2019

Samuel, although the output results that you posted may convey meaning to you or to others who are familiar with the program, the numbers mean nothing to most of the readers here. You attribute the differences solely to the compiler version and compiler options, but it is also possible that errors in the program can cause the results to vary in a way that is compiler dependent. The results may be affected, for instance, by mismatched subprogram arguments, uninitialized variables, wrong EOLs in data files, etc.

If you want help here, you will have to put some effort into creating a compact reproducer, I am afraid.

jimdempseyatthecove · ‎02-27-2019

To add to mecej4's comment, I notice that the "GOOD Computed Results 'snap-shoot.'" and the "BAD Computed Results 'snap-shoot.'" are printouts at different time steps... 19464 verses 19474.

Also note another discrepancy:

GOOD Computed Results 'snap-shoot.' occurs at Time T = 2.00010E-02 Time Step = 19464 (time step of 1.0275893958076448828606658446363e-6)
BAD Computed Results 'snap-shoot. occurs at Time T = 2.00010E-02 Time Step = 19474 (time step of 1.0270617233234055663962206018281e-6)

Same Time T but different number of time steps

One of the potential areas of problems often found in simulation programs in improperly advancing the time step.

! bad programming
IntegrationTimeInterval = YourPreferredSmallNumber ! that is not exactly representable
...
TimeStep = 0
T = 0.0
DO WHILE(T < TerminalPoint)
   ... ! code
  TimeStep = TimeStep + 1
  T = T + IntegrationTimeInterval 
END DO

! correct (better) programming
IntegrationTimeInterval = YourPreferredSmallNumber ! that is not exactly representable
...
TimeStep = 0
T = 0.0
DO WHILE(T < TerminalPoint)
   ... ! code
  TimeStep = TimeStep + 1
  T = TimeStep * IntegrationTimeInterval 
END DO

When you program the "bad" way .AND. the time integration interval is not an exact representable number (real or real(8)...) then the error in T increases at each increment (as T gets larger, additional bits of IntegrationTimeIntervalget shifted out lsb end in preparation for summation with T)

When you program the "good" way .AND. the time integration interval is not an exact representable number (real or real(8)...) then the error in T is a multiple of the original error in IntegrationTimeInterval at each increment.

Jim Dempsey

Key__Samuel · ‎02-27-2019

John,

Thanks for taking a look at my dilemma. This software (Fma-3D) has been some 25+ years in the making. All floating point variables are configured for double precision. It has been 'run through' numerous compilers from various vendors, that is, neither the software, nor myself are new to this problem. Many of the verification problems have analytic/closed-form solutions. Several of the verification problems have 0.0 as the "exact answer," and, of course, the "computed result" is round-off. This does not bother me when the round-off values shift. Several of the awk scripts that do the comparisons between the reference value and the today's value to be tested incorporate a tolerance test.

I am not certain what you mean by "calculation of the state vector derivative between compilations." The program has an energy balance error calculation that reports the quality of the equation "External Energy = Internal Energy + Kinetic Energy" where each energy term is independently calculated and the "truth" (error in % wrt External Energy) of the required equality for all-time is reported.

For Fma-3D's test problem number 80,

For IVF19 using optimization -o1 compilation, the Energy Balance error, from start to finish, is between 0.0 and +3.6E-03

For IVF19 using optimization -o3 compilation, the Energy Balance error, from start to finish, is between +/- 3.4E+04

Regarding, IVF18,

For IVF18 using optimization -o1 compilation, the Energy Balance error, from start to finish, is between 0.0 and +3.7E-03

For IVF18 using optimization -o3 compilation, the Energy Balance error, from start to finish, is between 0.0 and +3.7E-03

As to drift error, test problem 80 starts undeformed and with zero Kinetic Energy. The error bracket above "between 0.0 and +3.6E-03" reflects the drift error as the object oscillates.

As you can see, IVF19 using optimization -03 is the odd man out. I am attaching a *.pdf file with a plot of Error vs time in the off chance that the file will be displayed or can be downloaded."

I am a little surprised that a following commenter felt that my FORTRAN programming might be at fault for this disparity in results given that the IVF18 with -03 has no problems, but IVF19 with -03 does?

Key__Samuel · ‎02-27-2019

Jim Dempsey,

Thanks for pointing out the two variants for updating Time and Time Step.

In my case, Fma-3D uses an explicit central difference time integrator; it is only conditionally stable. It has a critical time step. The calculations need to use a time step smaller than the critical time step. (As an aside, the accuracy of the integrator is optimal when the time step is just epsilon below the critical time step.) The critical time step is the acoustic transit time between the two closest nodal points in the discrete mesh. When one is doing large deformation calculations with non-linear materials with changing stiffness, the critical time step can either increase or decrease over time. Hence, Fma-3D re-evaluates the critical time step continually, and the program is stuck with your Method 1. (For output occurring at user prescribe times, one can look ahead and either in-line or off-line use a 'reduced' time step to get output (to within round off) for the time requested.)

In this case, what you noticed about the integer time step counter being so different between the "good" results and the "bad" results is due to the IVF19 -O3 compiled code going awry.

Regards

jimdempseyatthecove · ‎02-28-2019

>>Fma-3D re-evaluates the critical time step continually... is due to the IVF19 -O3 compiled code going awry

Not necessarily. Maybe the older compilers were more wrong (farther from more exacting result).

This anomaly should be looked at by looking at where the next critical time step varies between compiler versions. (e.g. log the critical time steps in your "good" results, and compare the "bad" version critical time steps with the log).

>>The critical time step is the acoustic transit time between the two closest nodal points

In calculation of the distance you will be using a sqrt function. I suspect the difference is in the sqrt function selected between the two compilers combined with option switches. There are a few different sqrt function choices depending on a tradeoff between speed and precision.

Jim Dempsey

gib · ‎02-28-2019

If, as has been suggested, you are simulating a "chaotic" system, i.e. one that is very sensitive to the initial conditions, this casts a shadow of doubt over the results.

TimP · ‎03-01-2019

On your way to making a reproducer, you will likely need to isolate where a change in computation makes the difference in your results. This might start with a binary search using combinations of a set of .obj files which produces the expected result and a set of those which don't. If you are able to isolate to one or two .obj files, if you need it, you can try directives within those source files to turn optimization on or off, such as novector directives. Finding such a directive might be a satisfactory temporary solution, if you can't determine that the source code produces a vulnerability. A small but unsatisfactory numerical departure might be produced where there is a requirement for parentheses to specify the order of operations, along with a compile option such as -standard-semantics which requires the compiler to implement the parens (possibly stricter than the Fortran standard requires, as opposed to ignoring parentheses which appear algebraically redundant).

Key__Samuel · ‎03-05-2019

To all of the commenters up to now;

This is a progress report in my search to locate the offending code
where IFORT Version 19 with -O3 optimization produced an executable
leading to an Fma-3D regression test case that failed.

I have completed a three-part compilation of my code Fma-3D.

Background Info:

In the Makefile, the software is managed as three
separate "blocks" of code, that is, the grouping is

(1,USE) All of the USE modules,

(2,FMA) All of the routines that pertain to the solid and
structural finite element behavior model, and

(3,MAT) All of the routines that implement the material models.

With minimal modification to the Makefile, I separated the compile
directive into three distinct commands. I was able to run
(-Oi, -Oj, -Ok) tests where (i,j,k) was assigned the following
optimization triples in turn (3,1,1), (1,3,1), and (1,1,3).

The Compile command line is -fpp -D_NETLIB -D_CVFNT -compile-only -QxHost -O3 -inline:speed -align:dcommons -check:bounds -Qprotect-parens -stand:f08 -extend_source -traceback zzz.f -object:polyh.o

Outcome: The Fma-3D regression test problem results, using
Version 19.0.2.190 Build 20190117, were

ALL Test (1,1,1) => PASS
USE Test (3,1,1) => PASS
FMA Test (1,3,1) => FAIL
MAT Test (1,1,3) => PASS

Just to be clear:

(3) The IFORT Version 18 executable has no problems with (3,3,3).

Next Step:

Try to locate the "FMA" routine(s) that while seemingly compiling
with no trouble, leads to a failed result (runs a small regression
test of a vibrating beam that produces incorrect answers).

Four Days Later:

After enabling -stand:f95 followed with -stand:f08 and chasing out the
non-conformance coding, I have the following test results:

ALL Test (1,1,1) => PASS

USE Test (2,1,1) => PASS
FMA Test (1,2,1) => FAIL
MAT Test (1,1,2) => PASS

USE Test (3,1,1) => PASS \
FMA Test (1,3,1) => FAIL |<== Same as before 'cleaning' the coding?
MAT Test (1,1,3) => PASS /

Just to be clear again, with the cleaner code:

(4) The IFORT Version 18 executable has no problems, that is,

ALL TEST (3,3,3) => PASS

Next Step:

Cycle through (probably a binary search) the 'FMA' subroutines and functions
looking for the offending routine.

Regards,

Samuel Key

P.S. For Gib: This verification/regression problem is a cantilever beam with a constant tip force applied a time=0.0. the mesh L/W/T=24x1x2. The beam responds primarily in the 1st mode (longest wave length). The material model is linear elastic. Pretty hard for this response to be chaotic.

gib · ‎03-05-2019

"Pretty hard for this response to be chaotic."

Agreed.

Key__Samuel · ‎03-16-2019

To all of the commenters up to now;

This is a 2nd progress report in my search to locate the offending
code where IFORT Version 19 with -O3 optimization produced an
executable leading to an Fma-3D regression test case that failed.

Please note the Question at the bottom of this progress report.

I have completed a four-part compilation of the code Fma-3D.

Background Info:

Again, with minimal modification to the Makefile, I separated the
compile directive this time into four distinct commands. I was able to
run (-Oi, -Oj, -Ok, -On) tests where (i,j,k,n) was assigned the
following optimization quadruples in turn (3,1,1,1), (1,3,1,1),
(1,1,3,1) et cetera.

(1,USE) All of the USE modules,

(2,FMAUpper) First "half" of the FORTRAN routines that pertain to
the solid and structural finite element behavior model,

(3,FMALower) Second "half" of the FORTRAN routines that pertain to
the solid and structural finite element behavior model, and

(4,MAT) All of the FORTRAN routines that implement the material models.

Outcome: The Fma-3D regression test problem results, using
Version 19.0.2.190 Build 20190117, were

1st Partition of FMA into FMAU "Upper" and FMA "Lower" blocks
ALL Test (1,1,1,1) => PASS
USE Test (3,1,1,1) => PASS
FMAU Test (1,3,1,1) => FAIL ( in upper FMAU "block")
FMAL Test (1,1,3,1) => PASS
MAT Test (1,1,1,3) => PASS

2nd Partition of FMA: move 'one-line' of FMA *.o
from FMAU upper block down to FMAL lower block
FMAU Test (3,1,3,3) => PASS

5th Partition of FMA; move another line of FMA *.o
from FMAU upper block down to FMAL lower block
FMAL Test 3,1,3,3) => PASS

Without going into further detail, we have isolated the file, platq.f;
it contains 7 different 4-node finite element quadrilateral plate
formulations. In particular, platq.f contains the formulation we are
using in the Fma-3D verification/regression test problem here, the
results of which have consistently failed with -O3, -O2 and passed
with -O1 with the ifort Version 19.0.2.190 Build 20190117 compiler.

But, succeeds with Intel(R) 64, Version 18.0.3.210 Build 20180410 with
-O2 and -O3 in the platq.f file. (Russian grammatical construction: But, ... .)

Next Step:
Isolate Finite Element Formulation in platq.f that fails with -O3 compilation.

Outcome:
Isolated Finite Element Formulation in platq.f that fails with -O3 compilation.

Next Step:
Try newly released ifort: Intel(R) 64, Version 19.0.3.203 Build 20190206

Outcome:
Same as before; -O1 compiles, Fma-3D runs, provides good simulation results.
Same as before; -O3 compiles, Fma-3D runs, provides bad simulation results.

NOTE: I am talking about an obvious difference in two simulation results,
not round-off differences.

Next Step: Try to find offending code. (90% of my coding is dumber
than a stick. I have seen it referred to as "fall through" coding --
minimal number of do-loops, limited if-then-else-endif constructs.

It does have a plethora of f90 TYPE definitions related to finite
elements and numerous BC and loading options. (Lots of arrays of
TYPE'd datum items related to individual finite elements. I love it;
it has allowed me to manage much more code than without the TYPE'd
structures.)

Next Step: Lots of review, cleaning, and polishing of code in Fma-3D's
platq.f file and associated USE MODULES clear back to where the input
is read. (I did find one CASE(Iformulation) calling the wrong subroutine,
that did not clear up the -O1 works, but -O3 does not work)

Problem Identified?

Within the file platq.f, I have isolated one formulation, KEYGRUDA,
where the computed results depend on -O1 vs -O3. (The result "Uz(38)"
refers to the free-end tip deflection of a dynamically loaded
cantilevered beam. Ipts & Jpts specifies the number of Gaussian
integration points.)

NOTE: The 0.00001E-02 difference due to 2x2 vs 3x3 I do not consider
to be a problem.

--Ifort, Version 18-------------------------------------------------
Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64,
Version 18.0.3.210 Build 20180410 compiling with -O1 or -O3 gives
IDENTICAL results, and excellent Energy balance, for the test problem.

-O1, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.0731E-02
-O1, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.0732E-02

-O3, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.0731E-02
-O3, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.0732E-02

--Ifort, Version 19-------------------------------------------------
Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64,
Version 19.0.3.203 Build 20190206 compiling with -O1 and -O3 gives
DIFFERENT results, but excellent Energy balance, for the test problem.

-O1, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.0731E-02
-O1, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.0732E-02

-O3, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.1424E-02 (+6.46%)
-O3, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.1425E-02 (+6.46%)

--------------------------------------------------------------------

Possibly, the optimization of this chunk of coding is where the
problem is located???

(Integer Parameter P2TH = 1.0/2.0; P3RD = 1.0/3.0; P4TH = 1.0/4.0; ...)

<snip>===============================================================
!!
!! Loop on element integration points and compute velocity (and stress)
!! polynomial coefficients C(1:7,1:7). (C(7,7) is a symmetric matrix.)
!!
      C00 = 0; C01 = 0; C02 = 0; C03 = 0; C04 = 0; C05 = 0; C06 = 0;
                C11 = 0; C12 = 0; C13 = 0; C14 = 0; C15 = 0; C16 = 0;
                          C22 = 0; C23 = 0; C24 = 0; C25 = 0; C26 = 0;
                                    C33 = 0; C34 = 0; C35 = 0; C36 = 0;
                                              C44 = 0; C45 = 0; C46 = 0;
                                                        C55 = 0; C56 = 0;
                                                                  C66 = 0;
!!
!! Back calculate "8-node brick" vertex coordinates.
!!
      Qr(1:4) = Pr(1:4) - (P2TH * Pr(5:8))
      Qs(1:4) = Ps(1:4) - (P2TH * Ps(5:8))
      Qt(1:4) = Pt(1:4) - (P2TH * Pt(5:8))
      Qr(5:8) = Pr(1:4) + (P2TH * Pr(5:8))
      Qs(5:8) = Ps(1:4) + (P2TH * Ps(5:8))
      Qt(5:8) = Pt(1:4) + (P2TH * Pt(5:8))
!!
!! Note: A volume numerical quadrature is created on-the-fly by combining
!! the thickness rule (confined to (0,0,Xi3)) with the area rule (confined
!! to (Xi1,Xi2,0)). It is easy enough to do and provides a quality C(7,7)
!! transformation matrix between the oblique Xi-coordinates and the r,s,t
!! orthogonal physical coordinates. We also need to access the derivatives
!! of the shape functions in the order that they were created in section.f
!! Hence, the use of IJP and the nesting of the JP-loop inside the IP-loop.
!!
      IJP = 0
      DO IP = 1,Ipts
        Xi3 = Xi(3,IP)
        DO JP = 1,Jpts
          Xi1 = Xi(1,JP)
          Xi2 = Xi(2,JP)
!!
!! Determinant of the coordinate transformation from the isoparametric
!! space to the physical space: dV = DetF dXi1 dXi2 dXi3
!!
          IJP = IJP + 1
          Fxx = DOT_PRODUCT (Qr,B1(1:8,IJP))
          Fyx = DOT_PRODUCT (Qs,B1(1:8,IJP))
          Fzx = DOT_PRODUCT (Qt,B1(1:8,IJP))
          Fxy = DOT_PRODUCT (Qr,B2(1:8,IJP))
          Fyy = DOT_PRODUCT (Qs,B2(1:8,IJP))
          Fzy = DOT_PRODUCT (Qt,B2(1:8,IJP))
          Fxz = DOT_PRODUCT (Qr,B3(1:8,IJP))
          Fyz = DOT_PRODUCT (Qs,B3(1:8,IJP))
          Fzz = DOT_PRODUCT (Qt,B3(1:8,IJP))

          DetF(IJP) = Fxx*Fyy*Fzz + Fxy*Fyz*Fzx + Fxz*Fyx*Fzy
     &              - Fzx*Fyy*Fxz - Fzy*Fyz*Fxx - Fzz*Fyx*Fxy
!!
!! Compute transformation matrix C(1:7,1:7) entries.
!!
          Weight = SECTION_2D(SecID)%A_Weight(IP) * Area_Weight(JP)

WDF = Weight*DetF(IJP)

          C00 = C00 + WDF
          C01 = C01 + (Xi1 * WDF)
          C02 = C02 + (Xi2 * WDF)
          C03 = C03 + (Xi3 * WDF)
          C04 = C04 + (Xi1 * (Xi2 * WDF))
          C05 = C05 + (Xi1 * (Xi3 * WDF))
          C06 = C06 + (Xi2 * (Xi3 * WDF))

          C11 = C11 + Xi1 * (Xi1 * WDF)
          C12 = C12 + (Xi1 * (Xi2 * WDF))
          C13 = C13 + (Xi1 * (Xi3 * WDF))
          C14 = C14 + Xi1 * (Xi1 * (Xi2 * WDF))
          C15 = C15 + Xi1 * (Xi1 * (Xi3 * WDF))
          C16 = C16 + Xi1 * (Xi2 * (Xi3 * WDF))

          C22 = C22 + Xi2 * (Xi2 * WDF)
          C23 = C23 + (Xi2 * (Xi3 * WDF))
          C24 = C24 + (Xi2 * (Xi1 * (Xi2 * WDF)))
          C25 = C25 + (Xi2 * (Xi1 * (Xi3 * WDF)))
          C26 = C26 + (Xi2 * (Xi2 * (Xi3 * WDF)))

          C33 = C33 + Xi3 * (Xi3 * WDF)
          C34 = C34 + Xi3 * (Xi1 * (Xi2 * WDF))
          C35 = C35 + (Xi3 * (Xi1 * (Xi3 * WDF)))
          C36 = C36 + (Xi3 * (Xi2 * (Xi3 * WDF)))

          C44 = C44 + Xi1 * (Xi2 * (Xi1 * (Xi2 * WDF)))
          C45 = C45 + Xi1 * (Xi2 * (Xi1 * (Xi3 * WDF)))
          C46 = C46 + Xi1 * (Xi2 * (Xi2 * (Xi3 * WDF)))

C55 = C55 + Xi1 * (Xi3 * (Xi1 * (Xi3 * WDF)))
C56 = C56 + Xi1 * (Xi3 * (Xi2 * (Xi3 * WDF)))

          C66 = C66 + Xi2 * (Xi3 * (Xi2 * (Xi3 * WDF)))
        ENDDO
      ENDDO
<snip>===============================================================

QUESTION: Is there a way to tell the compiler to change to a different
optimization level for an 'isolated' block of coding?

Regards to all who have waded through this material,

andrew_4619 · ‎03-17-2019

https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-fprotect-parens-qprotect-parens

If you haven't tried this option it might be interesting to make a comparison

jimdempseyatthecove · ‎03-17-2019

>>QUESTION: Is there a way to tell the compiler to change to a different optimization level for an 'isolated' block of coding?

!DIR$ OPTIMIZE:n ! n= 0, 1, 2, 3

This applies to the program unit. Place at the top of a procedure program unit.

Jim Dempsey

Key__Samuel · ‎03-17-2019

To Andrew-4619; Thanks for scanning the snippet of code. Here is the compile line for the file platq.f -- ifort -nologo -fpp -D_NETLIB -D_CVFNT -compile-only -QxHost -O1 -inline:speed -align:dcommons -check:bounds -Qprotect-parens -stand:f08 -extend_source -traceback zzz.f -object:platq.o The compiler directive "-Qprotect-parens" is applied to all of the *.f files. Do you have a suggestion for the next step? Regards,

gib · ‎03-17-2019

If you have the time and energy to track this down further: my inclination would be to write out many intermediate values in the code that you posted above. This might reveal where the difference arises as a result of optimisation.

Key__Samuel · ‎03-29-2019

To all of the commenters up to now;

I would like to close out this thread.

I used the suggested compiler directive: !DIR$ OPTIMIZE:n ! n= 0, 1,
2, 3 suggested by Jim Dempsey. However, systematically setting the
value of n to "3" for the each of the routines in the file platq.f
used by the test verification/regression problem, I was not able
to discover any anomalous behavior.

If the Intel FORTRAN compiler staff concerned with optimization
activity triggered by -O1, -O2, or -O3 is interested in the
differences in Ifort Version 18 and Ifort Version 19, I am
happy to work with them to get a testing version of my software.
It will likely not be a 7-line "Subroutine X ... End Subroutine X"
program.

For the time being, I will stay with Ifort Version 18, since it
has no problem with my test input case using -O3.

jimdempseyatthecove · ‎03-30-2019

>> However, systematically setting the value of n to "3" for the each of the routines in the file platq.f used by the test verification/regression problem, I was not able to discover any anomalous behavior.

Global -O3 without !DIR$ OPTIMIZE:n fails, but with
Global -O3 with !DIR$ OPTIMIZE:3 succeeds.

This (unfoundedly) implies to me that:

!DIR$ OPTIMIZE:n disables InterProcedural Optimizations (IPO)
.AND. the problem relates to IPO.

Have you tried -O3 without !DIR$ OPTIMIZE:n .AND. with IPO disabled?

With the behavior your have observed, I'd be hesitant in running production code - even though regressing testing passes.

Jim Dempsey

TimP · ‎03-30-2019

You have applied parentheses carefully in many of your expressions so that the association of operations can't vary with optimization, provided you use the compiler option to apply them in accordance with the standard. In an expression such as your

Fxx*Fyy*Fzz + Fxy*Fyz*Fzx + Fxz*Fyx*Fzy
& - Fzx*Fyy*Fxz - Fzy*Fyz*Fxx - Fzz*Fyx*Fxy

the order of operations may be important, and you have left the compiler a great deal of freedom. It may choose an order with unfavorable cancellation effects for terms of similar magnitude. It would be best to find and specify an accurate order; a possibility might be

Fyy*(Fxx*Fzz- Fzx*Fxz) + (Fyz*(Fxy*Fzx- Fzy*Fxx) + Fyx*(Fxz*Fzy- Fzz*Fxy))

particularly in the event that the terms in brackets are relatively small. If you are compiling for a target instruction set with fused multiply-add, you should try -Qfma- to see if you can find a grouping which is not sensitive to the change.

The expression I wrote still doesn't control e.g. whether

(Fxx*Fzz)- Fzx*Fxz or Fxx*Fzz- (Fzx*Fxz) ; those will be the same with -Qfma- but differ without that option, if you use -QxHost for a recent CPU. I'm not certain whether ifort will observe these parens, although it should.

If you can find any discussions of this fma difference, they may use the linux spelling -mno-fma

Daniel_Dopico · ‎04-01-2019

Steve Lionel (Ret.) (Blackbelt) wrote:
Pleas read Improving Numerical Reproducibility in C/C++/Fortran for additional insights.

Thank you for sharing.