- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Intel Fortran Compiler Results Difference???

I have an explicit transient dynamics Fortran90/95 program Fma-3D with

204,506 lines of code (count includes comment lines).

The program has 151+ verification/regression test cases.

PROBLEM: Quite different simualtion results from two different

INTEL FORTRAN compilers based on exactly the same FORTRAN code.

======================================================

CASE: Good Results.

======================================================

With Intel's --

Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running

on Intel(R) 64, Version 18.0.3.210 Build 20180410 Copyright (C)

1985-2018 Intel Corporation. All rights reserved.

Problem VP80 gets the results it always has, namely,

GOOD Output results for problem VP80

------------------------------------------------------

FMA-3D Verification Log

Date Of Execution: Wed, Feb 20, 2019 10:27:38 AM

Problems Requested: short

#PASS# Equals Success

*FAIL* Equals Failure

80 Beam with a 90 Degree Spiral Twist (Paired 4-Node Plates, KG Form.)

File: reference/vp80.cro

Line: 963

Value: 1.0588E-02

File: vp80.cro

Line: 2009

Value: 1.0588E-02

#PASS#"

------------------------------------------------------

GOOD Computed Results 'snap-shoot.'

------------------------------------------------------

Time T = 2.00010E-02 Time Step = 19464

0 NP X-DISPLACE Y-DISPLACE Z-DISPLACE X-VELOCITY Y-VELOCITY Z-VELOCITY X-ACCELERATE Y-ACCELERATE Z-ACCELERATE NP

37 3.5090E-03 -6.4205E-04 1.0588E-02 -2.5293E-01 6.5876E-02 -6.6740E-01 -1.3692E+03 -1.1300E+03 1.6509E+03 37

38 3.5090E-03 -6.0086E-06 1.0588E-02 -2.5376E-01 9.3550E-04 -6.6778E-01 -1.3620E+03 -4.1982E+00 1.6682E+03 38

39 3.5088E-03 6.3003E-04 1.0589E-02 -2.5510E-01 -6.4011E-02 -6.6756E-01 -1.3304E+03 1.1218E+03 1.6535E+03 39

------------------------------------------------------

======================================================

======================================================

CASE: Bad Results.

======================================================

However, with Intel's new Whiz-Bang

Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running

on Intel(R) 64, Version 19.0.2.190 Build 20190117 Copyright (C)

1985-2019 Intel Corporation. All rights reserved.

Problem VP80 gets results it never has before, namely,

BAD Output results for problem VP80

------------------------------------------------------

FMA-3D Verification Log

Date Of Execution: Wed, Feb 20, 2019 11:35:07 AM

Problems Requested: short

#PASS# Equals Success

*FAIL* Equals Failure

80 Beam with a 90 Degree Spiral Twist (Paired 4-Node Plates, KG Form.)

File: reference/vp80.cro

Line: 963

Value: 1.0588E-02

*FAIL*

------------------------------------------------------

BAD Computed Results 'snap-shoot.'

------------------------------------------------------

Time T = 2.00010E-02 Time Step = 19474

0 NP X-DISPLACE Y-DISPLACE Z-DISPLACE X-VELOCITY Y-VELOCITY Z-VELOCITY X-ACCELERATE Y-ACCELERATE Z-ACCELERATE NP

37 3.1320E-01 -1.2398E-01 9.2295E-01 5.5841E+03 -1.7370E+03 3.2168E+03 2.4400E+08 -1.1449E+08 7.6017E+07 37

38 4.4459E-01 -9.2594E-02 9.3223E-01 1.5940E+03 -4.6621E+01 1.5206E+03 4.9748E+07 -1.9775E+07 1.6839E+07 38

39 5.8434E-01 -5.9188E-02 9.4708E-01 -3.9777E+03 1.4123E+02 -8.1344E+02 -2.9865E+08 -2.0920E+07 -1.2048E+08 39

------------------------------------------------------

======================================================

Notes:

(1) These two calculations are based on the exact same Fortran Coding!!

(2) These two calculations are based on the exact same input file!

(3) These two results evaluations use the exact same awk-script!

(4) Start by comparing the Uz displacement column; header "Z-DISPLACE"

(5) Compile line for likely routine where difference occurs ==>

ifort -nologo -fpp -D_NETLIB -D_CVFNT -compile-only -QxHost -O3 -inline:speed -align:dcommons -check:bounds -extend_source -traceback zzz.f -object:platq.o

(6) For compiler Version 19.0.2.190 Build 20190117

(7) The ratio wrong:right = 88:1

Optimization -O1 produces correct numerical results;

Optimization -O2 produces bad numerical results;

Optimization -O3 produces bad numerical results.

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Just a comment on verification of dynamic simulations. Such systems are inherently chaotic, which means that small variations (example roundoff errors) quickly propagate into gross differences after several simulation steps. To expect the results of two compilations to be *exactly* the same is unrealistic. The question becomes to what extent are you allowing the differences to be ok and when do shall they legitimately fail the tests.

Are there any significant differences in the calculation of the state vector derivative between compilations? Do you test your integration scheme on a simpler model with known analytical results in order to asses the drift error? How does that compare between IVF 18 and IVF 19?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Samuel, although the output results that you posted may convey meaning to you or to others who are familiar with the program, the numbers mean nothing to most of the readers here. You attribute the differences solely to the compiler version and compiler options, but it is also possible that errors in the program can cause the results to vary in a way that is compiler dependent. The results may be affected, for instance, by mismatched subprogram arguments, uninitialized variables, wrong EOLs in data files, etc.

If you want help here, you will have to put some effort into creating a compact reproducer, I am afraid.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

To add to mecej4's comment, I notice that the "GOOD Computed Results 'snap-shoot.'" and the "BAD Computed Results 'snap-shoot.'" are printouts at different time steps... 19464 verses 19474.

Also note another discrepancy:

GOOD Computed Results 'snap-shoot.' occurs at Time T = 2.00010E-02 Time Step = 19464 (time step of 1.0275893958076448828606658446363e-6)

BAD Computed Results 'snap-shoot. occurs at Time T = 2.00010E-02 Time Step = 19474 (time step of 1.0270617233234055663962206018281e-6)

Same Time T but different number of time steps

One of the potential areas of problems often found in simulation programs in improperly advancing the time step.

! bad programming IntegrationTimeInterval = YourPreferredSmallNumber ! that is not exactly representable ... TimeStep = 0 T = 0.0 DO WHILE(T < TerminalPoint) ... ! code TimeStep = TimeStep + 1 T = T + IntegrationTimeInterval END DO ! correct (better) programming IntegrationTimeInterval = YourPreferredSmallNumber ! that is not exactly representable ... TimeStep = 0 T = 0.0 DO WHILE(T < TerminalPoint) ... ! code TimeStep = TimeStep + 1 T = TimeStep * IntegrationTimeInterval END DO

When you program the "bad" way .AND. the time integration interval is not an exact representable number (real or real(8)...) then the error in T increases at each increment (as T gets larger, additional bits of IntegrationTimeIntervalget shifted out lsb end in preparation for summation with T)

When you program the "good" way .AND. the time integration interval is not an exact representable number (real or real(8)...) then the error in T is a multiple of the original error in IntegrationTimeInterval at each increment.

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

John,

Thanks for taking a look at my dilemma. This software (Fma-3D) has been some 25+ years in the making. All floating point variables are configured for double precision. It has been 'run through' numerous compilers from various vendors, that is, neither the software, nor myself are new to this problem. Many of the verification problems have analytic/closed-form solutions. Several of the verification problems have 0.0 as the "exact answer," and, of course, the "computed result" is round-off. This does not bother me when the round-off values shift. Several of the awk scripts that do the comparisons between the reference value and the today's value to be tested incorporate a tolerance test.

I am not certain what you mean by "calculation of the state vector derivative between compilations." The program has an energy balance error calculation that reports the quality of the equation "External Energy = Internal Energy + Kinetic Energy" where each energy term is independently calculated and the "truth" (error in % wrt External Energy) of the required equality for all-time is reported.

For Fma-3D's test problem number 80,

For IVF19 using optimization -o1 compilation, the Energy Balance error, from start to finish, is between 0.0 and +3.6E-03

For IVF19 using optimization -o3 compilation, the Energy Balance error, from start to finish, is between +/- 3.4E+04

Regarding, IVF18,

For IVF18 using optimization -o1 compilation, the Energy Balance error, from start to finish, is between 0.0 and +3.7E-03

For IVF18 using optimization -o3 compilation, the Energy Balance error, from start to finish, is between 0.0 and +3.7E-03

As to drift error, test problem 80 starts undeformed and with zero Kinetic Energy. The error bracket above "between 0.0 and +3.6E-03" reflects the drift error as the object oscillates.

As you can see, IVF19 using optimization -03 is the odd man out. I am attaching a *.pdf file with a plot of Error vs time in the off chance that the file will be displayed or can be downloaded."

I am a little surprised that a following commenter felt that my FORTRAN programming might be at fault for this disparity in results given that the IVF18 with -03 has no problems, but IVF19 with -03 does?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Jim Dempsey,

Thanks for pointing out the two variants for updating Time and Time Step.

In my case, Fma-3D uses an explicit central difference time integrator; it is only conditionally stable. It has a critical time step. The calculations need to use a time step smaller than the critical time step. (As an aside, the accuracy of the integrator is optimal when the time step is just epsilon below the critical time step.) The critical time step is the acoustic transit time between the two closest nodal points in the discrete mesh. When one is doing large deformation calculations with non-linear materials with changing stiffness, the critical time step can either increase or decrease over time. Hence, Fma-3D re-evaluates the critical time step continually, and the program is stuck with your Method 1. (For output occurring at user prescribe times, one can look ahead and either in-line or off-line use a 'reduced' time step to get output (to within round off) for the time requested.)

In this case, what you noticed about the integer time step counter being so different between the "good" results and the "bad" results is due to the IVF19 -O3 compiled code going awry.

Regards

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

>>*Fma-3D re-evaluates the critical time step continually... is due to the IVF19 -O3 compiled code going awry*

Not necessarily. Maybe the older compilers were more wrong (farther from more exacting result).

This anomaly should be looked at by looking at where the next critical time step varies between compiler versions. (e.g. log the critical time steps in your "good" results, and compare the "bad" version critical time steps with the log).

>>*The critical time step is the acoustic transit time between the two closest nodal points*

In calculation of the distance you will be using a sqrt function. I suspect the difference is in the sqrt function selected between the two compilers combined with option switches. There are a few different sqrt function choices depending on a tradeoff between speed and precision.

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

If, as has been suggested, you are simulating a "chaotic" system, i.e. one that is very sensitive to the initial conditions, this casts a shadow of doubt over the results.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

On your way to making a reproducer, you will likely need to isolate where a change in computation makes the difference in your results. This might start with a binary search using combinations of a set of .obj files which produces the expected result and a set of those which don't. If you are able to isolate to one or two .obj files, if you need it, you can try directives within those source files to turn optimization on or off, such as novector directives. Finding such a directive might be a satisfactory temporary solution, if you can't determine that the source code produces a vulnerability. A small but unsatisfactory numerical departure might be produced where there is a requirement for parentheses to specify the order of operations, along with a compile option such as -standard-semantics which requires the compiler to implement the parens (possibly stricter than the Fortran standard requires, as opposed to ignoring parentheses which appear algebraically redundant).

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

To all of the commenters up to now;

This is a progress report in my search to locate the offending code

where IFORT Version 19 with -O3 optimization produced an executable

leading to an Fma-3D regression test case that failed.

I have completed a three-part compilation of my code Fma-3D.

Background Info:

In the Makefile, the software is managed as three

separate "blocks" of code, that is, the grouping is

(1,USE) All of the USE modules,

(2,FMA) All of the routines that pertain to the solid and

structural finite element behavior model, and

(3,MAT) All of the routines that implement the material models.

With minimal modification to the Makefile, I separated the compile

directive into three distinct commands. I was able to run

(-Oi, -Oj, -Ok) tests where (i,j,k) was assigned the following

optimization triples in turn (3,1,1), (1,3,1), and (1,1,3).

The Compile command line is -fpp -D_NETLIB -D_CVFNT -compile-only -QxHost -O3 -inline:speed -align:dcommons -check:bounds -Qprotect-parens -stand:f08 -extend_source -traceback zzz.f -object:polyh.o

Outcome: The Fma-3D regression test problem results, using

Version 19.0.2.190 Build 20190117, were

ALL Test (1,1,1) => PASS

USE Test (3,1,1) => PASS

FMA Test (1,3,1) => FAIL

MAT Test (1,1,3) => PASS

Just to be clear:

(3) The IFORT Version 18 executable has no problems with (3,3,3).

Next Step:

Try to locate the "FMA" routine(s) that while seemingly compiling

with no trouble, leads to a failed result (runs a small regression

test of a vibrating beam that produces incorrect answers).

Four Days Later:

After enabling -stand:f95 followed with -stand:f08 and chasing out the

non-conformance coding, I have the following test results:

ALL Test (1,1,1) => PASS

USE Test (2,1,1) => PASS

FMA Test (1,2,1) => FAIL

MAT Test (1,1,2) => PASS

USE Test (3,1,1) => PASS \

FMA Test (1,3,1) => FAIL |<== Same as before 'cleaning' the coding?

MAT Test (1,1,3) => PASS /

Just to be clear again, with the cleaner code:

(4) The IFORT Version 18 executable has no problems, that is,

ALL TEST (3,3,3) => PASS

Next Step:

Cycle through (probably a binary search) the 'FMA' subroutines and functions

looking for the offending routine.

Regards,

Samuel Key

P.S. For Gib: This verification/regression problem is a cantilever beam with a constant tip force applied a time=0.0. the mesh L/W/T=24x1x2. The beam responds primarily in the 1st mode (longest wave length). The material model is linear elastic. Pretty hard for this response to be chaotic.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

"Pretty hard for this response to be chaotic."

Agreed.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

To all of the commenters up to now;

This is a 2nd progress report in my search to locate the offending

code where IFORT Version 19 with -O3 optimization produced an

executable leading to an Fma-3D regression test case that failed.

Please note the Question at the bottom of this progress report.

I have completed a four-part compilation of the code Fma-3D.

Background Info:

Again, with minimal modification to the Makefile, I separated the

compile directive this time into four distinct commands. I was able to

run (-Oi, -Oj, -Ok, -On) tests where (i,j,k,n) was assigned the

following optimization quadruples in turn (3,1,1,1), (1,3,1,1),

(1,1,3,1) et cetera.

(1,USE) All of the USE modules,

(2,FMAUpper) First "half" of the FORTRAN routines that pertain to

the solid and structural finite element behavior model,

(3,FMALower) Second "half" of the FORTRAN routines that pertain to

the solid and structural finite element behavior model, and

(4,MAT) All of the FORTRAN routines that implement the material models.

Outcome: The Fma-3D regression test problem results, using

Version 19.0.2.190 Build 20190117, were

1st Partition of FMA into FMAU "Upper" and FMA "Lower" blocks

ALL Test (1,1,1,1) => PASS

USE Test (3,1,1,1) => PASS

FMAU Test (1,3,1,1) => FAIL ( in upper FMAU "block")

FMAL Test (1,1,3,1) => PASS

MAT Test (1,1,1,3) => PASS

2nd Partition of FMA: move 'one-line' of FMA *.o

from FMAU upper block down to FMAL lower block

FMAU Test (3,1,3,3) => PASS

<Rinse and Repeat>

5th Partition of FMA; move another line of FMA *.o

from FMAU upper block down to FMAL lower block

FMAL Test 3,1,3,3) => PASS

Without going into further detail, we have isolated the file, platq.f;

it contains 7 different 4-node finite element quadrilateral plate

formulations. In particular, platq.f contains the formulation we are

using in the Fma-3D verification/regression test problem here, the

results of which have consistently failed with -O3, -O2 and passed

with -O1 with the ifort Version 19.0.2.190 Build 20190117 compiler.

But, succeeds with Intel(R) 64, Version 18.0.3.210 Build 20180410 with

-O2 and -O3 in the platq.f file. (Russian grammatical construction: But, ... .)

Next Step:

Isolate Finite Element Formulation in platq.f that fails with -O3 compilation.

Outcome:

Isolated Finite Element Formulation in platq.f that fails with -O3 compilation.

Next Step:

Try newly released ifort: Intel(R) 64, Version 19.0.3.203 Build 20190206

Outcome:

Same as before; -O1 compiles, Fma-3D runs, provides good simulation results.

Same as before; -O3 compiles, Fma-3D runs, provides bad simulation results.

NOTE: I am talking about an obvious difference in two simulation results,

not round-off differences.

Next Step: Try to find offending code. (90% of my coding is dumber

than a stick. I have seen it referred to as "fall through" coding --

minimal number of do-loops, limited if-then-else-endif constructs.

It does have a plethora of f90 TYPE definitions related to finite

elements and numerous BC and loading options. (Lots of arrays of

TYPE'd datum items related to individual finite elements. I love it;

it has allowed me to manage much more code than without the TYPE'd

structures.)

Next Step: Lots of review, cleaning, and polishing of code in Fma-3D's

platq.f file and associated USE MODULES clear back to where the input

is read. (I did find one CASE(Iformulation) calling the wrong subroutine,

that did not clear up the -O1 works, but -O3 does not work)

Problem Identified?

Within the file platq.f, I have isolated one formulation, KEYGRUDA,

where the computed results depend on -O1 vs -O3. (The result "Uz(38)"

refers to the free-end tip deflection of a dynamically loaded

cantilevered beam. Ipts & Jpts specifies the number of Gaussian

integration points.)

NOTE: The 0.00001E-02 difference due to 2x2 vs 3x3 I do not consider

to be a problem.

--Ifort, Version 18-------------------------------------------------

Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64,

Version 18.0.3.210 Build 20180410 compiling with -O1 or -O3 gives

IDENTICAL results, and excellent Energy balance, for the test problem.

-O1, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.0731E-02

-O1, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.0732E-02

-O3, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.0731E-02

-O3, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.0732E-02

--Ifort, Version 19-------------------------------------------------

Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64,

Version 19.0.3.203 Build 20190206 compiling with -O1 and -O3 gives

DIFFERENT results, but excellent Energy balance, for the test problem.

-O1, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.0731E-02

-O1, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.0732E-02

-O3, Jpts:2x2, Ipts:4, Computed Value for Uz(38)= 1.1424E-02 (+6.46%)

-O3, Jpts:3x3, Ipts:4, Computed value for Uz(38)= 1.1425E-02 (+6.46%)

--------------------------------------------------------------------

Possibly, the optimization of this chunk of coding is where the

problem is located???

(Integer Parameter P2TH = 1.0/2.0; P3RD = 1.0/3.0; P4TH = 1.0/4.0; ...)

<snip>===============================================================

!!

!! Loop on element integration points and compute velocity (and stress)

!! polynomial coefficients C(1:7,1:7). (C(7,7) is a symmetric matrix.)

!!

C00 = 0; C01 = 0; C02 = 0; C03 = 0; C04 = 0; C05 = 0; C06 = 0;

C11 = 0; C12 = 0; C13 = 0; C14 = 0; C15 = 0; C16 = 0;

C22 = 0; C23 = 0; C24 = 0; C25 = 0; C26 = 0;

C33 = 0; C34 = 0; C35 = 0; C36 = 0;

C44 = 0; C45 = 0; C46 = 0;

C55 = 0; C56 = 0;

C66 = 0;

!!

!! Back calculate "8-node brick" vertex coordinates.

!!

Qr(1:4) = Pr(1:4) - (P2TH * Pr(5:8))

Qs(1:4) = Ps(1:4) - (P2TH * Ps(5:8))

Qt(1:4) = Pt(1:4) - (P2TH * Pt(5:8))

Qr(5:8) = Pr(1:4) + (P2TH * Pr(5:8))

Qs(5:8) = Ps(1:4) + (P2TH * Ps(5:8))

Qt(5:8) = Pt(1:4) + (P2TH * Pt(5:8))

!!

!! Note: A volume numerical quadrature is created on-the-fly by combining

!! the thickness rule (confined to (0,0,Xi3)) with the area rule (confined

!! to (Xi1,Xi2,0)). It is easy enough to do and provides a quality C(7,7)

!! transformation matrix between the oblique Xi-coordinates and the r,s,t

!! orthogonal physical coordinates. We also need to access the derivatives

!! of the shape functions in the order that they were created in section.f

!! Hence, the use of IJP and the nesting of the JP-loop inside the IP-loop.

!!

IJP = 0

DO IP = 1,Ipts

Xi3 = Xi(3,IP)

DO JP = 1,Jpts

Xi1 = Xi(1,JP)

Xi2 = Xi(2,JP)

!!

!! Determinant of the coordinate transformation from the isoparametric

!! space to the physical space: dV = DetF dXi1 dXi2 dXi3

!!

IJP = IJP + 1

Fxx = DOT_PRODUCT (Qr,B1(1:8,IJP))

Fyx = DOT_PRODUCT (Qs,B1(1:8,IJP))

Fzx = DOT_PRODUCT (Qt,B1(1:8,IJP))

Fxy = DOT_PRODUCT (Qr,B2(1:8,IJP))

Fyy = DOT_PRODUCT (Qs,B2(1:8,IJP))

Fzy = DOT_PRODUCT (Qt,B2(1:8,IJP))

Fxz = DOT_PRODUCT (Qr,B3(1:8,IJP))

Fyz = DOT_PRODUCT (Qs,B3(1:8,IJP))

Fzz = DOT_PRODUCT (Qt,B3(1:8,IJP))

DetF(IJP) = Fxx*Fyy*Fzz + Fxy*Fyz*Fzx + Fxz*Fyx*Fzy

& - Fzx*Fyy*Fxz - Fzy*Fyz*Fxx - Fzz*Fyx*Fxy

!!

!! Compute transformation matrix C(1:7,1:7) entries.

!!

Weight = SECTION_2D(SecID)%A_Weight(IP) * Area_Weight(JP)

WDF = Weight*DetF(IJP)

C00 = C00 + WDF

C01 = C01 + (Xi1 * WDF)

C02 = C02 + (Xi2 * WDF)

C03 = C03 + (Xi3 * WDF)

C04 = C04 + (Xi1 * (Xi2 * WDF))

C05 = C05 + (Xi1 * (Xi3 * WDF))

C06 = C06 + (Xi2 * (Xi3 * WDF))

C11 = C11 + Xi1 * (Xi1 * WDF)

C12 = C12 + (Xi1 * (Xi2 * WDF))

C13 = C13 + (Xi1 * (Xi3 * WDF))

C14 = C14 + Xi1 * (Xi1 * (Xi2 * WDF))

C15 = C15 + Xi1 * (Xi1 * (Xi3 * WDF))

C16 = C16 + Xi1 * (Xi2 * (Xi3 * WDF))

C22 = C22 + Xi2 * (Xi2 * WDF)

C23 = C23 + (Xi2 * (Xi3 * WDF))

C24 = C24 + (Xi2 * (Xi1 * (Xi2 * WDF)))

C25 = C25 + (Xi2 * (Xi1 * (Xi3 * WDF)))

C26 = C26 + (Xi2 * (Xi2 * (Xi3 * WDF)))

C33 = C33 + Xi3 * (Xi3 * WDF)

C34 = C34 + Xi3 * (Xi1 * (Xi2 * WDF))

C35 = C35 + (Xi3 * (Xi1 * (Xi3 * WDF)))

C36 = C36 + (Xi3 * (Xi2 * (Xi3 * WDF)))

C44 = C44 + Xi1 * (Xi2 * (Xi1 * (Xi2 * WDF)))

C45 = C45 + Xi1 * (Xi2 * (Xi1 * (Xi3 * WDF)))

C46 = C46 + Xi1 * (Xi2 * (Xi2 * (Xi3 * WDF)))

C55 = C55 + Xi1 * (Xi3 * (Xi1 * (Xi3 * WDF)))

C56 = C56 + Xi1 * (Xi3 * (Xi2 * (Xi3 * WDF)))

C66 = C66 + Xi2 * (Xi3 * (Xi2 * (Xi3 * WDF)))

ENDDO

ENDDO

<snip>===============================================================

QUESTION: Is there a way to tell the compiler to change to a different

optimization level for an 'isolated' block of coding?

Regards to all who have waded through this material,

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

If you haven't tried this option it might be interesting to make a comparison

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

>>*QUESTION: Is there a way to tell the compiler to change to a different optimization level for an 'isolated' block of coding?*

!DIR$ OPTIMIZE:n ! n= 0, 1, 2, 3

This applies to the program unit. Place at the top of a procedure program unit.

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

If you have the time and energy to track this down further: my inclination would be to write out many intermediate values in the code that you posted above. This might reveal where the difference arises as a result of optimisation.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

To all of the commenters up to now;

I would like to close out this thread.

I used the suggested compiler directive: !DIR$ OPTIMIZE:n ! n= 0, 1,

2, 3 suggested by Jim Dempsey. However, systematically setting the

value of n to "3" for the each of the routines in the file platq.f

used by the test verification/regression problem, I was not able

to discover any anomalous behavior.

If the Intel FORTRAN compiler staff concerned with optimization

activity triggered by -O1, -O2, or -O3 is interested in the

differences in Ifort Version 18 and Ifort Version 19, I am

happy to work with them to get a testing version of my software.

It will likely not be a 7-line "Subroutine X ... End Subroutine X"

program.

For the time being, I will stay with Ifort Version 18, since it

has no problem with my test input case using -O3.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

>> *However, systematically setting the value of n to "3" for the each of the routines in the file platq.f used by the test verification/regression problem, I was not able to discover any anomalous behavior.*

Global -O3 without !DIR$ OPTIMIZE:n fails, but with

Global -O3 with !DIR$ OPTIMIZE:3 succeeds.

This (unfoundedly) implies to me that:

!DIR$ OPTIMIZE:n disables InterProcedural Optimizations (IPO)

.AND. the problem relates to IPO.

Have you tried -O3 without !DIR$ OPTIMIZE:n .AND. with IPO disabled?

With the behavior your have observed, I'd be hesitant in running production code - even though regressing testing passes.

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

You have applied parentheses carefully in many of your expressions so that the association of operations can't vary with optimization, provided you use the compiler option to apply them in accordance with the standard. In an expression such as your

Fxx*Fyy*Fzz + Fxy*Fyz*Fzx + Fxz*Fyx*Fzy

& - Fzx*Fyy*Fxz - Fzy*Fyz*Fxx - Fzz*Fyx*Fxy

the order of operations may be important, and you have left the compiler a great deal of freedom. It may choose an order with unfavorable cancellation effects for terms of similar magnitude. It would be best to find and specify an accurate order; a possibility might be

Fyy*(Fxx*Fzz- Fzx*Fxz) + (Fyz*(Fxy*Fzx- Fzy*Fxx) + Fyx*(Fxz*Fzy- Fzz*Fxy))

particularly in the event that the terms in brackets are relatively small. If you are compiling for a target instruction set with fused multiply-add, you should try -Qfma- to see if you can find a grouping which is not sensitive to the change.

The expression I wrote still doesn't control e.g. whether

(Fxx*Fzz)- Fzx*Fxz or Fxx*Fzz- (Fzx*Fxz) ; those will be the same with -Qfma- but differ without that option, if you use -QxHost for a recent CPU. I'm not certain whether ifort will observe these parens, although it should.

If you can find any discussions of this fma difference, they may use the linux spelling -mno-fma

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Steve Lionel (Ret.) (Blackbelt) wrote:Pleas read Improving Numerical Reproducibility in C/C++/Fortran for additional insights.

Thank you for sharing.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page