optimizer question

Brian_Murphy · ‎03-13-2020

I have a program for which a Debug build runs ok, but a Release build doesn't (it runs, but results are wrong). So I changed Release Optimization from /O2 to /Od and then it runs ok. Is this an indication that the program has a bug somewhere?

A weird part about this is that the /O2 code fails to run correctly on one Windows 7 system but runs successfully on another Windows 7 system. The program is built statically, and its only dependencies are KERNEL32.DLL and IMAGEHLP.DLL.

mecej4 · ‎03-14-2020

The /Od option disables optimization.

The /Debug option asks the compiler to place extra information in the generated code to enable symbolic debugging. If you specify /Debug without any optimization options, /Od is implied.

The symptoms you described usually imply bugs in the user's code, but it happens occasionally that an optimization bug is encountered.

Brian_Murphy · ‎03-14-2020

I tried to use /O2 in a Debug build, but ifort.exe reported: ifort: warning #10182: disabling optimization; runtime debug checks enabled

How does one debug an optimization issue?

mecej4 · ‎03-14-2020

First of all, one has to establish that the code is correct in the sense of the standard (with the addition of extensions to the standard provided by the compiler, if applicable). There should be no array overruns, use of uninitialized variables, mismatched subprogram arguments, etc. The mere building and running of the program without compile-time or run-time messages is not sufficient proof of correctness.
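One inexpensive safeguard against mismatched subprogram arguments is to place procedures in a module, which gives every call an explicit interface that the compiler checks for type, kind and rank. The sketch below is illustrative only: the module `pad_utils`, the subroutine `ratio` and the program name are hypothetical and not taken from the original program.

```fortran
! Sketch: procedures placed in a module get an explicit interface,
! so the compiler checks each call's argument types, kinds and ranks.
module pad_utils
  implicit none
contains
  ! Hypothetical helper: element-wise ratio of two default-REAL arrays
  subroutine ratio(num, den, res, n)
    integer, intent(in) :: n
    real, intent(in)    :: num(n), den(n)
    real, intent(out)   :: res(n)
    res = num / den
  end subroutine ratio
end module pad_utils

program check_args
  use pad_utils, only: ratio
  implicit none
  real :: a(4) = 2.0, b(4) = 2.0, r(4)
  double precision :: d(4) = 2.0d0

  call ratio(a, b, r, 4)
  ! call ratio(d, b, r, 4)  ! would be rejected at compile time: kind mismatch
  print '(4F10.6)', r
end program check_args
```

With an external (non-module) subroutine, the commented-out call could compile and silently pass DOUBLE PRECISION data to a REAL dummy argument.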

It is sometimes helpful to examine the machine level code to prove that an optimization bug is present, but this is usually feasible only after the source code has been pared down to a few scores of lines.

The difficulties are of such a nature that it is not unusual for optimizer bugs to lurk undetected for many years. On the other hand, Intel's Fortran compiler is a market leader and has a large number of users, which increases the probability that someone running into such an error will end up reporting it.

Here is an example:

https://software.intel.com/en-us/forums/intel-fortran-compiler/topic/685354

Steve_Lionel · ‎03-14-2020

It is often the case that errors in a program will go undetected without optimization. In your case, you have a straightforward path to identifying the problem.

You can debug optimized code - the message you got indicates that you have certain run-time checks enabled, usually stack checks. In your project's Debug configuration, go to Fortran > Run Time and turn off Check Stack Frame. But for something like this, it is not helpful at the beginning as the optimized code rarely correlates with source code.

I would also suggest turning off the other run-time checks. See if the problem remains. If it does...

"Instrument" your program by writing out intermediate results to a file, in a way that helps you identify where each value comes from in the code. Build the program with and without optimization and compare the results - where do they start to differ? Narrow your focus on the computations that changed to see if you can find one thing (hopefully) that triggered the wrong results. But do understand that very small differences in computations are to be expected with optimization, so don't look for "good to the last bit" values.

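A minimal sketch of the kind of instrumentation described above, assuming a hypothetical log file name `trace_O2.txt` (use a different name for each build so the two logs can be compared with a diff tool). Each line records where the value came from, its decimal value, and its raw bit pattern via `TRANSFER`, so a one-bit difference is not hidden by formatting. Keep in mind that the added WRITE statements can themselves change what the optimizer does.

```fortran
! Sketch: log labelled intermediate values, both as decimal and as raw bits,
! so that an /Od run and an /O2 run can be compared line by line.
program instrument_demo
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  integer :: log_unit, i
  real :: num(3), den(3), ratio(3)

  num = [2.0, 3.0, 10.0]
  den = [2.0, 7.0, 4.0]

  ! "trace_O2.txt" is only an example name; pick one per build
  open(newunit=log_unit, file='trace_O2.txt', status='replace', action='write')
  do i = 1, size(num)
    ratio(i) = num(i) / den(i)
    ! Tag each line with the variable name and loop index it came from
    write(log_unit, '(A,I0,A,ES16.8,1X,Z8.8)') 'ratio(', i, ') = ', &
          ratio(i), transfer(ratio(i), 0_int32)
  end do
  close(log_unit)
end program instrument_demo
```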
Some possible causes of differences include:

Different order of operations with cancellation errors. Adding parentheses in expressions can help, but you'll also want to add /assume:protect_parens, as the optimizer is known to ignore these sometimes.
Mismatched arguments causing read or write of the wrong data.
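The cancellation point in the first item can be seen in a few lines. This is a sketch: the values are arbitrary, and the VOLATILE attribute is used only to stop the compiler from folding or reassociating the demonstration. In real code, parentheses together with /assume:protect_parens are the way to request an evaluation order, as suggested above.

```fortran
! Sketch: in single precision, (a + b) - a and (a - a) + b differ when
! b is less than half the spacing of reals near a.
program order_matters
  implicit none
  real, volatile :: a, b, s   ! VOLATILE: force every intermediate through memory
  real :: left, right

  a = 1.0e8      ! exactly representable; neighbouring REALs are 8 apart here
  b = 1.0
  s = a + b      ! 1.0e8 + 1.0 rounds back to 1.0e8
  left = s - a   ! (a + b) - a  gives 0.0
  s = a - a
  right = s + b  ! (a - a) + b  gives 1.0
  print '(2F6.1)', left, right
end program order_matters
```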

Brian_Murphy · ‎03-14-2020

Excellent! I turned off Runtime Error Checking, and could then create a Debug build with /O2 that seems to be having the same problem as the Release build with /O2. So hopefully I can trace execution in the debugger to find where things go astray.

However, I'm encountering a problem with the debugger in Visual Studio 2020. The subject Fortran project is a DLL project, and the Debug menu does not have a Start Debugging command, and F5 does nothing. The DLL is called from Excel, and I have set EXCEL.EXE to be the startup program. This works on another computer with Visual Studio 2010. What could I be missing? It's working now, but I don't know why.

Steve_Lionel · ‎03-14-2020

Don't get your hopes up too much for the debugger - with optimization, variables won't be where you think they are and stepping through the code will look insane, as the pointer bounces around your source. You might see if /O1 shows the problem - that would be a bit easier to debug. But I still think my suggestion of logging intermediate calculations is the best approach - I've used this many times.

Brian_Murphy · ‎03-14-2020

Indeed I am seeing what you described in the debugger, so I'm writing data to disk. I'm on to something because I'm seeing NaN appearing where it shouldn't.
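When NaNs show up in the logged data, a check using the standard `IEEE_ARITHMETIC` intrinsic module can report the first place a NaN appears. In this sketch the NaN is injected deliberately with `IEEE_VALUE`, and the array `work` is a hypothetical stand-in for whatever intermediate results the real program computes. It may be worth confirming that `IEEE_IS_NAN` behaves as expected under the floating-point model options actually used for the build.

```fortran
! Sketch: find the first NaN in an array of intermediate results,
! using the standard IEEE_ARITHMETIC intrinsic module.
program find_nan
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  real :: work(5)
  integer :: i, first_bad

  work = [1.0, 2.0, 3.0, 4.0, 5.0]
  ! Simulated corruption; in the real program this would come from the computation
  work(4) = ieee_value(work(4), ieee_quiet_nan)

  first_bad = 0
  do i = 1, size(work)
    if (ieee_is_nan(work(i))) then
      first_bad = i
      exit
    end if
  end do

  if (first_bad > 0) then
    print '(A,I0)', 'First NaN at index ', first_bad
  else
    print '(A)', 'No NaN found'
  end if
end program find_nan
```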

Steve_Lionel · ‎03-14-2020

Ah - try setting Fortran > Floating Point > Floating Point Exception Handling > Underflow gives 0.0; Abort on other IEEE exceptions /fpe:0

Brian_Murphy · ‎03-14-2020

I tried the floating point exception thing, but that didn't tell me anything new.

However, I have tracked down a very simple calculation which produces a different result with /O2 and /Od.

The program was written in about 1980 and uses single precision. Two of the input values are 2.0, and when one is divided by the other the result is 1.000000 with /Od, but 0.9999999 with /O2. Is this normal? I'm not yet sure if this small difference is why /O2 eventually produces NaN, but it might be.

I don't know if this is important, but the values input to the program are stored in real*8 variables. These are then copied by assignment statements to real*4 variables, and the *4 variables are used for all subsequent calculations.

Get this. This statement STEPW(i)=READANG(12,i)/readpad(5,i) is in a DO loop and is doing single precision 2.0/2.0 and producing STEPW(i)=0.9999999. If I put WRITE(iunit,*)'hello world' immediately after this statement, the result is 1.000000 instead of 0.9999999. That is a Debug build with /O2.

So the exact conditions causing 2.0/2.0=0.9999999 are fickle.
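On the point about REAL*8 inputs being copied into REAL*4 variables: narrowing is exact for values like 2.0, which are powers of 2, but not for values such as 0.1 that have no finite binary expansion. A small sketch, with illustrative program and variable names:

```fortran
! Sketch: converting REAL*8 to REAL*4 is exact for 2.0 but not for 0.1.
program narrow_inputs
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real64) :: d_two = 2.0_real64, d_tenth = 0.1_real64
  real(real32) :: s_two, s_tenth

  s_two   = real(d_two, real32)     ! same effect as the assignment s_two = d_two
  s_tenth = real(d_tenth, real32)

  print '(A,L2)', '2.0 survives narrowing: ', real(s_two, real64) == d_two
  print '(A,L2)', '0.1 survives narrowing: ', real(s_tenth, real64) == d_tenth
  print '(A,ES24.16)', 'single 0.1 widened back: ', real(s_tenth, real64)
end program narrow_inputs
```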

mecej4 · ‎03-14-2020

Sorry, I do not find your descriptions sufficient to lead to your conclusions and conjectures. Perhaps, the apocryphal narrative is the outcome of oversimplification.

It is a property of the X86/X64/X87/SSEn hardware that real numbers (single, double or extended precision) are represented in radix 2. Within the range of real numbers that the hardware can handle, all integer powers of 2 are represented exactly.
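The claim about powers of 2 can be checked by printing bit patterns with `TRANSFER` and a Z edit descriptor. This sketch assumes the default REAL is 32-bit IEEE single precision, as with ifort's default settings. 2.0/2.0 should give Z'3F800000', and the 0.9999999 reported above is consistent with `NEAREST(1.0, -1.0)`, the next representable value below 1.0. VOLATILE is used only to keep the division from being evaluated at compile time.

```fortran
! Sketch: bit patterns of 2.0, of 2.0/2.0, and of the REAL just below 1.0.
program bit_patterns
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  real, volatile :: x, y
  real :: q, below_one

  x = 2.0
  y = 2.0
  q = x / y                        ! exact: both operands are powers of 2
  below_one = nearest(1.0, -1.0)   ! largest REAL smaller than 1.0

  print '(A,Z8.8)', '2.0       : ', transfer(x, 0_int32)
  print '(A,Z8.8)', '2.0/2.0   : ', transfer(q, 0_int32)
  print '(A,Z8.8,ES18.10)', 'below 1.0 : ', transfer(below_one, 0_int32), below_one
end program bit_patterns
```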

When numbers are converted from/to decimal representations in strings or text files, loss of precision can occur, but not for numbers such as 1.0 or 2.0? -- show us proof!

Here is an example code that I wrote based on your narrative. I didn't see any reason to get down to the nines yet.

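program twobytwo
implicit none
real*8 :: xd=2d0, yd=2d0, zd
real   :: x, y, z
character(8) :: str = '2d0, 2d0'
! Case 1: REAL*8 values narrowed to default REAL by assignment
x = xd
y = yd
z = x/y
print '(ES22.15)',z
! Case 2: the same values, but read from text at run time
read(str,*)xd,yd
x = xd
y = yd
z = x/y
print '(ES22.15)',z
end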

As far as this toy program is concerned, the level of optimization used is immaterial. Please tell us what options to use, if any exist, to make the answer come out different from 1.

When some level of optimization has been used, and debugging has also been enabled, the debugger may deceive you often. A local variable or index may only exist in a register or the register may contain a value that has not been synchronized with memory. The debugger can show you the value from memory, but the program may be using the value in the register.

When you add PRINT statements to your code, some optimizations may be inhibited, as a result of which an optimizer bug may disappear when PRINT statements are added.

These are some of the reasons why a symbolic debugger should be used with an abundance of caution and skepticism.

Brian_Murphy · ‎03-15-2020

This is a difficult situation to explain because to me it doesn't make any sense. I am building the program with Visual Studio. The compiler command line is: /nologo /Od /module:"Win32\Release\\" /object:"Win32\Release\\" /Fd"Win32\Release\\vc160.pdb" /libs:static /threads /c (with /O2 in place of /Od for the optimized build).

	DO I=1,NPADS
		LDANG(I)=READANG(1,I)
		PVANG(I)=READANG(2,I)
		PADANG(I)=READANG(3,I)
		PRELOAD(I)=READANG(4,I)		!this variable holds the input value of preload, it will not change
		PRE(I)=PRELOAD(I)			!PRE will be the preload adjusted for pad deformation, if any
		!DELTH(I)=READANG(5,I)			! PDAM 2.0
		GRANG(I)=READANG(6,I)
		STEP(I,1)=MAX(0.0,READANG(8,I)/PADANG(I))
		STEP(I,2)=MIN(1.0,READANG(10,I)/PADANG(I))
		IF (STEP(I,2).LT.STEP(I,1)) STEP(I,2)=STEP(I,1)
		DEPTH(I,1)=READANG(9,I)/(CLEAR/2.)
		DEPTH(I,2)=READANG(11,I)/(CLEAR/2.)
		STEPW(I)=READANG(12,I)/readpad(5,I) !pad(i) step width over pad(i) axial length
		IT(I)=READPAD(4,I)
		DOL(i)=DIAM/LENGTH(i)							! in V3.2 
		NU(i)=readpad(7,i)							! in V3.2, variable profile exponent
		KOVER(i)=READPAD(8,i)							! in V3.2, hot_oil_carry_factor for each pad
	end do
write(222,*) readang(12,1), readpad(5,1),stepw(1),READANG(12,1)/readpad(5,1)

The above snippet is in the Main program unit. The entire code is about 5000 lines. All of the above array variables are declared as REAL, thereby taking the default size for REAL variables per the ifort.exe compiler options, which here is 4. The code produces the following two outputs, with the only change being the optimization option. NPADS is 4. So it is obvious that something is going wrong somewhere, and the optimization option has an effect on it.

/O2
   2.000000       2.000000      0.9999999       1.000000    
/Od
   2.000000       2.000000       1.000000       1.000000

But wait, there's more. If the write statement is moved into the DO loop immediately after the STEPW(I)= statement, the following output is produced with the /O2 option. So just moving this write statement changed the value in the STEPW variable. In an earlier post, mecej4 pointed out that PRINT statements can influence optimization, which could explain this result.

   2.000000       2.000000       1.000000       1.000000    
   2.000000       2.000000       1.000000       1.000000    
   2.000000       2.000000       1.000000       1.000000    
   2.000000       2.000000       1.000000       1.000000

Steve_Lionel · ‎03-15-2020

Rather than use * for the format, do this again with '(4E17.7)'.

Then go read Improving Numerical Reproducibility in C/C++/Fortran

mecej4 · ‎03-15-2020

Of the variables in the WRITE IOlist, only STEPW is changed inside the DO loop. Therefore, the discrepancies in READANG and READPAD must have existed before the DO loop. These discrepancies were possibly created as a result of the optimizations in parts of the code that preceded (in execution order, not lexical order) the lines that you showed.

Try using a format that shows more digits, as Steve recommended, remove STEPW from the IOlist and move the WRITE statement to the line above the DO statement. Now compile and run with different optimization levels. If you see a difference, trace the execution back to an earlier place where READANG and READPAD were written to.

I suspect that you will find that either or both READANG and READPAD are close to, but not exactly equal to, 2.000000.
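The suspicion can be illustrated with a value one ULP below 2.0: a short fixed-point format rounds it to 2.000000, while the hex form shows a different bit pattern. A sketch, with illustrative program and variable names:

```fortran
! Sketch: a value one ULP below 2.0 prints as 2.000000 with F10.6,
! while the Z (hex) edit descriptor exposes the difference.
program hidden_ulp
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  real :: exact_two, almost_two
  character(len=10) :: shown

  exact_two  = 2.0
  almost_two = nearest(2.0, -1.0)   ! 2.0 minus one unit in the last place

  write(shown, '(F10.6)') almost_two
  print '(A,A)', 'F10.6 shows : ', shown
  print '(A,Z8.8,1X,Z8.8)', 'Z8.8 shows  : ', transfer(exact_two, 0_int32), &
                                          transfer(almost_two, 0_int32)
  print '(A,L2)', 'equal?      : ', almost_two == exact_two
end program hidden_ulp
```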

jimdempseyatthecove · ‎03-15-2020

Try an experiment. Remove the PRINT after the STEPW(I)=. Then after the loop, add an additional WRITE statement with a FORMAT 2229 to output using 4(Z0,X) as the edit descriptor.

You may find that in both the Debug and Release builds, where you see 1.000000/0.9999999, either or both of readang(12,1) and readpad(5,1) hold a number that is a tad higher or a tad lower than an exact integer. The real cause of observing the 9999999's then lies in the WRITE output formatting.

Jim Dempsey

Brian_Murphy · ‎03-15-2020

I put the following two write statements after the loop.

write(222,'(4E20.10)')  readang(12,1), readpad(5,1), READANG(12,1)/readpad(5,1), stepw(1)
write(222,'(4(Z0,1x))') readang(12,1), readpad(5,1), READANG(12,1)/readpad(5,1), stepw(1)

with /O2 I get this.

    0.2000000000E+01    0.2000000000E+01    0.1000000000E+01    0.9999999404E+00
40000000 40000000 3F800000 3F7FFFFF

Steve_Lionel · ‎03-15-2020

Ok, off by one LSB. That's not surprising with any change to order of operations. Is this the cause of the "wrong results" you see later? If so, your program is way too sensitive to small differences and might benefit from double precision.
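"Off by one LSB" can be quantified directly: for finite REALs of the same sign, the difference between their integer bit patterns is their distance in units in the last place (ULPs). This sketch uses the 3F7FFFFF pattern reported above; the program and variable names are illustrative.

```fortran
! Sketch: ULP distance between two finite REALs of the same sign,
! computed from their 32-bit integer bit patterns.
program ulp_distance
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  real :: expected, observed
  integer(int32) :: ulps

  expected = 1.0
  observed = transfer(1065353215_int32, expected)   ! bit pattern Z'3F7FFFFF'
  ulps = abs(transfer(expected, 0_int32) - transfer(observed, 0_int32))

  print '(A,I0)', 'ULP distance       : ', ulps
  print '(A,ES12.4)', 'absolute difference: ', expected - observed
  print '(A,ES12.4)', 'EPSILON(1.0)       : ', epsilon(1.0)
end program ulp_distance
```

Just below 1.0 the spacing of REALs is half of EPSILON(1.0), which is why the absolute difference printed is smaller than EPSILON.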

jimdempseyatthecove · ‎03-15-2020

These results show that after the loop, the result stored in stepw(1) within the loop differs by 1 least significant bit from the result generated outside the loop. The division inside the loop should have produced an exact result (power of 2 divided by power of 2).

Can you make a simple reproducer?

program
variable declarations
initialize data
DO
...
END DO
write...
write...

Verify that it exhibits the same symptom and, if so, post it here and also file a bug report (there is a button for that on the same page as the one to create a new thread).

Jim Dempsey

mecej4 · ‎03-15-2020

Here is a reproducer. I think that the optimizer pre-calculates the result to be printed at compile time, gets the precomputed value wrong by 1 bit, and stores that value in the EXE. I had a similar experience with another vendor's compiler several years ago. The results from floating point calculations done at compile time need not match the same calculations made at run time. (The code, if I remember, used 1E30 as an initial value, and the program had IF (x .eq. 1E30) THEN, and the test for equality failed.)

program bmxr
implicit none
integer :: i, npads = 2
real, dimension(5) :: ldang, pre, grang, stepw, it, dol,nu,kover,length
real, dimension(2,5) :: readang
real, dimension(5)  :: readpad

readpad = 2.0
readang = 2.0
DO I=1,NPADS
    STEPW(I)=READANG(2,I)/readpad(I)
end do
write(*,'(4ES20.10)') readang(2,1), readpad(5),stepw(1),READANG(2,1)/readpad(5)
write(*,'(4Z20)') transfer(readang(2,1),i), transfer(readpad(5),i), &
   transfer(stepw(1),i),transfer(READANG(2,1)/readpad(5),i)
end program

The output with /Od, using the 19.1 32-bit compiler:

    2.0000000000E+00    2.0000000000E+00    1.0000000000E+00    1.0000000000E+00
            40000000            40000000            3F800000            3F800000

and with /O2:

    2.0000000000E+00    2.0000000000E+00    9.9999994040E-01    1.0000000000E+00
            40000000            40000000            3F7FFFFF            3F800000

Brian_Murphy · ‎03-15-2020

The eventual wrong results when using the /O2 option are due to poor programming practice at a later place in the code. It does one thing if stepw(1) is exactly 1.0 and something else if it's not. It's on my list to fix that.

I suppose it's a matter of opinion if the reproducer made by mecej4 reveals a flaw with the optimizer, or not. One LSB is certainly splitting hairs. However, most people might expect that calculating the ratio of two identical values ought to be exactly 1 regardless of the optimizer setting.

FortranFan · ‎03-15-2020

Brian Murphy wrote:
.. poor programming practice.. It does one thing if stepw(1) is exactly 1.0 and something else if it's not. It's on my list to fix that.

"Poor programming practice" will be an understatement if "if stepw(1) is exactly 1.0" is coded as "IF (STEPW(1) .EQ. 1.0)" or something similar. "It's on my list to fix that", yes, can't be soon enough, please look into EPSILON intrinsic.

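A sketch of the kind of test FortranFan suggests, replacing an exact IF (STEPW(1) .EQ. 1.0) with a tolerance based on EPSILON. The factor `tol_factor = 4.0` is a hypothetical choice for illustration, not a recommendation; a suitable tolerance depends on how much rounding error the actual computation can accumulate.

```fortran
! Sketch: exact equality versus a relative-tolerance comparison.
program tolerant_compare
  implicit none
  real, parameter :: tol_factor = 4.0   ! hypothetical: a few ULPs of slack
  real :: stepw, expected
  logical :: exact_match, close_enough

  expected = 1.0
  stepw    = nearest(1.0, -1.0)         ! the 0.9999999404 value seen with /O2
  exact_match  = (stepw == expected)
  close_enough = abs(stepw - expected) <= tol_factor * epsilon(expected) * abs(expected)

  print '(A,L2)', 'exact test     : ', exact_match
  print '(A,L2)', 'tolerance test : ', close_enough
end program tolerant_compare
```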
Brian Murphy wrote:
.. most people might expect that calculating the ratio of two identical values ought to be exactly 1 ...

With floating-point arithmetic, not since 1969!!