I have a program for which a Debug build runs ok, but a Release build doesn't (it runs, but results are wrong). So I changed Release Optimization from /O2 to /Od and then it runs ok. Is this an indication that the program has a bug somewhere?
The odd part is that the /O2 code fails to run correctly on one Windows 7 system but runs successfully on another. The program is built statically, and its only dependencies are KERNEL32.DLL and IMAGEHLP.DLL.
The /Od option disables optimization.
The /Debug option asks the compiler to place extra information in the generated code to enable symbolic debugging. If you specify /Debug without any optimization options, /Od is implied.
The symptoms you described usually imply bugs in the user's code, but it happens occasionally that an optimization bug is encountered.
I tried to use /O2 in a Debug build, but ifort.exe reported: ifort: warning #10182: disabling optimization; runtime debug checks enabled
How does one debug an optimization issue?
First of all, one has to establish that the code is correct in the sense of the standard (with the addition of extensions to the standard provided by the compiler, if applicable). There should be no array overruns, use of uninitialized variables, mismatched subprogram arguments, etc. The mere building and running of the program without compile-time or run-time messages is not sufficient proof of correctness.
It is sometimes helpful to examine the machine level code to prove that an optimization bug is present, but this is usually feasible only after the source code has been pared down to a few scores of lines.
The difficulties are of such a nature that it is not unusual for optimizer bugs to lurk undetected for many years. On the other hand, Intel's Fortran compiler is a market leader and has a large number of users, which increases the probability that someone running into such an error will end up reporting it.
Here is an example:
It is often the case that errors in a program will go undetected without optimization. In your case, you have a straightforward path to identifying the problem.
You can debug optimized code - the message you got indicates that you have certain run-time checks enabled, usually stack checks. In your project's Debug configuration, go to Fortran > Run Time and turn off Check Stack Frame. But for something like this, it is not helpful at the beginning as the optimized code rarely correlates with source code.
I would also suggest turning off the other run-time checks. See if the problem remains. If it does...
"Instrument" your program by writing out intermediate results to a file, in a way that helps you identify where each value comes from in the code. Build the program with and without optimization and compare the results - where do they start to differ? Narrow your focus on the computations that changed to see if you can find one thing (hopefully) that triggered the wrong results. But do understand that very small differences in computations are to be expected with optimization, so don't look for "good to the last bit" values.
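The instrumentation idea above could be sketched roughly as follows. This is only an illustration: the module name trace_log, its routines, and the file name trace_Od.txt are hypothetical and not from the original program, and the code assumes default REAL and INTEGER are both 32 bits. Each record carries a label, an index, the decimal value, and the raw bit pattern, so the files from an /Od build and an /O2 build can be compared line by line.

```fortran
module trace_log
  ! Hypothetical helper module for logging intermediate results.
  implicit none
  private
  public :: trace_open, trace_real, trace_close
  integer :: trace_unit = -1   ! set by trace_open
contains
  subroutine trace_open(filename)
    character(*), intent(in) :: filename
    open(newunit=trace_unit, file=filename, status='replace', action='write')
  end subroutine trace_open

  subroutine trace_real(label, idx, val)
    character(*), intent(in) :: label
    integer, intent(in) :: idx
    real, intent(in) :: val
    ! Decimal value with 9 significant digits, plus the raw bits in hex.
    ! transfer(val, 0) assumes a 32-bit REAL and a 32-bit default INTEGER.
    write(trace_unit, '(A,"(",I0,") = ",ES16.8,"  bits=",Z8.8)') &
         label, idx, val, transfer(val, 0)
  end subroutine trace_real

  subroutine trace_close()
    close(trace_unit)
  end subroutine trace_close
end module trace_log

program trace_demo
  use trace_log
  implicit none
  real :: stepw(2)
  integer :: i
  stepw = [1.0, 0.5]                 ! stand-in values for the demo
  call trace_open('trace_Od.txt')    ! use a different name per build
  do i = 1, size(stepw)
     call trace_real('STEPW', i, stepw(i))
  end do
  call trace_close()
  print '(A)', 'trace written'
end program trace_demo
```

Writing the hex bit pattern next to the decimal value makes it easy to tell a genuine divergence from an ordinary last-bit difference when the two trace files are compared.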
Some possible causes of differences include:
Excellent! I turned off Runtime Error Checking, and could then create a Debug build with /O2 that seems to be having the same problem as the Release built with /O2. So hopefully I can trace execution in the debugger to find where things go astray.
However, I'm encountering a problem with the debugger in Visual Studio 2020. The subject Fortran project is a DLL project; the Debug menu does not have a Start Debugging command, and F5 does nothing. The DLL is called from Excel, and I have set EXCEL.EXE as the startup program. This works on another computer with Visual Studio 2010. What could I be missing? It's working now, but I don't know why.
Don't get your hopes up too much for the debugger - with optimization, variables won't be where you think they are and stepping through the code will look insane, as the pointer bounces around your source. You might see if /O1 shows the problem - that would be a bit easier to debug. But I still think my suggestion of logging intermediate calculations is the best approach - I've used this many times.
I tried the floating point exception thing, but that didn't tell me anything new.
However, I have tracked down a very simple calculation which produces a different result with /O2 and /Od.
The program was written in about 1980 and uses single precision. A pair of input values are 2.0, and when one is divided by the other the result is 1.000000 with /Od, but 0.9999999 with /O2. Is this normal? I'm not yet sure if this small difference is why /O2 eventually produces NaN, but it might be.
I don't know if this is important, but the values input to the program are stored in real*8 variables. These are then copied by assignment statements to real*4 variables, and the *4 variables are used for all subsequent calculations.
Get this. This statement STEPW(i)=READANG(12,i)/readpad(5,i) is in a DO loop and is doing single precision 2.0/2.0 and producing STEPW(i)=0.9999999. If I put WRITE(iunit,*)'hello world' immediately after this statement, the result is 1.000000 instead of 0.9999999. That is a Debug build with /O2.
So the exact conditions causing 2.0/2.0=0.9999999 are fickle.
Sorry, I do not find your descriptions sufficient to lead to your conclusions and conjectures. Perhaps, the apocryphal narrative is the outcome of oversimplification.
It is a property of the X86/X64/X87/SSEn hardware that real numbers (single, double or extended precision) are represented in radix 2. Within the range of real numbers that the hardware can handle, all integer powers of 2 are represented exactly.
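A small sketch can make the point about exact powers of two concrete. It assumes a 32-bit IEEE single-precision default REAL and a 32-bit default INTEGER; the program and variable names are made up for illustration. It prints the bit pattern of a few values and masks out the 23-bit fraction field, which is zero for exact powers of two and nonzero for a value such as 0.1 that has no finite binary expansion.

```fortran
program exact_powers
  implicit none
  real :: vals(5)
  real(kind(1d0)) :: xd
  integer :: i, bits

  vals = [0.5, 1.0, 2.0, 4.0, 0.1]
  do i = 1, size(vals)
     ! Reinterpret the REAL as an INTEGER (assumes both are 32 bits).
     bits = transfer(vals(i), bits)
     ! The low 23 bits are the fraction field of an IEEE single.
     print '(F4.1,2X,Z8.8,2X,"fraction=",Z6.6)', &
          vals(i), bits, iand(bits, int(z'007FFFFF'))
  end do

  ! Copying 2.0 from double to single precision loses nothing.
  xd = 2d0
  print '(L1)', real(xd) == 2.0
end program exact_powers
```

For 0.5, 1.0, 2.0 and 4.0 the fraction field prints as 000000, while 0.1 shows 4CCCCD; the last line prints T because 2.0 survives the real*8 to real*4 assignment unchanged.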
When numbers are converted to or from decimal representations in strings or text files, loss of precision can occur. But for numbers such as 1.0 or 2.0? Show us proof!
Here is an example code that I wrote based on your narrative. I didn't see any reason to get down to the nines yet.
    program twobytwo
    real*8 :: xd=2d0, yd=2d0, zd
    real :: x, y, z
    character(8) :: str = '2d0, 2d0'
    x = xd
    y = yd
    z = x/y
    print '(ES22.15)',z
    read(str,*)xd,yd
    x = xd
    y = yd
    z = x/y
    print '(ES22.15)',z
    end
As far as this toy program is concerned, the level of optimization used is immaterial. Please tell us which options to use, if any exist, to make the answer come out different from 1.
When some level of optimization has been used, and debugging has also been enabled, the debugger may deceive you often. A local variable or index may only exist in a register or the register may contain a value that has not been synchronized with memory. The debugger can show you the value from memory, but the program may be using the value in the register.
When you add PRINT statements to your code, some optimizations may be inhibited, as a result of which an optimizer bug may disappear when PRINT statements are added.
These are some of the reasons why a symbolic debugger should be used with an abundance of caution and skepticism.
This is a difficult situation to explain because to me it doesn't make any sense. I am building the program with Visual Studio. The compiler command line is: /nologo /Od /module:"Win32\Release\\" /object:"Win32\Release\\" /Fd"Win32\Release\\vc160.pdb" /libs:static /threads /c. For the optimized builds, /Od is replaced by /O2.
    DO I=1,NPADS
      LDANG(I)=READANG(1,I)
      PVANG(I)=READANG(2,I)
      PADANG(I)=READANG(3,I)
      PRELOAD(I)=READANG(4,I) !this variable holds the input value of preload, it will not change
      PRE(I)=PRELOAD(I) !PRE will be the preload adjusted for pad deformation, if any
      !DELTH(I)=READANG(5,I) ! PDAM 2.0
      GRANG(I)=READANG(6,I)
      STEP(I,1)=MAX(0.0,READANG(8,I)/PADANG(I))
      STEP(I,2)=MIN(1.0,READANG(10,I)/PADANG(I))
      IF (STEP(I,2).LT.STEP(I,1)) STEP(I,2)=STEP(I,1)
      DEPTH(I,1)=READANG(9,I)/(CLEAR/2.)
      DEPTH(I,2)=READANG(11,I)/(CLEAR/2.)
      STEPW(I)=READANG(12,I)/readpad(5,I) !pad(i) step width over pad(i) axial length
      IT(I)=READPAD(4,I)
      DOL(i)=DIAM/LENGTH(i) ! in V3.2
      NU(i)=readpad(7,i) ! in V3.2, variable profile exponent
      KOVER(i)=READPAD(8,i) ! in V3.2, hot_oil_carry_factor for each pad
    end do
    write(222,*) readang(12,1), readpad(5,1),stepw(1),READANG(12,1)/readpad(5,1)
The above snippet is in the Main program unit. The entire code is about 5000 lines. All of the above array variables are declared as REAL, thereby taking the default size for REAL variables per the ifort.exe compiler options, which here is 4. The code produces the following two outputs, with the only change being the optimization option. NPADS is 4. So it is obvious that something is going wrong somewhere, and the optimization option has an effect on it.
    /O2   2.000000   2.000000   0.9999999   1.000000
    /Od   2.000000   2.000000   1.000000    1.000000
But wait, there's more. If the write statement is moved into the DO loop immediately after the STEPW(I)= statement, the following output is produced with the /O2 option. So just moving this write statement changed the value in the STEPW variable. In the previous quote mecej4 pointed out that PRINT statements can influence optimization, which could explain this result.
    2.000000   2.000000   1.000000   1.000000
    2.000000   2.000000   1.000000   1.000000
    2.000000   2.000000   1.000000   1.000000
    2.000000   2.000000   1.000000   1.000000
Of the variables in the WRITE IOlist, only STEPW is changed inside the DO loop. Therefore, the discrepancies in READANG and READPAD must have existed before the DO loop. These discrepancies were possibly created as a result of the optimizations in parts of the code that preceded (in execution order, not lexical order) the lines that you showed.
Try using a format that shows more digits, as Steve recommended, remove STEPW from the IOlist and move the WRITE statement to the line above the DO statement. Now compile and run with different optimization levels. If you see a difference, trace the execution back to an earlier place where READANG and READPAD were written to.
I suspect that you will find that either or both READANG and READPAD are close to, but not exactly equal to, 2.000000.
Try an experiment. Remove the PRINT after the STEPW(I)=. Then after the loop, add an additional WRITE statement with a FORMAT 2229 to output using 4(Z0,X) as the edit descriptor.
You may find that in the Debug and Release builds, where you see 1.000000/0.9999999, either or both of readang(12,1) and readpad(5,1) hold a number that is a tad higher or a tad lower than an exact integer. The real cause of the 9999999's you are observing would then be the WRITE output formatting.
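To illustrate why a hex dump is worth the trouble here, the following sketch (names are illustrative; it assumes a 32-bit IEEE default REAL and a 32-bit default INTEGER) prints the two single-precision neighbours of 2.0 in three ways. With six decimals all three values look identical, while the wider format and the hex pattern tell them apart.

```fortran
program hidden_digits
  implicit none
  real :: below, above

  below = nearest(2.0, -1.0)   ! largest single-precision value below 2.0
  above = nearest(2.0,  1.0)   ! smallest single-precision value above 2.0

  ! Six decimals: all three values round to 2.000000.
  print '(3F10.6)', below, 2.0, above
  ! Ten significant digits: the differences become visible.
  print '(3ES17.9)', below, 2.0, above
  ! Raw bit patterns: no rounding involved at all.
  print '(3(Z8.8,1X))', transfer(below, 0), transfer(2.0, 0), transfer(above, 0)
end program hidden_digits
```

The last line prints 3FFFFFFF 40000000 40000001, so an input that merely displays as 2.000000 can be confirmed or ruled out as exactly 2.0 by looking at its bits.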
I put the following two write statements after the loop.
    write(222,'(4E20.10)') readang(12,1), readpad(5,1), READANG(12,1)/readpad(5,1), stepw(1)
    write(222,'(4(Z0,1x))') readang(12,1), readpad(5,1), READANG(12,1)/readpad(5,1), stepw(1)
with /O2 I get this.
    0.2000000000E+01    0.2000000000E+01    0.1000000000E+01    0.9999999404E+00
    40000000 40000000 3F800000 3F7FFFFF
Ok, off by one LSB. That's not surprising with any change to order of operations. Is this the cause of the "wrong results" you see later? If so, your program is way too sensitive to small differences and might benefit from double precision.
These results show that after the loop, the result stored in stepw(1) within the loop differs by 1 least significant bit from the result generated outside the loop. The division inside the loop should have produced an exact result (power of 2 divided by power of 2).
Can you make a simple reproducer?
Verify that it exhibits the same symptom and, if so, post it here and also file a bug report (there is a button on the page with Create New Thread).
Here is a reproducer. I think that the optimizer is able to pre-calculate the result to be printed at compile time, is off by 1 bit in that precomputed value, and stores the value in the EXE. I had a similar experience with another vendor's compiler several years ago. The results of floating point calculations done at compile time need not match the same calculations made at run time. (The code, if I remember, used 1E30 as an initial value, and the program had IF (x .eq. 1E30) THEN, and the test for equality failed.)
    program bmxr
    implicit none
    integer :: i, npads = 2
    real, dimension(5) :: ldang, pre, grang, stepw, it, dol,nu,kover,length
    real, dimension(2,5) :: readang
    real, dimension(5) :: readpad
    readpad = 2.0
    readang = 2.0
    DO I=1,NPADS
      STEPW(I)=READANG(2,I)/readpad(I)
    end do
    write(*,'(4ES20.10)') readang(2,1), readpad(5),stepw(1),READANG(2,1)/readpad(5)
    write(*,'(4Z20)') transfer(readang(2,1),i), transfer(readpad(5),i), &
                      transfer(stepw(1),i),transfer(READANG(2,1)/readpad(5),i)
    end program
The output with /Od, using the 19.1 32-bit compiler:
    2.0000000000E+00    2.0000000000E+00    1.0000000000E+00    1.0000000000E+00
    40000000 40000000 3F800000 3F800000
and with /O2:
    2.0000000000E+00    2.0000000000E+00    9.9999994040E-01    1.0000000000E+00
    40000000 40000000 3F7FFFFF 3F800000
The eventual wrong results when using the /O2 option are due to poor programming practice at a later place in the code. It does one thing if stepw(1) is exactly 1.0 and something else if it's not. It's on my list to fix that.
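One possible shape for that fix is a relative-tolerance comparison instead of an exact equality test. The sketch below is not the original program's code: the function nearly_equal and the tolerance of 4*epsilon(1.0) are assumptions chosen for illustration, and the right tolerance depends on how the value is computed and used.

```fortran
program tolerant_compare
  implicit none
  real :: stepw_off

  ! The value seen with /O2: one LSB below 1.0 (bit pattern 3F7FFFFF).
  stepw_off = nearest(1.0, -1.0)

  ! Exact equality fails for the off-by-one-bit value ...
  print '(L1)', stepw_off == 1.0
  ! ... while a relative-tolerance test treats it as 1.0.
  print '(L1)', nearly_equal(stepw_off, 1.0, 4.0*epsilon(1.0))

contains

  ! Hypothetical helper: true when a and b agree to within a relative
  ! tolerance tol (an assumed design choice, not from the original code).
  logical function nearly_equal(a, b, tol)
    real, intent(in) :: a, b, tol
    nearly_equal = abs(a - b) <= tol * max(abs(a), abs(b))
  end function nearly_equal
end program tolerant_compare
```

This prints F and then T. With such a test, a one-LSB change from reordered or precomputed arithmetic no longer flips the program onto a different code path.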
I suppose it's a matter of opinion if the reproducer made by mecej4 reveals a flaw with the optimizer, or not. One LSB is certainly splitting hairs. However, most people might expect that calculating the ratio of two identical values ought to be exactly 1 regardless of the optimizer setting.