Hello again
As you may recall, I maintain a large legacy Fortran codebase developed over many years with many different compilers.
With the Intel compilers, one of our major programs is giving problems: if optimisation is turned on, it will run once through the main loop and then hang. To get it to run we have to use /Od.
This particular program is also a prime candidate for parallelization.
Now, I'm sure the problem stems from our code, so any hints on finding/fixing these sorts of issues would be gratefully received.
Cheers
Jim
If you don't know the responsible subroutine, you could make a complete set of .obj files with the best working optimization, and another set with the most conservative failing optimization, then link and test combinations until you find the culprit. If you don't wish to split the source files, the per-routine optimization directive
!DIR$ OPTIMIZE:N (where N=0|1|2) may be of some help.
I don't know whether to infer that a set of options like /O1 /fp:source /Qsave /Qzero already shows the failure, but that comes to mind as a possible minimum step up from /Od. I suppose you have already tried the compile-time source checker and the run-time /check options.
Amongst others we have set:
/warn:declarations
/warn:interfaces
/Qzero
But not /Qsave - so I'll add that.
What does the /fp:source do?
We also compile a series of library routines separately - so I'll try optimising that separately.
Thankx for your help
The symptoms you describe often relate to (non)initialization problems that by chance did not cause noticeable problems earlier. An additional source for these problems in legacy programs is improper use of COMMON variables that caused the program to work earlier by chance and fail now by chance.
This is not to say that there isn't an optimization bug, which is a small possibility.
Use a binary search type of method.
Compile all source files with /Od and verify the program works.
Recompile the first half of the files with optimizations and test:
if program works then
    recompile half of the second half (the 3rd quarter) with optimizations
    if program works then
        recompile half of the 4th quarter with optimizations
        if program works then
            ...
        else
            ...
        endif
    else
        recompile...
        if...
            ...
        else
            ...
        endif
    endif
else
    recompile the second half of the first half with /Od (leaving the 1st quarter optimized)
    if program works then
        ...
    else
        ...
    endif
endif
This should quickly locate the source file or files that are giving you the problem.
Once you identify the files, then look closer as to why the problem occurs.
If your source file list is relatively small you can compile the sources one at a time with/without optimizations.
One of the projects I worked on in the past had over 700 source files. The "binary" search worked well with that project.
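If it helps, the halving procedure above can be automated. This is only a sketch (Python, with a hypothetical `compile_and_test` hook standing in for your actual mixed-flags build and run-once check), and it assumes a single culprit file; if two files only fail in combination, a half containing just one of them can still pass the test.

```python
def find_culprit(sources, compile_and_test):
    """Narrow a list of source files down to one whose optimized build breaks.

    compile_and_test(optimized) should rebuild with exactly the files in
    `optimized` compiled /O2 (everything else /Od), run the program once,
    and return True if it behaves correctly. This hook is hypothetical -
    replace it with your real build script.
    """
    suspects = list(sources)
    while len(suspects) > 1:
        first_half = suspects[: len(suspects) // 2]
        if not compile_and_test(first_half):
            suspects = first_half                  # failure lies in the first half
        else:
            suspects = suspects[len(first_half):]  # otherwise it is in the second
    return suspects[0] if suspects else None

# Example with a fake tester in which "bad.f" is the offender:
culprit = find_culprit(
    ["a.f", "b.f", "bad.f", "d.f", "e.f"],
    lambda optimized: "bad.f" not in optimized,
)
```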
Jim Dempsey
>The symptoms you describe often relate to (non)initialization problems that by chance did not cause noticeable
>problems earlier. An additional source for these problems in legacy programs is improper use of COMMON
>variables that caused the program to work earlier by chance and fail now by chance.
That's pretty much what I figured - and we do have some very, very ugly old code :(
As mentioned above, I've tried optimising the main library routines, which seems to work OK.
I'll give your binary technique a go - not counting the library, there are only 72 source files.
I was also wondering if some of the compatibility options turned on might be having an effect.
Thankx for your input.
Recently in some old code that may have worked flawlessly since 1985 a problem occurred (let's say was finally discovered). The problem resided in the use of named COMMON /namehere/ where the variable declarations in one source file were not the same as in a different source file. This is not necessarily a problem as long as the programmer understands that the compiler will remap and interpret those locations as directed. The original programmer is likely not supporting the current code base, or may have forgotten they remapped the common data. Now, as code is revised, problems pop up in code that was not touched.
Note, be very careful about "fixing" named commons that define the storage area differently. You will have to look at any changes very carefully because any change may introduce unintended consequences.
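As a loose illustration (Python's struct module standing in for Fortran storage; the layouts and values are hypothetical): a named COMMON block is just a region of storage, and two routines that declare it differently are simply reinterpreting the same bytes.

```python
import struct

# Eight bytes of shared storage, like a small named COMMON block.
blk = bytearray(8)

# Routine A's declaration:  COMMON /blk/ I, J   (two INTEGER*4)
struct.pack_into("<ii", blk, 0, 42, 7)

# Routine B's declaration:  COMMON /blk/ X      (one REAL*8)
# Same bytes, different interpretation - no conversion happens, so X is
# a bit-pattern reinterpretation, not 42.0 or 7.0.
(x,) = struct.unpack_from("<d", blk, 0)

# Routine A still sees its integers untouched.
i, j = struct.unpack_from("<ii", blk, 0)
print(i, j, x)
```

Both views "work" only as long as every routine agrees on the layout it expects, which is exactly what breaks when one declaration is later edited.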
Good luck bug hunting.
Jim Dempsey
You're right about the 'unintended consequences' thing
Some initial testing has shown that turning on optimisation produces different results.
The differences are generally small but noticeable.
For example, without optimisation:
CONVERGENCE TOTALS - ALL TRIPS
==============================
Trips Total 228987.41
Vehicle Minutes 2652389.08 11.58 min per trip
Vehicle Kilometers 2586738.34 11.30 km per trip
and with optimisation:
CONVERGENCE TOTALS - ALL TRIPS
==============================
Trips Total 228987.41
Vehicle Minutes 2644930.62 11.55 min per trip
Vehicle Kilometers 2586679.35 11.30 km per trip
And the only difference between the program runs is the /O2 compiler switch.
Are there any ways to force the same numerical results while still gaining the speed improvement?
I suggested /fp:source as a comparison which might interest you. That sets /Qprec-div /Qprec-sqrt /Qftz- /assume:protect-parens, thus producing more accurate results, and disables vectorization of sum reduction, thus avoiding numerical differences associated with alignment.
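To illustrate the sum-reduction point with a toy example (Python/NumPy standing in for Fortran, with values that are hypothetical and chosen so the effect is exact): summing the same single-precision values in a different association order, as a vectorized reduction effectively does, can change the rounded total.

```python
import numpy as np

# Four tiny values followed by 1.0; tiny is a quarter of an ulp of 1.0.
tiny = np.float32(2.0 ** -25)
x = np.array([tiny, tiny, tiny, tiny, 1.0, 0.0, 0.0, 0.0], dtype=np.float32)

# Strict left-to-right accumulation, as /fp:source ordering requires:
# the tiny values accumulate exactly to 2**-23 (one ulp of 1.0) before
# 1.0 is added, so the total is 1.0 + 2**-23.
sequential = np.float32(0.0)
for v in x:
    sequential = np.float32(sequential + v)

# Four-lane chunked accumulation, as a 4-wide SSE sum reduction
# effectively computes: each tiny value meets 1.0 (or a lane holding it)
# on its own, is below half an ulp, and rounds away - total 1.0.
partials = np.zeros(4, dtype=np.float32)
for row in x.reshape(-1, 4):
    partials = partials + row
chunked = np.float32(0.0)
for p in partials:
    chunked = np.float32(chunked + p)

print(sequential, chunked)  # same data, different order, different totals
```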
If your application depends on extra-precision evaluation of expressions, building in (32-bit) /arch:IA32 mode will have the effect of evaluating single precision expressions in double precision, just as if you promoted them explicitly in source code. It also removes vectorization.
Thankx for your input Tim
We were already compiling with /arch:IA32 set
and setting /fp:source has made no difference to the numbers produced by the optimised program.
Do you think playing with the floating-point speculation settings would have an effect? At present it is defaulting to fast...
As you have /arch:IA32, you can expect changes in single precision results with optimization, as promotion to double precision may carry across statements rather than every assignment narrowing down to single precision.
These differences (between optimized and unoptimized)
Vehicle Minutes 2644930.62 11.55 min per trip
Vehicle Kilometers 2586679.35 11.30 km per trip
are over 2 percent, much more than is reasonable for a straightforward calculation even with single precision. What kind of calculations does your program perform? One does not expect this much error even when integrating chaotic differential equations over a few orbits.
I suspect that you have errors in your code, such as uninitialized variables or array overruns. Finding the bugs may be time consuming and uninviting, but I believe that playing with compiler options is a waste of time until those bugs are fixed.
This sounds as if you are performing (storing) your calculations in single precision.
If you are performing in single precision and .NOT. using SSE, please note that unoptimized code will have more stores of intermediary results to memory, thus rounding the 80-bit FPU temporaries (FPU stack variables) to 32-bit memory variables. Compiling with optimization enabled will tend to keep more intermediary results in the FPU temporaries, and thus not round off the 80-bit values.
If you are using SSE then disregard the above.
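A toy Python illustration of this point (hypothetical values, with float64 standing in for the wider FPU temporaries and float32 for REAL*4 memory slots): rounding each intermediate to single precision, as the extra stores in unoptimized code do, can give a different answer than keeping the intermediate wide.

```python
import numpy as np

a = np.float32(1.0)
b = np.float32(2.0 ** -24)   # half an ulp of 1.0
c = np.float32(-1.0)

# "Unoptimized" path: each intermediate is stored to a REAL*4 memory
# slot, rounding to 24-bit precision at every step. a + b is a
# round-to-even tie and rounds back to 1.0, so b is lost entirely.
t = np.float32(a + b)
stored_each_step = np.float32(t + c)

# "Optimized" path: the intermediate stays in a wider temporary
# (float64 here stands in for the x87 register), so a + b keeps the
# small term and the cancellation against c preserves it.
wide = np.float64(a) + np.float64(b)
kept_in_register = np.float32(wide + np.float64(c))

print(stored_each_step, kept_in_register)
```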
If you can, set your variables to double precision REAL(8), then compile with optimizations enabled and compare the REAL(8) optimized results with both sets of REAL(4) results. I will venture a guess that the REAL(4) optimized results are closer to the REAL(8) results. (You may also try double precision unoptimized to fill out the comparison table.)
Did the "binary search" of optimized code indicate one (or few) source files are sensitive to optimization? If so, maybe you can show us a code snip or two of where you think the problem lies.
Jim Dempsey
What Jim is discussing here about extra precision with /arch:IA32 is the same point I was making. ifort follows Microsoft practice in setting 53-bit (standard double) precision mode. Unless you set /Qpc80 or otherwise modify the initialization of x87 modes, you aren't using the full 80 bits; you will see effective promotion to double precision for single precision expression values not stored to memory.
We are using /arch:IA32 and no SSE stuff as we had issues with some machines not being able to run the code. It took quite a while to find the compiler settings to get our old code to run.
The codebase stems originally from PDP days - and shows a definite 16-bit (not 32-bit) bias. Almost all floating point results are calculated as real*4.
The effort of making it all real*8 is significant - and I also worry about being able to allocate enough memory for the large number of large arrays we use.