Re: Strange behaviour: Screen output changes calculation result

pgruhn · ‎07-12-2006

Hi there,

I'm trying to run an old aerodynamic code that I got from a colleague. It works fine except for one coefficient, for which the code produces NaNs. So I searched for the routine where the coefficient is calculated and added some lines to print the current values of the variables concerned to the screen.

I did not change anything in the calculation (the output was only before and after the calculation), but after adding the screen output I now get calculated values instead of NaNs! Moreover, adding another line of screen output changes the results again (unfortunately, none of the results are correct, as I can see from a validation file I have at hand).

I'm really puzzled by this behaviour. I guess I am running into some kind of bad memory handling here, but I cannot find any mistakes in the code. I checked the COMMON blocks, but they seem to be defined correctly and consistently.

Can anybody provide any help or hints?

dbruceg · ‎07-12-2006

Uninitialized variables?

pgruhn · ‎07-12-2006

The variables concerned are declared in the COMMON block of the SUBROUTINE AEROHEAT (see below; CFXT, CFYT, CFZT and CFTT are used to calculate CF, the variable that keeps changing). As far as I can see, the first variable (CFXT) is set to some calculated value by the program, and the other variables (CFYT, CFZT, CFTT) are set to zero before the first call.

If I leave out the first screen output I get NaN as the result. If I leave it in, the result differs depending on whether the second or third output line is commented out or not.

Another odd thing: If I let the variables be printed to the screen right before AEROHEAT is called, I get NaN again...

SUBROUTINE AEROHEAT

... some declaration

COMMON / SOSE23 / CFXT,CFYT,CFZT,CFTT

... some declaration

WRITE (*,'(A,4(F8.5,5X))') 'Anf.: ',CFXT,CFYT,CFZT,CFTT

... some operation not influencing CFXT,CFYT,CFZT or CFTT

WRITE (*,'(A,4(F8.5,5X))') 'Anf2: ',CFXT,CFYT,CFZT,CFTT

... some operation not influencing CFXT,CFYT,CFZT or CFTT

WRITE (*,'(A,4(F8.5,5X))') 'Anf3: ',CFXT,CFYT,CFZT,CFTT

... long calculation ...

WRITE (*,'(A,4(F8.5,5X))') 'Ende: ',CFXT,CFYT,CFZT,CFTT

300 RETURN

END

anthonyrichards · ‎07-12-2006

1) If you can, add IMPLICIT NONE to all your routines, then recompile and link. This should help you find undeclared (for example, misspelled) variables, which are a common source of uninitialized values.

2) You must check that EVERYWHERE you use COMMON/SOSE23/, the declarations of the variables in the COMMON block agree in TYPE and LENGTH in every case, whatever they are named. Check also that the variables occur in the same order. Otherwise you may be having alignment problems. In this case it is much better to put the COMMON variables into a MODULE and USE it wherever you presently have the COMMON/SOSE23/ and its associated type declarations.
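A minimal sketch of the MODULE approach suggested above. The module name sose23_data and the demo program are hypothetical; only the four variable names and the WRITE format come from the thread, and the double precision type is an assumption about the legacy declarations.

```fortran
! Hypothetical module standing in for COMMON /SOSE23/.
module sose23_data
  implicit none
  ! Assumed type; initialising here also gives every variable a defined start value.
  double precision :: cfxt = 0.0d0, cfyt = 0.0d0, cfzt = 0.0d0, cftt = 0.0d0
end module sose23_data

subroutine aeroheat
  ! One USE line replaces the COMMON statement and its type declarations.
  use sose23_data, only: cfxt, cfyt, cfzt, cftt
  implicit none
  write (*,'(A,4(F8.5,5X))') 'Anf.: ', cfxt, cfyt, cfzt, cftt
end subroutine aeroheat

program demo
  implicit none
  call aeroheat
end program demo
```

Because type, kind and order are now defined in exactly one place, the per-routine consistency check described in point 2 is no longer needed.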

jparsly · ‎07-12-2006

Do you have array bounds checking turned on? When you store data in an array, your program uses the array indices to calculate an offset from the beginning of the array where the data should be placed. If your array indices are out-of-bounds, the data can end up in some other variable, or it can replace part of your code.

A simple example:

COMMON /COLD/ A(100),B

I=101

A(I) = 5

In this example, B will be set to 5.

Fortran programs don't normally check for out-of-bounds references, since this can make a program run significantly slower. In some cases, a program will be "lucky", and get away with one of these errors for years because the out-of-bounds location was not in active use at the time of the reference. These cases often get uncovered when switching to a new compiler, which maps out the memory differently.

Note that adding write statements will cause the memory layout of variables to be different, which may explain why your results change.

Bounds checking won't catch all cases of out-of-bound indices. If you pass an array to a subroutine, it may not be able to tell what the bounds are. You often see code like this:

DIMENSION A(100)

CALL SUB(A)

.

.

SUBROUTINE SUB(A)

DIMENSION A(1)

A(100) = 10

I=101

A(I)=5

With bound checks turned on, depending on your compiler it will either complain for both the A(100) statement and the A(I) statement, or it won't complain about either one. The reasoning behind the DIMENSION A(1) is apparently due to some historical shortcoming in the early versions of Fortran, and crops up often enough that some compiler vendors have made a special case out of it for the purposes of bounds checking.
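One way to give the bounds checker something to work with, sketched under the assumption that the caller can pass the array length: declare the dummy argument with its real extent instead of DIMENSION A(1). The names bounds_demo and n are made up for illustration.

```fortran
program bounds_demo
  implicit none
  real :: a(100)
  a = 0.0
  call sub(a, size(a))
  print *, a(100)
end program bounds_demo

subroutine sub(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)   ! true extent, so the checker knows the bounds
  integer :: i
  a(n) = 10.0                   ! last valid element
  i = n + 1
  ! a(i) = 5.0 here would be out of bounds relative to the declared extent;
  ! the guard below simply skips the store in this sketch.
  if (i >= 1 .and. i <= n) a(i) = 5.0
end subroutine sub
```

With the extent declared as a(n), a compiler with bounds checking enabled has the information it needs to flag the a(i) store if the guard is removed, whereas DIMENSION A(1) leaves it guessing.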

pgruhn · ‎07-14-2006

Well, first of all, thanks for your quick answers, I really appreciate your help.

@anthonyrichards

The variables are declared via IMPLICIT REAL*8 (A-H , O-Z) throughout the code, so the variables in COMMON /SOSE23/ are consistent in TYPE and LENGTH. They do occur in the same order everywhere. Changing to IMPLICIT NONE does not seem practical, because it would mean declaring all variables again by hand, which might leave room for new errors.

It would be convenient to put all COMMON variables in one MODULE (that's how I do it in all my own programs), but unfortunately some variable names are defined twice in different COMMON blocks (which of course are not used at the same time, but in different subroutines), so if I replace all COMMON blocks in all subroutines by one MODULE I have to change the variable names as well. I'm a bit hesitant to do so because it is a rather large code.
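The name clash described above does not necessarily force a rename throughout the code. A possible sketch: keep one module per former COMMON block, and rename on the USE statement only in routines that need both. The modules blk_a and blk_b and the variable temp are hypothetical stand-ins, not names from the actual code.

```fortran
module blk_a                    ! hypothetical stand-in for one COMMON block
  implicit none
  double precision :: temp = 1.0d0
end module blk_a

module blk_b                    ! hypothetical second block reusing the name TEMP
  implicit none
  double precision :: temp = 2.0d0
end module blk_b

program rename_demo
  ! local-name => module-name resolves the clash for this scope only
  use blk_a, only: temp_a => temp
  use blk_b, only: temp_b => temp
  implicit none
  print *, temp_a + temp_b
end program rename_demo
```

A routine that uses only one of the blocks can write use blk_a, only: temp and keep the original name, so existing code inside that routine stays untouched.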

@jparsly

Array bounds checking is turned on. I'm not sure what you mean by memory layout in "Note that adding write statements will cause the memory layout of variables to be different, which may explain why your results change."

So, thanks again, but my problem remains unsolved and any further hints are still welcome. Meanwhile I am starting to think about Anthony's advice to replace all variable definitions by one MODULE, although this involves a hell of a lot of work ...

anthonyrichards · ‎07-14-2006

This looks like a perfect job for the debugger. Select debug, insert breakpoints where you presently have WRITEs, start the debugger and run to the first breakpoint. Monitor the COMMON block variables (and others that are relevant) in a watch window. Step through the code. This should certainly give you more information.

onkelhotte · ‎07-14-2006

I once had a similar problem. My function was different; this one is only to explain the problem:

real function calculation(a,b)
calculation=a*b
end function

In release mode, calculation returned NaN; in debug mode everything was fine. When I added a write(*,*) calculation in release mode, everything was fine again, just as pgruhn experienced with his code. I got it to work correctly this way:

real function calculation(a,b)
result=a*b
calculation=result
end function

I don't know why, but it works :-)

Have a nice weekend,
Markus

anthonyrichards · ‎07-14-2006

..and I hope your program is not part of a safety-critical system!

I use CVF. In my experience, when running in debug mode, all variables appear to be initialised to zero (ALLOCATED arrays may not be). In release mode, all variables are indefinite and are not initialised to zero. So, one difference between Debug and release will occur if you are using an uninitialised variable, which may have to be originally set to zero. Possibly this happens in Debug mode and by accident all is well, but it will bite you in the behind in Release mode when some random, possibly indefinite value is initially present in the variable's memory address.

I no longer use IMPLICIT REAL etc, only IMPLICIT NONE and ensure all my variables (including ALLOCATED arrays) are explicitly initialised at some stage. This removes a whole raft of possible problems, leaving exceeding array bounds as the next major possibility for undefined or unexpected results.

pgruhn · ‎07-17-2006

Finally I found an error in the code; it was indeed a problem of uninitialized variables. Some variables were initialized within an IF-THEN statement (which I somehow missed for some time), so they were not initialized in all cases, which finally led to a division by zero and thus to my NaN results.

The reasoning behind the IF-THEN statement was probably to save some extra calculations, because the calculated variables did not change since the last call of SUBROUTINE AEROHEAT, but since the variables concerned were not in a COMMON block, IF9 did not keep their values between calls.

I can only guess that this was not a problem for the compiler the code was originally written for, so this error was not noticed before.
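If the intent of such an IF-THEN block is to reuse values from an earlier call, the SAVE attribute makes that explicit instead of relying on how a particular compiler happens to store local variables. A minimal sketch; cached_step, recompute and factor are made-up names, not taken from AEROHEAT.

```fortran
subroutine cached_step(recompute, x, y)
  implicit none
  logical, intent(in) :: recompute
  double precision, intent(in) :: x
  double precision, intent(out) :: y
  ! SAVE keeps the value between calls; the initial value covers a first
  ! call that skips the recomputation.
  double precision, save :: factor = 1.0d0
  if (recompute) factor = 2.0d0 * x
  y = factor * x
end subroutine cached_step

program save_demo
  implicit none
  double precision :: y
  call cached_step(.true., 3.0d0, y)    ! computes and stores factor
  call cached_step(.false., 3.0d0, y)   ! reuses the stored factor
  print *, y
end program save_demo
```

Without SAVE (and without the initializer, which implies it), the value of factor on the second call would be undefined, which matches the failure described above.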

I still don't fully understand how the screen output affected the results, but now that I have deleted the IF-THEN statement this is not a problem anymore and I finally get some results. To me this is a good example that Anthony is right: it's always best to declare all variables explicitly and in a separate module that is brought in via a USE statement.

Thanks again for all the answers.

michael84 · ‎07-17-2006

How did you find the uninitialized variables? Did you set the /RTCu compiler option on?

We are having a very similar problem to the one you describe. In a quick test the /RTCu option caught uninitialized scalar variables but not uninitialized array variables. Are there other ways to use the compiler to catch these or is manual analysis of the program necessary?

Thanks.

Steven_L_Intel1 · ‎07-17-2006

/RTCu (/check:uninit) does not currently do anything for arrays. In a future update, it may at least detect that an array has had no assignments to any element. At this time we're not doing element-by-element checks.
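Where the compiler cannot check array elements, one manual workaround is to pre-fill work arrays with NaN and test for leftovers afterwards. This is only a sketch and assumes a compiler that provides the Fortran 2003 IEEE_ARITHMETIC intrinsic module, which compilers of that era may not; the array work and its size are made up.

```fortran
program nan_sentinel
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  double precision :: work(5)
  integer :: i

  ! Fill every element with a quiet NaN as an "unassigned" marker.
  work = ieee_value(1.0d0, ieee_quiet_nan)

  ! Stand-in for real code that forgets to assign the last two elements.
  do i = 1, 3
    work(i) = dble(i)
  end do

  ! Any element that is still NaN was never assigned.
  do i = 1, size(work)
    if (ieee_is_nan(work(i))) print *, 'element never assigned: ', i
  end do
end program nan_sentinel
```

The check only works if NaN is not a legitimate result of the calculation itself, and aggressive floating-point optimisation settings can interfere with NaN tests.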
