Odd floating-point problem with -ipo and -ip flags on Itanium Linux Fortran

milos · ‎10-05-2005

Hi everyone,

I have a strange floating-point problem with the -ipo optimization flag in Intel Fortran 9.0 (Itanium, Linux non-commercial version 9.0 Build 20050430 Package ID: l_fc_p_9.0.021). I compile a multi-file code as follows:

ifort -O3 -ipo -g -c modes.f90
ifort -O3 -ipo -g aa.o bb.o ... modes.o -o prog

I find that with -ipo used, a variable (beta(i4) in code below) ends up NaN. Without -ipo used, it has the correct answer. Further, if I use -ip instead of -ipo, the same problem occurs, and using -ip -mp (maintain floating point precision) together instead of -ipo above still doesn't help.

Strangely though, even with -ipo (or -ip) used as above, if I place a dummy print*,'Hello' statement at one of a few "right" places in the function Bf (below), the code works fine! Further, in the calling subroutine (modeslvr), uncommenting the print*,'Hello' line (top comment line below) doesn't fix the problem, but uncommenting a print statement (second, 2-line comment below) that calls Bf does fix the problem. The function Bf doesn't operate on any global variables, just does some math and returns the result in the first parameter (bt). Finally, another way to "fix" the problem is to comment out the first 5 lines in the loop below, and uncomment the last 3, which do the same math but rewritten a little (without using Bz, etc).

If anyone can confirm to me an awareness of a problem with -ipo and -ip, or suggest something that I could be overlooking and could be wrong with the code, I would greatly appreciate it. e.g. I worry about things like stacks which I don't know enough about, but the amount of space used by the code is rather tiny.

I've posted below a short insert as the code is sizeable, but if it would help I can post more. I can get around the problem here with the incomprehensible steps above, but I'm afraid to use -ipo/-ip for fear of a random recurrence of the problem.

Thanks for anyone's and everyone's advice in advance,

Milos

Here's the relevant part of the code:
-------------------------------------

SUBROUTINE modeslvr(beta,n,d,wv,lyrs,pol)
IMPLICIT NONE
INTEGER, PARAMETER :: Mmode = 20,Mlyrs =10
INTEGER, INTENT(in) :: pol,lyrs
DOUBLE PRECISION, INTENT(in):: wv
DOUBLE PRECISION, DIMENSION(Mlyrs), INTENT(in) :: n,d
DOUBLE PRECISION, DIMENSION(Mmode), INTENT(out) :: beta
DOUBLE PRECISION, DIMENSION(2) :: bt,Bg,Bz
DOUBLE PRECISION :: k,Bf,Bp,dltbt
INTEGER :: i1,i2,i3,i4,Mmax

...[snip]...

beta(i4) = (bt(1)+bt(2))/2.0d0
DO i2 = 1,5
   Bz(1) = Bf(beta(i4)-dltbt,k,n,d,lyrs,pol)
   Bz(2) = Bf(beta(i4)+dltbt,k,n,d,lyrs,pol)
   Bp = (Bz(2)-Bz(1))/(2.0d0*dltbt)
   print*,'dly:',Bz(1),Bz(2),Bp,(beta(i4)-Bz(2)/Bp)
   beta(i4) = beta(i4) - Bz(2)/Bp
!  print*,'Hello'
!  print*, ( Bf(beta(i4)+dltbt,k,n,d,lyrs,pol) - &
!            Bf(beta(i4)-dltbt,k,n,d,lyrs,pol) )/(2.0d0*dltbt)
!  Bp = ( Bf(beta(i4)+dltbt,k,n,d,lyrs,pol) - &
!         Bf(beta(i4)-dltbt,k,n,d,lyrs,pol) )/(2.0d0*dltbt)
!  beta(i4) = beta(i4)-Bf(beta(i4)+dltbt,k,n,d,lyrs,pol)/Bp
ENDDO

...[snip]...

DOUBLE PRECISION FUNCTION Bf(bt,k,n,d,m,pol)
IMPLICIT NONE
INTEGER, PARAMETER :: Mmode = 20, Mlyrs = 10
DOUBLE PRECISION, DIMENSION(Mlyrs), INTENT(in) :: n,d
DOUBLE PRECISION, INTENT(in) :: bt,k
INTEGER, INTENT(in) :: pol,m

...[snip]...

TimP · ‎10-06-2005

In the 6 months or so since that compiler was issued, significant improvements have been made. It's unfortunate, if that is the only compiler available for free evaluation, that it isn't more up to date.
It's not at all unusual for addition of a print statement to suppress problems with optimization, regardless of whether the compiler is responsible, or the programmer has failed to initialize data. The compiler does have options to catch some of the latter problems; you didn't say if you tried them.
Using array elements such as Bz(1), Bz(2), beta(i4) in this way, where you could easily have used local scalar names, could be a nasty way to attempt to trip up an optimizing compiler.
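
A minimal sketch of the local-scalar rewrite, for illustration only. The body of Bf was not posted, so Bf_demo below is a hypothetical stand-in with a single argument, and the starting guess and the value of dltbt are assumptions. Putting the function in a module (newton_demo, also a made-up name) gives the caller an explicit interface, which the posted code does not have; that is an extra choice here, not something the original requires.

```fortran
! Sketch only: Bf_demo is a hypothetical stand-in for the poster's Bf,
! whose body was not shown in the thread.
module newton_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical test function; it is zero at bt = sqrt(2).
  pure function Bf_demo(bt) result(f)
    real(dp), intent(in) :: bt
    real(dp) :: f
    f = bt*bt - 2.0_dp
  end function Bf_demo
end module newton_demo

program local_scalars
  use newton_demo
  implicit none
  real(dp), parameter :: dltbt = 1.0e-6_dp   ! assumed step size
  real(dp) :: b, f_lo, f_hi, slope
  integer :: it

  b = 1.5_dp                        ! assumed guess; plays the role of beta(i4)
  do it = 1, 5
    f_lo  = Bf_demo(b - dltbt)      ! replaces Bz(1)
    f_hi  = Bf_demo(b + dltbt)      ! replaces Bz(2)
    slope = (f_hi - f_lo) / (2.0_dp*dltbt)   ! replaces Bp
    b = b - f_hi/slope              ! same update as the posted loop
  end do
  print *, 'estimate after 5 steps:', b
end program local_scalars
```

In the real routine, the final value of b would be copied back into beta(i4) once, after the loop, so the array element is written in a single place.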

milos · ‎10-06-2005

Tim,

|> In the 6 months or so since that compiler was issued,
|> significant improvements have been made. It's unfortunate,
|> if that is the only compiler available for free evaluation,
|> that it isn't more up to date.

I downloaded the 9.0 compiler 3 days ago. Previously I had been using version 7.1 (commercial ver) which worked fine,
and seems to work fine for this case with and without -ipo and -ip (but doesn't seem to have a -ftrapuv or equivalent flag).

|> compiler is responsible, or the programmer has failed
|> to initialize data. The compiler does have options to catch
|> some of the latter problems; you didn't say if you tried them.

Thanks, I had not been aware of this flag to check for uninitialized variables. I tried compiling with both -ftrapuv (catch uninitialized vars) and -implicitnone (to make sure all variables were defined). With -ftrapuv it now crashes [forrtl: error (65): floating invalid], rather than before just giving a NaN when compiled with -ipo or -ip, and just going on about its business.

But without -ipo/-ip, or with strategically located print statements, or with the rephrased math lines (and no print statements) it works both with and without -ftrapuv and -implicitnone (although I see that -ftrapuv gives no guarantee of a catch). I also eyeball checked the code to be sure that all variables leading to that point were initialized (they are), by which I assume you mean just that they are assigned a value prior to being used otherwise.

|> Using array elements such as Bz(1), Bz(2), beta(i4) in this
|> way, where you could easily have used local scalar names,
|> could be a nasty way to attempt to trip up an optimizing
|> compiler.

Yes, thanks for the comment. Actually, this piece of code doesn't need to be optimized. It just needs to "survive" the compiler optimization of other, heavy computing parts which do need the optimizations like loop unrolling, etc.

Incidentally, I use identical options for all source files. Is it legit to use nonuniform compile options on different source files that are part of the same final executable? I.e. that I leave out -ipo on some files and use it on others?

Would you know, is there a place online where the ongoing updates to the compiler are documented, like a version history?

Thanks for your reply and comments.

Milos

Intel_C_Intel · ‎08-02-2006

hi everybody,

i have an even simpler piece of code that shows the error, and i also have a possible solution. bad news: the error happens on IA32 machines too.

the loop below does not work as expected unless the compiler directive "!DEC$ NOVECTOR" is included. finding this error in a large aeroacoustic code is not simple, as you might guess, especially as it did not produce an FPE but simply took data from a wrong portion of the field F. KA and KE are known to be 1 at compile time, so the second (J) loop is vectorized by -ipo. other options seem to be insignificant for this optimization, e.g. it happens with -parallel set or unset and with -O2 and -O3.

!DEC$ NOVECTOR
DO I=IA,IE
   DO J=JA,JE
      DO K=KA,KE
         IJC = NST(IMB) + (I-1)*JL(IMB)+J
         IJEX = IJEX + 1
         FEXS(IJEX)=F(IJC)
      ENDDO
   ENDDO
ENDDO

needless to say, F and FEXS are large fields. they are not the same size, but both are large enough to hold the assigned indices (IJEX runs up to 13000, IJC up to 250000). i do not know exactly, but could the problem be that J runs from 1 to 3, and this is not simply vectorizable by making pairs of it and running over J=JS,JE,2? it could also be some prefetching that may not work for the large fields (unfortunately i run more or less randomly across them).

anyway, i think this loop should not have been vectorized, from what i read in the introductory pages on vectorization. therefore adding the "!DEC$ NOVECTOR" helped. i found out about the faulty loop by placing a print inside, which also switches vectorization off, as it is a function call. function calls cannot be vectorized according to intel.
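
A standalone check of the loop could look roughly like the sketch below. Every size, bound and offset in it is hypothetical, since the real values of NST, JL and the loop limits are not given here; only KA = KE = 1 and J running from 1 to 3 come from the description above. F is filled so that each element holds its own index, which makes any copy from the wrong position show up as a mismatch. Because the bounds are compile-time constants in this sketch, the compiler may treat the loop differently than it does in the full code.

```fortran
! Sketch only: all sizes, bounds and offsets are hypothetical.
program gather_check
  implicit none
  integer, parameter :: nblk = 1
  integer, parameter :: ia = 1, ie = 100, ja = 1, je = 3, ka = 1, ke = 1
  integer, parameter :: nf  = 1000
  integer, parameter :: nex = (ie-ia+1)*(je-ja+1)*(ke-ka+1)
  integer :: nst(nblk), jl(nblk)
  double precision :: f(nf), fexs(nex)
  integer :: i, j, k, imb, ijc, ijex, nbad

  imb = 1
  nst(imb) = 10        ! hypothetical block offset into F
  jl(imb)  = 5         ! hypothetical row length

  ! Fill F so that each element encodes its own index.
  do i = 1, nf
    f(i) = dble(i)
  end do

  ! Loop under test, as posted, without the NOVECTOR directive.
  ijex = 0
  do i = ia, ie
    do j = ja, je
      do k = ka, ke
        ijc = nst(imb) + (i-1)*jl(imb) + j
        ijex = ijex + 1
        fexs(ijex) = f(ijc)
      end do
    end do
  end do

  ! Check: recompute the expected source index for every FEXS slot.
  nbad = 0
  ijex = 0
  do i = ia, ie
    do j = ja, je
      do k = ka, ke
        ijex = ijex + 1
        if (fexs(ijex) /= dble(nst(imb) + (i-1)*jl(imb) + j)) nbad = nbad + 1
      end do
    end do
  end do
  print *, 'mismatches:', nbad
end program gather_check
```

Building this once with and once without -ipo and comparing the mismatch count would show whether the loop alone reproduces the wrong copies.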

Intel_C_Intel · ‎08-02-2006

Dear Christoph,

Are you sure this is a vectorization issue? Switch ipo alone does not enable vectorization and the directive seems misplaced (furthermore, placing directives in the code sometimes impacts other optimizations). In any case, the example contains too many unknowns to help you further. If possible, could you please construct a unit test case and email this to me (aart.bik@intel.com)?

Sincerely,

Aart Bik

http://www.aartbik.com/