Solved: Compiler bug - floating point invalid

cooper__db · ‎02-27-2014

The following code produces a floating point invalid error at run-time when using the flags -O2 -fpe0 -fltconsistency:

program example

   integer, parameter :: leng = 300
   real (kind = 8), dimension(leng) :: errors
   integer :: i

   do i = 1, leng
      errors(i) = 1d-20 * sin(float(i))
   end do

   print*, minval(errors)

   where(errors < 0) errors = -100.0_8 * errors / minval(errors)

end program example

This occurs at the line beginning "where". This is also sometimes a problem for maxval, but it was easier to recreate with minval. If a new real(8) is created to hold the value of minval(errors) before the where, this fixes the problem. Also, removing the fltconsistency flag fixes the problem.

-Dave

jimdempseyatthecove · ‎02-27-2014

A problem with your code is:

where(errors < 0) errors = -100.0_8 * errors / minval(errors)

The where statement scans the array errors(:) for values < 0. When one is found, it searches the entire array for minval and uses that value in the errors = assignment. The next hit in where(errors < 0) then causes a new search of the entire array for minval, which may have changed because of the prior assignment. At some point, the value of minval may have been reduced to -0.0 (note that the expression contains three negative numbers, potentially including -0.0).

When you obtained the minval(errors) outside of the where statement, you would have obtained the minval prior to any substitutions made by the where statement.

Jim Dempsey

jimdempseyatthecove · ‎02-27-2014

Correction,

When I run the program here, with an added print of minval after the where, the result is -100.000

Jim Dempsey

cooper__db · ‎02-27-2014

Jim,

Thank you for the comments. Indeed, I thought something like that would be the culprit when I first started in on this bug. Also, performing the minval() outside of the where is not only a safer, but also a faster solution (although that's probably a moot point with -O2).

Anyway, I chose the sin() function to generate the errors array since sin values of positive integers are never 0.

-Dave

Martyn_C_Intel · ‎02-28-2014

I wasn't able to reproduce the exception when I tried. Please could you specify:

IA-32 or Intel 64;

The compiler version (from ifort -V)

The processor type on which you are running?

I don't think the behavior described by Jim should occur; the semantics of WHERE are that the mask is evaluated before any operations on the left hand side are performed, and even if speculated, MINVAL should be performed on the original contents of the array.

jimdempseyatthecove · ‎03-01-2014

Martyn,

It is not the mask of the where, but the assignment statement(expression) that references the array being modified.

Jim Dempsey

Izaak_Beekman · ‎03-01-2014

I must say, without digging through the standards, the semantics of what actually happens on line 12 (the where statement) with the call to minval are not at all clear to me. I would urge anyone using a where construct to avoid functions that operate on arrays and return scalars, just as a stylistic matter. Looking at Fortran 95/2003 Explained, page 109, it appears that the masking is only applied to elemental function calls on the RHS of the assignment operator within a where statement. So for a reduction operation like minval, the entire array errors is passed to it, as Jim noted above. However, I seem to recall (although I may be mistaken) that the semantics of the where statement are equivalent to those of array assignment statements, so there should be no danger of updates to the LHS causing changes to the RHS: the ENTIRE RHS is evaluated before being assigned to the LHS. I would guess that this might require the creation of some array temporaries, which might have an impact on performance, but I think that line 12 in the OP is completely safe and valid Fortran, however difficult it may be to understand.

Under ifort 13.0.2 20130314 running the code you posted above with the addition of a `print*, minval(errors) ,maxval(errors)` after the where statement I get no runtime or compile time errors and the following output:

[bash]

$ ./a.out
-9.999902248382567E-021
-100.000000000000 9.999118447303771E-021

[/bash]
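The WHERE semantics discussed in this post can be checked with a small sketch. It is not from the thread: the program name, the kind parameter wp, and the variable names in_place, hoisted, and min_err are made up for illustration. It runs the original masked assignment next to a version with minval evaluated beforehand and prints the largest difference between the two results. On a compiler that evaluates the whole right-hand side before assigning, the difference should be zero or at rounding level.

```fortran
program where_semantics
   implicit none
   ! Hypothetical names throughout; wp is an assumed double-precision kind.
   integer, parameter :: wp = selected_real_kind(15, 307)
   integer, parameter :: leng = 300
   real(wp) :: in_place(leng), hoisted(leng), min_err
   integer :: i

   do i = 1, leng
      in_place(i) = 1.0e-20_wp * sin(real(i, wp))
   end do
   hoisted = in_place

   ! Variant 1: MINVAL is referenced inside the masked assignment.
   where (in_place < 0.0_wp) in_place = -100.0_wp * in_place / minval(in_place)

   ! Variant 2: MINVAL is evaluated once, before the WHERE.
   min_err = minval(hoisted)
   where (hoisted < 0.0_wp) hoisted = -100.0_wp * hoisted / min_err

   ! Report how far the two variants are apart.
   print *, 'largest difference between variants:', maxval(abs(in_place - hoisted))
end program where_semantics
```

A clearly nonzero difference would suggest that the right-hand side is seeing partially updated data, which is the behavior Jim described.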

cooper__db · ‎03-02-2014

I am running an older version of ifort: 12.1.3.293 Build 20120212. This is installed on an Intel-64bit Linux (CentOS) system.

The code itself is quite old - dating from the early '90s. I only discovered this bug when our development team was checking to make sure our code didn't break when we turned on optimization. I have since changed the where line somewhat:

real (kind = 8)  :: d_minerr
.
.
.
d_minerr = 1 / minval(errors)

where(errors < 0) errors = -100.0_8 * errors * d_minerr

However, I did discover while debugging that if I calculated the minval as below that I still got the floating invalid:

minerr = minval(errors)

where(errors < 0) errors = -100.0_8 * errors / minerr

It sounds like everyone is having trouble recreating this. This only happens for me when optimization level 2 (-O2) is also on. So, just to make sure, you are compiling with -O2 -fpe0 -fltconsistency, correct? If so, then I'm guessing this issue must have been addressed at some point between 12.x and 14.x, whether intentionally or not, by reworking the Fortran-to-assembly optimization magic, and I will leave it at that.
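For reference, here is a self-contained sketch of the hoisted workaround with one extra guard. The guard is an assumption added for illustration, not something from the post: it skips the rescaling when the array has no negative values, so the divisor can never be zero or positive. The names guarded_rescale, wp, and min_err are hypothetical.

```fortran
program guarded_rescale
   implicit none
   ! Hypothetical names; wp is an assumed double-precision kind.
   integer, parameter :: wp = selected_real_kind(15, 307)
   integer, parameter :: leng = 300
   real(wp) :: errors(leng), min_err
   integer :: i

   do i = 1, leng
      errors(i) = 1.0e-20_wp * sin(real(i, wp))
   end do

   ! Evaluate the reduction once, outside the masked assignment.
   min_err = minval(errors)

   ! Only rescale when there is a negative value to normalize against,
   ! so the division below never sees a zero divisor.
   if (min_err < 0.0_wp) then
      where (errors < 0.0_wp) errors = -100.0_wp * errors / min_err
   end if

   print *, minval(errors)
end program guarded_rescale
```

With this data the guard is always taken, since the array contains negative values; it only matters for inputs where every element is non-negative.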

FortranFan · ‎03-03-2014

Dave wrote:

I am running an older version of ifort: 12.1.3.293 Build 20120212. This is installed on an Intel-64bit Linux (CentOS) system.

...

It sounds like everyone is having trouble recreating this. This only happens for me when optimization level 2 (-O2) is also on. So, just to make sure, you are compiling with -O2 -fpe0 -fltconsistency, correct? If so, then I'm guessing this issue must have been addressed at some point between 12.x and 14.x, whether intentionally or not, by reworking the Fortran-to-assembly optimization magic, and I will leave it at that.

I'm not sure any differences between Intel Fortran 12.x and 14.x can explain the problems you're having. There must be something more to this; some detail you're missing out on. You may want to consider sharing all your files - source, makefile, build output, etc. - with Intel Premier Support for them to take a look.

A sidebar issue, but something you will want to follow more strictly as you take this old code and start supporting it in the future, is being consistent with the kind representation of real numbers. In your example, you use a mixed bag: the KIND=8 specification, the FLOAT function, which produces a result with KIND=4, and a constant (1) of default integer kind. Consistency along the lines of the sample code below may help you minimize floating-point issues in your larger code:

[fortran]

PROGRAM example

   !.. Establish working precision and use it consistently
   INTEGER, PARAMETER :: WP = SELECTED_REAL_KIND(15,307)
   !..
   INTEGER, PARAMETER :: leng = 300
   INTEGER :: i
   REAL(KIND = WP), DIMENSION(leng) :: errors
   REAL(KIND = WP) :: d_minerr

   DO i = 1, leng
      errors(i) = 1E-20_wp * SIN(REAL(i, KIND=WP))
   END DO

   d_minerr = 1.0_wp / MINVAL(errors)
   PRINT*, d_minerr

   WHERE(errors < 0.0_wp) errors = -100.0_wp * errors * d_minerr

   PRINT*, MINVAL(errors)

END PROGRAM example

[/fortran]

The above code doesn't have any floating-point issues on the Windows platform with Intel Fortran 9.1 or with Compaq (DEC) Fortran 6.6 (dated circa 1999), which was a predecessor to Intel Fortran.
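To see why the mixed kinds matter, the following sketch compares sin(float(i)), which is evaluated in default (single) precision, with sin(real(i, wp)) over the same index range as the example. It is an illustration only; the program name and the variables mixed, consistent, and max_diff are hypothetical. The printed gap should be on the order of single-precision rounding, far larger than double-precision rounding.

```fortran
program kind_consistency
   implicit none
   ! Hypothetical names; wp is an assumed double-precision kind.
   integer, parameter :: wp = selected_real_kind(15, 307)
   integer, parameter :: leng = 300
   real(wp) :: mixed, consistent, max_diff
   integer :: i

   max_diff = 0.0_wp
   do i = 1, leng
      ! FLOAT returns a default real, so SIN is evaluated in single
      ! precision and only then widened on assignment.
      mixed = sin(float(i))
      ! Converting to the working kind first keeps SIN in double precision.
      consistent = sin(real(i, wp))
      max_diff = max(max_diff, abs(mixed - consistent))
   end do

   print *, 'largest difference between the two forms:', max_diff
end program kind_consistency
```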

Steven_L_Intel1 · ‎03-03-2014

I tried version 12.1, 64-bit, with the options you specified and did not get any errors.

cooper__db · ‎03-03-2014

FortranFan: Once the ice storm clears out of here and I can get back into work, I will submit the files you suggested to Premier Support. It is starting to sound like there may be something wrong with the compiler on my server, so hopefully this can all be sorted out.

Thanks everyone for the suggestions and support.

Steven_L_Intel1 · ‎03-03-2014

If you are using version 12.1, and can't reproduce the problem with a newer compiler, I don't think a report to Premier Support would be fruitful. At this point I would recommend simply installing the newest compiler you're eligible for (I don't see your forum email address in our records, so maybe you're registered with a different one.)

Martyn_C_Intel · ‎03-03-2014

I could reproduce something a bit similar with the executable code as written, if the actual dimension of the errors array is larger than leng, and the extra elements include a NaN, e.g.:

program example
   integer, parameter :: leng = 300
   real (kind = 8), dimension(0:leng) :: errors
   integer (kind = 8) :: i, SNaN
   equivalence(errors, SNaN)
   data SNaN /Z'FFF1000000000000'/

   do i = 1, leng
      errors(i) = 1d-20 * sin(float(i))
   end do

   where(errors < 0) errors = -100.0_8 * errors / minval(errors)

end program example

But there's no dependence on -fltconsistency. Or one could imagine a bug where the compiler created a temporary copy of the array errors for use in evaluating the left hand side, but did not initialize it, so that zero divided by zero gave an invalid exception. But like Steve and others, I was unable to reproduce the exception in the code as written with any compiler version.

I suggest:

1) repeat the test with a 14.0 compiler. (If necessary, you should be able to download an evaluation version). If you don't see the problem with 14.0, then I agree that there's probably little to be gained by pursuing it further.

2) if you still see the problem, then attach here both the assembly code, created with -S, and the exact source file used to create it. If there's a problem with the current compiler, we'd like to get to the bottom of it.

Martyn
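If a stray NaN like the one in the scenario above is suspected, one way to check is to scan the array with the intrinsic ieee_arithmetic module before the where statement. This is a rough sketch under stated assumptions: the program name and wp are hypothetical, and it plants a quiet NaN in element 0 with ieee_value instead of the signaling NaN from the equivalence trick, so that the setup itself does not depend on trap settings.

```fortran
program nan_check
   use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
   implicit none
   ! Hypothetical names; wp is an assumed double-precision kind.
   integer, parameter :: wp = selected_real_kind(15, 307)
   integer, parameter :: leng = 300
   real(wp) :: errors(0:leng)
   integer :: i

   ! Plant a quiet NaN in the extra element (an assumption for this demo).
   errors(0) = ieee_value(errors(0), ieee_quiet_nan)
   do i = 1, leng
      errors(i) = 1.0e-20_wp * sin(real(i, wp))
   end do

   ! Scan the whole array, including the element outside 1..leng.
   if (any(ieee_is_nan(errors))) then
      do i = lbound(errors, 1), ubound(errors, 1)
         if (ieee_is_nan(errors(i))) print *, 'NaN at index', i
      end do
   else
      print *, 'no NaN found'
   end if
end program nan_check
```

In a larger application the same any(ieee_is_nan(...)) test could be placed just ahead of the failing where statement to see whether the input array is already contaminated.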

cooper__db · ‎03-03-2014

Steve,

Unfortunately, we have already frozen the compiler for post-launch processing. The compiler is registered through the company I work for, SSAI. In either case, I would have to do some prodding of management to get the credentials to set up a Premier account, but that shouldn't be a problem. We will have the final production server running in about a month, at which time I will test this issue again. In the meantime, I will ask the SA to install the evaluation version of 14.0 and see if this is still a problem. If 14.0 solves the problem, I will let management know of the issue, and hopefully we can upgrade. If not, I'll be sure to send the assembly code, makefile, etc. to Premier Support.

Steven_L_Intel1 · ‎03-03-2014

You can install locally, for yourself. It's one of the options during install. Don't even need root to do so.

cooper__db · ‎03-05-2014

After installing the trial version of the latest Fortran compiler, the FPE errors have disappeared. I will be doing some more testing over the next few days just to compare outputs from the application between the two compilers, but I think that this will fix the problem. Thanks for the help.

As an interesting side note, when I got into work today and compiled/ran the code, I didn't get any FPEs either! I'm not sure if this was some wonky change in my path when SSHing from home or if I may have pasted the wrong code. However, I was still able to replicate the problem when I reverted my application's code to something similar. Sorry for the added headache.