Re: Ifort bug with O2 + ftrapuv + where

davidromps · ‎03-09-2009

Hello,

I have encountered a bug in ifort 10.0.5 and 11.0.081. The following code, when compiled with "-O2 -ftrapuv", produces "forrtl: error (65): floating invalid" at run time. This occurs only when -O2 and -ftrapuv are used in combination, and only for a sufficiently large array. On my machine (Linux 2.6.18-92.1.17.el5), it occurs for an array of size 8 or larger; for 7 or smaller the program executes as intended.

Thanks,

David

[cpp]program test
implicit none

real, dimension(8) :: x

x = 0.

where (x .gt. 0.)
   x = 1. / x
end where

end program test
[/cpp]

TimP · ‎03-09-2009

Quoting - davidromps

"-O2 -ftrapuv" produces a "forrtl: error (65): floating invalid" during runtime. This occurs for an array of size 8 or larger.

[cpp]program test
implicit none

real, dimension(8) :: x

x = 0.

where (x .gt. 0.)
   x = 1. / x
end where

end program test
[/cpp]

I see that -prec-div must be turned off (-no-prec-div) to make this happen. I have -prec-div set in ifort.cfg, so I can't reproduce your issue unless I override that setting.
As both array assignments are vectorized, the vectorized code is executed when the loop length is 8 or more. With -no-prec-div, the vectorized divide consists of a Newton-Raphson iteration sequence, which has a limited range of validity and is frequently less accurate in the low-order bit.
This Newton-Raphson scheme is advertised as having higher throughput than the IEEE-accurate sequence you get with -prec-div. Recent Intel CPUs (beginning with those that support SSE4.1) have excellent performance in IEEE-accurate division, both scalar and vector, so you can see why I prefer to set -prec-div.
In the 11.0 compilers, it should be possible to enable math function vectorization (-fast-transcendentals) separately from other options, so it becomes feasible to set -fp-model source (implying -prec-div -prec-sqrt -no-ftz) more often than in earlier versions.
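
To make the Newton-Raphson remark concrete, here is a minimal sketch in plain Fortran of one refinement step for a reciprocal, y1 = y0 * (2 - x * y0). It is only an illustration and not the instruction sequence ifort actually emits. The starting estimate `y0` is a hypothetical stand-in for a hardware approximate reciprocal, and the idea that such an estimate is +Inf for x = 0 is an assumption made for the example. Under that assumption the refinement multiplies 0 by Inf, which is an IEEE invalid operation rather than a divide-by-zero.

```fortran
program nr_reciprocal
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: x, y0, y1

  ! Ordinary case: refine a rough estimate of 1/4.
  x  = 4.0
  y0 = 0.3                      ! hypothetical rough estimate of 1/x
  y1 = y0 * (2.0 - x * y0)      ! one Newton-Raphson step
  print *, 'x = 4: estimate', y0, ' refined', y1

  ! Zero case: ASSUME the starting estimate of 1/0 is +Inf.
  x  = 0.0
  y0 = ieee_value(y0, ieee_positive_inf)
  y1 = y0 * (2.0 - x * y0)      ! 0 * Inf gives NaN (invalid operation)
  print *, 'x = 0: refined value is NaN?', ieee_is_nan(y1)
end program nr_reciprocal
```

If floating-point invalid traps are enabled, the second refinement step is where a program like this would be expected to stop.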

davidromps · ‎03-11-2009

Hi, Tim,

I think the issue here is that 1/x is being evaluated at all. I don't know much about compiler vectorization, but I would be surprised to learn that 1/x is evaluated first and then assigned to x only where the conditional is true. If this were true and were the reason for the "floating invalid", then why does the same program without the "where" and "end where" lines execute without complaint?

Thanks,

David

TimP · ‎03-11-2009

I don't find this surprising when you invoke auto-vectorization. You ask the compiler to evaluate multiple iterations in parallel. You made an artificial case in which the compiler could have recognized that the conditional is loop independent and skipped the entire loop, but you didn't write it that way.

Steven_L_Intel1 · ‎03-11-2009

There's no loop in this program. WHERE is not a loop construct.

Hideki_I_Intel · ‎09-02-2009

The Fortran front end processes the WHERE statement and creates a loop (or a loop nest in the multidimensional case) as needed.
For the optimizers, there is a loop (unless it is completely unrolled during an optimization phase).
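
As a rough picture of what that lowering looks like, the WHERE block from the original post behaves like the explicit loop below. This is a hand-written sketch of the semantics, not actual compiler output; the program name and the index variable `i` are made up for illustration.

```fortran
program where_as_loop
  implicit none
  real, dimension(8) :: x
  integer :: i

  x = 0.

  ! Scalar equivalent of:  where (x .gt. 0.)  x = 1. / x
  ! The divide sits under the condition, so it is never executed
  ! for an element that is zero.
  do i = 1, size(x)
     if (x(i) .gt. 0.) then
        x(i) = 1. / x(i)
     end if
  end do

  print *, x
end program where_as_loop
```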

The SSE through SSE4.2 (and AVX) vector instruction sets do not have masked operations. The default FP model setting is "fast", which allows the vectorizer to blindly issue an unguarded packed divide even if the original divide is under a conditional. Whether "-O2 -ftrapuv" should still use the "fast" FP model is debatable, however.
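
The unguarded form can be imitated in source as follows. This is a sketch of the idea only, assuming the vectorizer computes the quotient for every element and then selects results with the mask. The temporaries `mask` and `q` are hypothetical names, and the code the compiler really generates will differ.

```fortran
program unguarded_divide
  implicit none
  real, dimension(8) :: x, q
  logical, dimension(8) :: mask

  x = 0.

  mask = x .gt. 0.          ! the WHERE condition, evaluated for all elements
  q    = 1. / x             ! divide performed for every element, zeros included
  x    = merge(q, x, mask)  ! keep the quotient only where the mask is true

  ! The final x is the same as with the guarded form, but the divide on the
  ! zero elements was still executed. With floating-point traps enabled, it
  ! can raise an exception even though its result is discarded.
  print *, x
end program unguarded_divide
```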

The difference between the 7-iteration and 8-iteration cases is just a side effect of how different optimizations happen to apply to those specific constant trip counts, and is therefore not essential to this particular issue. However, I understand it can be a source of frustration.

"-fp-speculation=off" would be appropriate for this case.