Intel® Fortran Compiler

Ifort bug with O2 + ftrapuv + where

davidromps
Beginner
Hello,

I have encountered a bug in ifort 10.0.5 and 11.0.081. The following code, when compiled with "-O2 -ftrapuv", fails at run time with "forrtl: error (65): floating invalid". This happens only when -O2 and -ftrapuv are used together, and only for a sufficiently large array. On my machine (Linux 2.6.18-92.1.17.el5), it occurs for an array of size 8 or larger; for size 7 or smaller the program runs as intended.

Thanks,
David
[cpp]program test

implicit none

real, dimension(8) :: x

x = 0.

where (x .gt. 0.)
   x = 1. / x
end where

end program test

[/cpp]
5 Replies
TimP
Honored Contributor III
I see that -prec-div- (that is, precise division disabled) must be in effect to make this happen. I have -prec-div set in ifort.cfg, so I can't reproduce your issue unless I override that setting.
As both array assignments are vectorized, the vectorized code is executed when the loop length is 8 or more. With -prec-div-, the vectorized code consists of a Newton-Raphson iteration sequence, which has a limited range of validity and is frequently less accurate in the low-order bit.
This Newton-Raphson scheme is advertised as having higher throughput than the IEEE-accurate sequence you get with -prec-div. Recent Intel CPUs (beginning with those that support SSE4.1) perform very well on IEEE-accurate division, in both scalar and vector versions, so you can see why I prefer to set -prec-div.
In the 11.0 compilers, it should be possible to enable math function vectorization (-fast-transcendentals) separately from other options, so it becomes feasible to set -fp-model source (implying -prec-div -prec-sqrt -ftz-) more often than in earlier versions.
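
To make the Newton-Raphson remark concrete, here is a minimal sketch of reciprocal refinement, r_new = r * (2 - x * r). It is an illustration only: the starting estimate 0.3 and the use of +Inf as the estimate for x = 0 are assumptions, and the thread does not show the instruction sequence ifort actually emits. It suggests why a zero input could lead to an "invalid" operation (0 * Inf) rather than a plain divide-by-zero.

[cpp]
program nr_reciprocal
   use, intrinsic :: ieee_arithmetic
   implicit none

   real :: x, r
   integer :: i

   ! Refine a crude estimate of 1/x; the error roughly squares each step.
   x = 3.0
   r = 0.3                          ! hypothetical starting estimate
   do i = 1, 3
      r = r * (2.0 - x * r)
   end do
   print *, 'refined 1/3 =', r, '  direct 1/3 =', 1.0 / x

   ! For x = 0, assume the starting estimate is +Inf.
   ! The step then computes 0 * Inf, which is an IEEE invalid operation.
   x = 0.0
   r = ieee_value(r, ieee_positive_inf)
   r = r * (2.0 - x * r)
   print *, 'result for x = 0 is NaN:', ieee_is_nan(r)

end program nr_reciprocal
[/cpp]

If floating-point traps are enabled when this is built, the second step would be expected to abort instead of printing.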
davidromps
Beginner
Hi, Tim,

I think the issue here is that 1/x is being evaluated at all. I don't know much about compiler vectorization, but I would be surprised to learn that 1/x is evaluated first and then assigned to x only where the conditional is true. If that were the case, and were the reason for the "floating invalid" error, then why does the same program without the "where" and "end where" lines execute without complaint?

Thanks,
David
TimP
Honored Contributor III
I don't see this as surprising, when you invoke auto-vectorization. You ask for the compiler to evaluate multiple iterations in parallel; although you made an artificial case where the compiler could have recognized that the conditional is loop independent and so skipped the entire loop, you didn't write it that way.
Steven_L_Intel1
Employee
There's no loop in this program. WHERE is not a loop construct.
Hideki_I_Intel
Employee
The Fortran front end processes the WHERE statement and creates a loop (or a loop nest for the multi-dimensional case) as needed.
For the optimizers, there is a loop (unless it is completely unrolled during an optimization phase).
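
As a rough sketch of what that means for the test program, the WHERE construct behaves like the explicit loop below. This is an assumed source-level equivalent for this simple case, not the compiler's actual intermediate form, which is not shown in this thread.

[cpp]
program where_as_loop
   implicit none

   real, dimension(8) :: x
   integer :: i

   x = 0.

   ! Loop form of: where (x .gt. 0.) x = 1. / x
   ! Written this way, the divide is guarded by the condition on each element.
   do i = 1, size(x)
      if (x(i) .gt. 0.) then
         x(i) = 1. / x(i)
      end if
   end do

   print *, x

end program where_as_loop
[/cpp]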

The SSE through SSE4.2 (and AVX) vector instruction sets do not have operation masking. The default FP model is "fast", which allows the vectorizer to blindly issue an unguarded packed divide even if the original divide is under a conditional. Whether "-O2 -ftrapuv" should still use the "fast" FP model is debatable, however.
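
The effect of an unguarded divide can be sketched at the source level as follows: the reciprocal is computed for every element first, and the mask only selects which results are kept. The names tmp, mask, and raised are illustrative, and this is an assumed picture of the transformation, not the generated code. The final values of x are unaffected, but the divide on the masked-out elements may still raise a floating-point exception flag, or abort if traps are enabled.

[cpp]
program speculative_divide
   use, intrinsic :: ieee_exceptions
   implicit none

   real, dimension(8) :: x, tmp
   logical, dimension(8) :: mask
   logical :: raised

   x = 0.
   call ieee_set_flag(ieee_divide_by_zero, .false.)

   mask = x .gt. 0.
   tmp = 1. / x                 ! unguarded: evaluated for all elements
   x = merge(tmp, x, mask)      ! keep the quotient only where mask is true

   call ieee_get_flag(ieee_divide_by_zero, raised)
   print *, 'x =', x
   print *, 'divide-by-zero flag raised:', raised

end program speculative_divide
[/cpp]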

The difference between the 7-iteration and 8-iteration cases is just a side effect of how different optimizations happen to apply to those specific constant trip counts, and is therefore not essential to this issue. However, I understand it can be a source of frustration.

"-fp-speculation off" would be appropriate for this case.