Hello,
I have encountered a bug in ifort 10.0.5 and 11.0.081. The following code, when compiled with "-O2 -ftrapuv", produces "forrtl: error (65): floating invalid" at run time. This occurs only when -O2 and -ftrapuv are used in combination, and only for a sufficiently large array. On my machine (Linux 2.6.18-92.1.17.el5), it occurs for an array of size 8 or larger; for size 7 or smaller the program executes as intended.
Thanks,
David
[cpp]
program test
  implicit none
  real, dimension(8) :: x
  x = 0.
  where (x .gt. 0.)
    x = 1. / x
  end where
end program test
[/cpp]
5 Replies
Quoting - davidromps
"-O2 -ftrapuv" produces a "forrtl: error (65): floating invalid" during runtime. This occurs for an array of size 8 or larger.
program test
  implicit none
  real, dimension(8) :: x
  x = 0.
  where (x .gt. 0.)
    x = 1. / x
  end where
end program test
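I see that -prec-div- must be set to make this happen. I have -prec-div set in ifort.cfg, so I can't reproduce your issue unless I override that setting.
As both array assignments are vectorized, the vectorized code would be executed when the loop length is 8 or more. When -prec-div- is set, the vectorized code will consist of a Newton-Raphson iteration sequence which has limited range of validity and frequently will be less accurate in the low order bit.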
This Newton-Raphson scheme is advertised as having higher throughput than the IEEE accurate sequence you get with -prec-div. Recent Intel CPUs (beginning with those which support SSE4.1) have excellent performance in the IEEE accurate division, both scalar and vector versions, so you can see why I prefer to set -prec-div.
In the 11.0 compilers, it should be possible to enable math function vectorization (-fast-transcendentals) separately from other options, so it becomes feasible to set -fp-model source (implying -prec-div -prec-sqrt -ftz-) more often than it was in earlier versions.
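To see why a reciprocal-based divide can raise "floating invalid" rather than a divide-by-zero, here is a minimal sketch of one Newton-Raphson refinement step for 1/d with d = 0. It assumes that the hardware estimate of 1/0 is +Inf and that the refinement has the textbook form y1 = y0*(2 - d*y0). The exact sequence the compiler emits is not shown in this thread, and the program name is made up for illustration.

```fortran
program nr_reciprocal_sketch
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: d, y0, y1

  d = 0.
  ! Assumption: stands in for the hardware's approximate reciprocal of 0.
  y0 = ieee_value(y0, ieee_positive_inf)

  ! One textbook Newton-Raphson step for 1/d.
  ! With d = 0 this computes Inf * (2 - 0*Inf); 0*Inf is an invalid
  ! operation under IEEE arithmetic and yields NaN.
  y1 = y0 * (2. - d * y0)

  print *, 'refined estimate is NaN: ', ieee_is_nan(y1)
end program nr_reciprocal_sketch
```

With floating-point exceptions unmasked, an operation like the 0*Inf above would trap instead of quietly producing NaN, which would be consistent with the error reported at the top of the thread.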
Quoting - tim18
I see that -prec-div- must be set to make this happen. I have -prec-div set in ifort.cfg, so I can't reproduce your issue unless I override that setting.
As both array assignments are vectorized, the vectorized code would be executed when the loop length is 8 or more. When -prec-div- is set, the vectorized code will consist of a Newton-Raphson iteration sequence which has limited range of validity and frequently will be less accurate in the low order bit.
This Newton-Raphson scheme is advertised as having higher throughput than the IEEE accurate sequence you get with -prec-div. Recent Intel CPUs (beginning with those which support SSE4.1) have excellent performance in the IEEE accurate division, both scalar and vector versions, so you can see why I prefer to set -prec-div.
In the 11.0 compilers, it should be possible to enable math function vectorization (-fast-transcendentals) separately from other options, so it becomes feasible to set -fp-model source (implying -prec-div -prec-sqrt -ftz-) more often than it was in earlier versions.
Hi, Tim,
I think the issue here is that 1/x is being evaluated at all. I don't know much about compiler vectorization, but I would be surprised to learn that 1/x is evaluated first and then assigned to x only where the conditional is true. If this were true and were the reason for the "floating invalid" error, then why does the same program without the "where" and "end where" lines execute without complaint?
Thanks,
David
Quoting - davidromps
Hi, Tim,
I think the issue here is that 1/x is being evaluated at all. I don't know much about compiler vectorization, but I would be surprised to learn that 1/x is evaluated first and then assigned to x only where the conditional is true. If this were true and were the reason for the "floating invalid" error, then why does the same program without the "where" and "end where" lines execute without complaint?
Thanks,
David
Quoting - tim18
I don't see this as surprising, when you invoke auto-vectorization. You ask for the compiler to evaluate multiple iterations in parallel; although you made an artificial case where the compiler could have recognized that the conditional is loop independent and so skipped the entire loop, you didn't write it that way.
Quoting - Steve Lionel (Intel)
There's no loop in this program. WHERE is not a loop construct.
The Fortran front end processes the WHERE statement and creates a loop (or a loop nest in the multi-dimensional case) as needed.
For the optimizers, there is a loop (unless completely unrolled during an optimization phase).
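As a rough picture of what the optimizer sees, the WHERE construct in the test program behaves like the scalar loop below. This is only a hand-written equivalent for this particular case, where each element depends only on itself. It is not the compiler's actual intermediate form, and the program name is hypothetical.

```fortran
program where_as_loop_sketch
  implicit none
  real, dimension(8) :: x
  integer :: i

  x = 0.

  ! Scalar equivalent of:  where (x .gt. 0.)  x = 1. / x
  ! The divide sits under a per-element condition.
  do i = 1, size(x)
    if (x(i) .gt. 0.) then
      x(i) = 1. / x(i)
    end if
  end do

  print *, x
end program where_as_loop_sketch
```

Written this way, the question in the thread becomes whether a vectorizer may compute the divide for all elements of a vector at once and only afterwards select which results to store.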
The SSE through SSE4.2 (and AVX) vector instruction sets are not operation-masked. The default FP model setting is "fast", and it allows the vectorizer to blindly issue an unguarded packed divide even if the original divide is under a conditional. Whether "-O2 -ftrapuv" should still use the "fast" FP model is debatable, however.
The difference between the 7-iteration and 8-iteration cases is just a side effect of how the different optimizations happen to apply to those specific constant trip counts, and is therefore not essential to this particular issue. However, I understand it can be a source of frustration.
"-fp-speculation off" would be appropriate for this case.
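A source-level alternative, independent of compiler flags, is to make the divide safe for every element, so that an unguarded packed divide has nothing invalid to compute. The sketch below uses the standard MERGE intrinsic. It is not something proposed in this thread, and the program and variable names are hypothetical.

```fortran
program test_safe_divide
  implicit none
  real, dimension(8) :: x
  logical, dimension(8) :: positive

  x = 0.
  positive = x .gt. 0.

  ! Substitute 1. for non-positive elements in the denominator, so the
  ! divide is well defined even if it is evaluated for every element.
  ! The outer merge then keeps the original value where the mask is false.
  x = merge(1. / merge(x, 1., positive), x, positive)

  print *, x
end program test_safe_divide
```

The result should match the WHERE version: elements that are not positive keep their original value, and positive elements are replaced by their reciprocal.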
