Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Changes to floating point operations since version 10.1.30

mostlyAtNight
Beginner
Hello All,

Since moving from version 10.1.30 to version 11.* of the Intel Visual Fortran compiler, I've noticed differences in the way my application behaves with respect to floating-point operations.

My application solves simultaneous equations using matrix techniques. The version of my program compiled with version 11 of the compiler is no longer able to find a solution to a problem that the software solves successfully when compiled with version 10.1.30.

Have any changes been made to the way floating-point operations are handled in version 11 of the compiler?

I've tried playing with a few of the floating point options in the project properties but unfortunately this does not fix the problem.

Any help or suggestions would be much appreciated, as at the moment we are having to make a release of our software using Intel Visual Fortran version 10.

Regards,

Pete
8 Replies
TimP
Honored Contributor III
The default architecture of the 32-bit (ia32) 10.1 compiler, which did not use SSE, is selected in 11.x by /arch:ia32.
The new default /arch:SSE2 (same for ia32 and intel64) should work more reliably if you set /assume:protect_parens /Qprec-div /Qprec-sqrt or (more conservative) /fp:source. If you depended on implicit promotion of single precision expressions to double, you would need to write it in explicitly.
If I didn't make suitable guesses about what you are doing, please be more specific.
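
On the question of implicit promotion, writing it explicitly would look roughly like this (a sketch only; the program and variable names are placeholders, not anything from your code):

program explicit_promotion
  implicit none
  real(4) :: a, b, c
  real(8) :: d
  a = 1.0; b = 2.0; c = 3.0
  ! dble(a)*b is real(8), so the product and the sum are carried in
  ! double precision regardless of the /arch setting
  d = dble(a)*b + c
  print *, d
end program explicit_promotion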
mostlyAtNight
Beginner
Hi tim18,

Thanks for your fast response.

I'll try compiling again with those options and see if it improves things.

You make an interesting point about implicit promotion of single precision to double precision - I'm going to investigate this too.

Thanks again for your help.

Regards,

Pete

mostlyAtNight
Beginner
Hi tim18,

I've just tried those options and /arch:ia32 does the trick.

In this case it seems that I am losing precision due to the compiler taking advantage of SSE instructions.

Is this a general disadvantage of using SSE or could there be particular areas of my code that need adjusting to minimise the loss of precision?

PS. I also tried the other options you mentioned to improve the accuracy of the SSE instructions but unfortunately these were not able to solve my problem.

Kind regards,

Pete

TimP
Honored Contributor III
Quoting - mostlyAtNight

With /arch:ia32, single precision expressions are promoted to double, giving more accuracy. With optimization, the promotion may even persist across assignments. SSE code doesn't do that, except where you specify it explicitly, e.g. dble(A)*B+C. The old option /Op does some of this even with SSE, but it's slow. It's possible that changing critical code sections to declared double precision may be in order.
There are a few cases where specifying a more accurate order of expression evaluation may do away with a requirement for extra precision:
A*(B-C) rather than A*B - A*C (Fortran permits a compiler to do this automatically, but don't count on it)
(a+b)*(a-b) rather than a**2 - b**2
(A-B) + (C-D) (with /assume:protect_parens) rather than A-B+C-D, if all variables have the same sign
Polynomial evaluation by Horner's rule, with a few minor enhancements:
X + X*X*(A2 + X*A3) (polynomial, A1==1, A0==0)
For a sum of several terms, promotion to double precision often is the best way.
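
As a rough sketch (the program and variable names here are placeholders, purely to illustrate the rewrites listed above), the source-level forms look like this:

program rewrite_examples
  implicit none
  real(4) :: a, b, c, d, x, a2, a3, p, q, r, s
  a = 1.0; b = 2.0; c = 3.0; d = 4.0
  x = 0.5; a2 = 0.25; a3 = 0.125
  p = a*(b - c)             ! rather than a*b - a*c
  q = (a + b)*(a - b)       ! rather than a**2 - b**2
  r = (a - b) + (c - d)     ! grouping preserved under /assume:protect_parens
  s = x + x*x*(a2 + x*a3)   ! Horner form, with A1==1 and A0==0 folded in
  print *, p, q, r, s
end program rewrite_examples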
mostlyAtNight
Beginner
Hi tim18,

That explains things nicely - I think we'll now be looking to convert the majority of our code to double precision.

Thanks again for your help.

Regards,

Pete
Ilie__Daniel
Beginner
Quoting - tim18

Does this mean that if I redeclare all my real(4) variables as real(8), I could take advantage of the SSE2 instructions without the loss of precision?
Does /arch:ia32 promote expressions to double precision, even if they are, let's say, a simple sum of two integer(4) variables?
Does it make sense then, to redeclare all integer(4) to integer(8)?

Kind regards,
Daniel.
Steven_L_Intel1
Employee

First of all, there's no effect on integer operations.

/arch:ia32 will cause some single-precision intermediate operations to be done in double precision, which can, as you find, give inconsistent results. Using SSE instructions minimizes those surprises, but you lose the "extra" precision you had with x87 instructions. If your program needs more precision than what you declared, then sure, use double precision instead.
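
For instance (a hypothetical accumulation loop, not anything from the original code), declaring the accumulator double precision keeps the extra precision in the sum whichever /arch setting is used:

program dp_accumulate
  implicit none
  integer :: i
  real(4) :: terms(1000)
  real(8) :: total
  call random_number(terms)          ! stand-in data for the illustration
  total = 0.0d0
  do i = 1, size(terms)
    total = total + dble(terms(i))   ! each term is promoted; the sum is carried in real(8)
  end do
  print *, total
end program dp_accumulate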
TimP
Honored Contributor III

64-bit integers could lose performance in a 32-bit build, without, as Steve said, any direct connection with the other issues you have raised. The only reason I have seen for use of 64-bit integers to go along with promotion of source code from single to double precision is in the case where COMMON alignments demand the same size integers and floats.
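
For instance (a hypothetical COMMON block, purely to illustrate the alignment point): if a 4-byte integer sat ahead of 8-byte reals, the reals would start at offset 4 and be misaligned, whereas widening the integer to integer(8) keeps every member naturally aligned.

program common_alignment
  implicit none
  integer(8) :: n           ! widened so that coeff starts on an 8-byte boundary
  real(8)    :: coeff(100)
  common /solver_data/ n, coeff
  n = 100_8
  coeff = 0.0d0
  print *, n, coeff(1)
end program common_alignment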