Solved: SSE2 vs. IA32

Ilie__Daniel · 12-15-2011

Hello!

I am planning to change the architecture option from /arch:IA32 to /arch:SSE2, in order to enable vectorisation. I am aware that this comes at a numerical cost, i.e. the results will not look the same. This is because the x87 instructions keep intermediate results in 80-bit extended-precision registers (64-bit significand).
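To make the difference concrete, here is a minimal Python sketch under stated assumptions. Python floats are IEEE binary64, and `f32` is a hypothetical helper that emulates real(4) by rounding through the standard `struct` module. Because binary64 has more than twice the significand bits of binary32, rounding an exact binary64 sum to binary32 gives the same result as a native single-precision addition. Binary64 here only stands in for the x87 80-bit registers, which Python cannot model directly. Rounding every step to single precision, as SSE code does, loses the 1 that a wider intermediate keeps:

```python
import struct


def f32(v: float) -> float:
    """Round a binary64 value to the nearest binary32 (real(4)) value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def sse_single(a: float, b: float, c: float) -> float:
    """SSE-style: every operation is rounded to single precision."""
    t = f32(a + b)          # 2**24 + 1 rounds (ties to even) back to 2**24
    return f32(t + c)


def wide_intermediate(a: float, b: float, c: float) -> float:
    """x87-style stand-in: keep the intermediate wide, round only the final store."""
    return f32((a + b) + c)


a, b, c = 2.0 ** 24, 1.0, -(2.0 ** 24)
print(sse_single(a, b, c), wide_intermediate(a, b, c))  # 0.0 1.0
```

Neither answer is "the" right one in general; the point is that the same source can legitimately produce both, depending on where rounding to single precision happens.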

I noticed that there is an option to specify the significand precision /Qpcn. If I specify /arch:SSE2 and /Qpc80, will this improve the accuracy of results, without disabling any optimisations?

Is there anything else I could set, so that the accuracy of results will remain high?

Kind regards,
Daniel.

TimP · 12-15-2011

/Qpc80 affects only x87 execution. It extends the extra-precision evaluation to double-precision data types.
For full accuracy of SSE code, I use /assume:protect_parens /Qprec-div /Qprec-sqrt. /Qftz- will also improve accuracy for tiny operands, and should have little performance impact on the Sandy Bridge CPU generation.
All of those options are included in /fp:source; all but /Qftz also apply to IA32 code, but they become more important with SSE.
If you have expression evaluations which depend on promotion to double precision, they should be written explicitly in the source code so as to control variations from one compiler and architecture to another.
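Two of those options can be illustrated with a small Python sketch; treat it as an emulation, not as what the compiler literally emits. `f32` is a hypothetical helper that rounds binary64 to real(4) via the standard `struct` module, and `flush_to_zero` is a hypothetical stand-in for the hardware flush-to-zero mode that /Qftz enables (it is not an Intel API). The division part assumes the relaxed mode replaces `x/y` by `x*(1/y)`, which is the kind of shortcut /Qprec-div rules out; it uses Python's binary64 floats directly.

```python
import struct


def f32(v: float) -> float:
    """Round a binary64 value to the nearest binary32 (real(4)) value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


FLT_MIN = 2.0 ** -126  # smallest normal binary32 value


def flush_to_zero(v: float) -> float:
    """Hypothetical FTZ emulation: a subnormal result is replaced by zero."""
    return 0.0 if abs(v) < FLT_MIN else v


# /Qftz-: gradual underflow keeps x - y nonzero whenever x != y.
x = f32(1.5 * FLT_MIN)
y = f32(FLT_MIN)
gradual = f32(x - y)              # 2**-127, a subnormal single
flushed = flush_to_zero(gradual)  # 0.0 even though x != y

# /Qprec-div: true division versus multiplication by a reciprocal.
true_div = 49.0 / 49.0            # 1.0
recip_div = 49.0 * (1.0 / 49.0)   # 0.9999999999999999

print(gradual, flushed, true_div, recip_div)
```

With flush-to-zero, code that tests `x - y == 0` as a proxy for `x == y` can go wrong for tiny operands; with reciprocal division, even `n/n` is no longer guaranteed to be exactly 1.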

Ilie__Daniel · 12-15-2011

Thank you for your answer.

What do you mean by:
"expression evaluations which depend on promotion to double precision"?

Something like this:
real(8) :: x
real(4) :: y,z

Instead of
x = y + z ! This can produce different results when migrated.
I should have
x = dble(y) + dble(z) ! This is consistent all the time
?

TimP · 12-15-2011

You would require 3 or more operands in an expression (taking into account that optimization may extend across assignments) before promotion to double could make a difference. In cases such as
v = (w-x) + (y-z)
promotion e.g.
v = (w-dble(x)) + (y-dble(z))
could avoid numerical problems. If your compiler ignores parentheses (as ifort does by default), you need to promote at least 3 of the 4 operands explicitly.
Usually, when you see code written carefully with parentheses, the author expects the specified order of operations to give good results without requiring extra precision, provided that the compiler heeds the parentheses. This is a sign that you should set -assume:protect_parens or an option which implies it, e.g. /fp:source or /standard-semantics.
The implication in my example is that the differences w-x and y-z are expected to be small, such that the expressed order of operation will be accurate, but other orders of evaluation are likely to be inaccurate.
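A minimal Python sketch of that example, assuming hypothetical inputs where w - x and y - z are small differences of nearly equal values. `f32` is a hypothetical helper that emulates real(4) by rounding each binary64 result through the standard `struct` module (faithful for single + and -, since binary64 has more than twice the precision). `(w + y) - (x + z)` is just one reordering a compiler could choose if it ignores the parentheses, not necessarily the one ifort picks.

```python
import struct


def f32(v: float) -> float:
    """Round a binary64 value to the nearest binary32 (real(4)) value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def as_written(w, x, y, z):
    """(w - x) + (y - z), every operation rounded to real(4)."""
    return f32(f32(w - x) + f32(y - z))


def reassociated(w, x, y, z):
    """(w + y) - (x + z): one reordering possible if parentheses are ignored."""
    return f32(f32(w + y) - f32(x + z))


def reassociated_promoted(w, x, y, z):
    """The same reordering with every operand promoted to real(8)."""
    return (w + y) - (x + z)


# Hypothetical inputs: w - x = 1 and y - z = 0.5 are exact in single precision.
w, x, y, z = 2.0 ** 24, 2.0 ** 24 - 1, 1.0, 0.5
print(as_written(w, x, y, z), reassociated(w, x, y, z),
      reassociated_promoted(w, x, y, z))  # 1.5 0.0 1.5
```

In single precision, both w + y and x + z round to 2**24, so the reordered form cancels to 0; either honoring the parentheses or promoting to double recovers 1.5.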

The rules of Fortran allow a compiler to make certain transformations in expression evaluation, such as
a*x - y*a => a*(x-y)
or
x/y/z => x/(y*z)
(but not in reverse), where promotion to double may make a significant difference. If you write code which is eligible for such optimization, you have no guarantee which way it will go.
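Both transformations can change a single-precision result. The following is a small Python sketch with hypothetical inputs; `f32` is a hypothetical helper that rounds binary64 results to real(4) with the standard `struct` module and maps values too large for binary32 to infinity, as round-to-nearest overflow does.

```python
import math
import struct


def f32(v: float) -> float:
    """Round binary64 to binary32; values too large overflow to +/-inf."""
    try:
        return struct.unpack("<f", struct.pack("<f", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)


def mul_sub(a, x, y):
    """a*x - y*a evaluated in real(4)."""
    return f32(f32(a * x) - f32(y * a))


def factored(a, x, y):
    """a*(x-y) evaluated in real(4)."""
    return f32(a * f32(x - y))


def div_div(x, y, z):
    """x/y/z evaluated in real(4)."""
    return f32(f32(x / y) / z)


def div_mul(x, y, z):
    """x/(y*z) evaluated in real(4)."""
    return f32(x / f32(y * z))


# a*x = 25165827 rounds to 25165828 in single precision.
a, x, y = 3.0, 2.0 ** 23 + 1, 2.0 ** 23
print(mul_sub(a, x, y), factored(a, x, y), a * x - y * a)  # 4.0 3.0 3.0

# y*z overflows single precision, so x/(y*z) becomes x/inf = 0.
big = f32(1e30)
print(div_div(big, big, big), div_mul(big, big, big))
```

The third value in the first line is the same expression in double precision, which is exact here; this is the kind of case where promotion to double makes a significant difference.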

The suspicion remains that the default treatment of parentheses by Intel compilers (like "traditional" K&R C) is a holdover from the x87 behavior (with /Qpc80 or equivalent). Current gcc/gfortran require a specific option to be set if one wishes to ignore the language rules about parentheses.
Unfortunately, when you set ifort -assume:protect_parens, you forbid useful "legal" transformations such as
x/2 => x*.5
but that is off the topic you raised, as it doesn't change numerical results.
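That x/2 => x*.5 cannot change results follows from both forms producing the same exact real number before rounding. A quick Python spot check (a sample, not a proof) compares both forms for random finite real(4) values, subnormals included. `f32` and `random_finite_f32` are hypothetical helpers built on the standard `struct` and `random` modules.

```python
import random
import struct


def f32(v: float) -> float:
    """Round a binary64 value to the nearest binary32 (real(4)) value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def random_finite_f32(rng: random.Random) -> float:
    """Random finite real(4) value (subnormals included) built from its bits."""
    while True:
        bits = rng.getrandbits(32)
        if (bits >> 23) & 0xFF != 0xFF:  # exponent field 255 means inf or NaN
            return struct.unpack("<f", struct.pack("<I", bits))[0]


def halve_div(v: float) -> float:
    return f32(v / 2.0)


def halve_mul(v: float) -> float:
    return f32(v * 0.5)


rng = random.Random(2011)
samples = [random_finite_f32(rng) for _ in range(20_000)]
mismatches = [v for v in samples if halve_div(v) != halve_mul(v)]
print(len(mismatches))  # 0
```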

Ilie__Daniel · 12-15-2011

Thank you very much, Tim!
Excellent example.