P-4 precision problem with version 8 Intel compilers on LINUX

Deleted_U_Intel
Employee
Hi,

I have run into precision problems with the new Intel compilers for
Linux, both for Fortran (ifort) and C (icc). I am running Red Hat 8
on a Dell Pentium 4, and am using the Intel compilers version 8.

It seems that using the processor-optimized compilation flags
(and thus activating the vectorizer) affects the outcome of
computations. I have looked around for documentation on this
behavior and couldn't find anything. Any help or suggestions on
where to look, or on how to solve this problem, would be appreciated.

Here's an example program (analogous issues arise with C code):

      program main
      double precision a
      integer i, j
      a = 1d-14
      do i = 0, 100000
         do j = 0, 100000
            a = a * 1.00000001d0
         enddo
      enddo
      print *, a
      end

(Don't ask why one would run such code.) Here's what happens
with target architecture flags set:

> ifort -tpp7 -xW -O3 tmp.f ; time a.out
tmp.f(6) : (col. 6) remark: LOOP WAS VECTORIZED.
1.332615264577497E-008
1.845u 0.007s 0:01.85 99.4% 0+0k 0+0io 171pf+0w


Here's the output without the P-IV flags:

> ifort -O3 tmp.f ; time a.out
2.693495799080514E+029
30.611u 0.060s 0:30.87 99.3% 0+0k 0+0io 172pf+0w

That is, the run took much longer, and the answers are different.
The second output appears to be 'correct', judging by comparison
with a gcc-compiled version of the program.

I have experimented with some of the flags, like -mp for maintaining
precision, with the same outcome. Apparently only removing the -tpp7
and -xW flags makes the results consistent. Obviously, it would be
nice to get the speed improvement without losing precision.

Thanks in advance for your help,


Thorsten
Steven_L_Intel1
Employee
Ah, this is fun. When you don't vectorize, the computations are done in the standard x87 registers, which have extended precision and range. This means that additional low-order bits are carried along in the computation of "a", and this affects the result.
When you compile with -xW, the vectorizer uses the SSE2 instructions and registers, which do NOT have extended precision and range. Thus each computation is rounded to standard double precision, and the extra low-order fraction bits are not carried along.
gcc does not vectorize, so it will use the x87 method. Also, if you ran this on a non-x86 processor (such as a Sun SPARC), you'd see the same result as the Intel compiler gets with SSE2.
The -xW results are more consistent (and faster), but you do lose the extra intermediate precision that can be visible in strange tests such as this one.
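If you want to see the effect directly, one (untested) way to defeat the extra x87 precision is to force each intermediate through a 64-bit memory store. In this sketch, round_to_double is just an illustrative helper, not anything the compiler provides; because Fortran passes arguments by reference, the external call forces "a" out of the 80-bit register file and through its double precision memory slot on every iteration:

      program main
      double precision a
      integer i, j
      a = 1d-14
      do i = 0, 100000
         do j = 0, 100000
            a = a * 1.00000001d0
C           Round the intermediate to 64-bit double, as SSE2 would
            call round_to_double(a)
         enddo
      enddo
      print *, a
      end

      subroutine round_to_double(x)
C     The call boundary alone forces x through memory; no body needed
      double precision x
      end

Built that way, the x87 and SSE2 results should agree, though at a substantial cost in speed, since the call also blocks vectorization.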
Steven_L_Intel1
Employee
It was pointed out to me that there may be another problem at work here, unrelated to precision. I'll play with this some more if I get the time.
Steven_L_Intel1
Employee
There seems to be an actual bug at work here. I see Tim Prince responded to your post in comp.lang.fortran about it, and I think he identified the problem: the compiler is fusing the two loops into one monster loop whose iteration count exceeds what a 32-bit integer can hold. That is an unsafe optimization.
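To put a number on it: the fused trip count would be 100001 * 100001 = 10,000,200,001, which does not fit in a default 32-bit INTEGER, so a collapsed 32-bit loop counter would wrap around. A quick check, purely as an illustration (INTEGER*8 is an extension, but ifort supports it):

      program tripcount
      integer*8 n
C     Trip count of the fused loops: 100001 outer * 100001 inner
      n = 100001_8 * 100001_8
      print *, n
C     Compare with the largest default INTEGER
      print *, huge(1)
      end

The first print shows 10000200001 and the second 2147483647, so any fused counter kept in 32 bits overflows long before the loops finish.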