Auto-parallelization BUG?

lombardig2 · 05-08-2003

The question is:
I tried a loop with a simple operation: summing i for i = 1 to num.
If num is a big integer such as 210,000,000 or more, I get a wrong result with the auto-parallelization option.
To compile with auto-parallelization I used the /Qparallel switch in the Fortran Build Tool. I use Visual Studio (Visual C++ 6).
The code that I tested is simpler than the one in my previous post:
----------------------------------------
program myprog
IMPLICIT NONE
DOUBLEPRECISION A
INTEGER I
CHARACTER (LEN=11) :: FORM1
A=0
DO I=1,2100000000
A=A+DBLE(I)
END DO
FORM1 = "( D22.16 )"
PRINT FORM1, A
END
-----------------------
Results that I got:
with parallel compiling: 0.2205000000082677D+19 (Wrong)
without: 0.2205000000067114D+19

Thank you,

Guido Lombardi

TimP · 05-08-2003

You must expect differences such as these when you add so many numbers and change the order of addition. Bear in mind that 16-digit integers begin to exceed the number of bits retained in double precision: above 2^53 (about 9.0D+15) not every integer is representable, so each addition may be rounded. By parallelizing, you are splitting the job into parallel partial sums, which are added together at the end. The parallel version is likely to be closer to exact, unless you are using generic x87 code and have switched to 64-bit precision mode.

If you are running on a machine which supports SSE2, you will get different results (with normal optimization) according to whether you choose SSE2 or generic code, as SSE2 vectorization splits each threaded sum into 4 partial sums.
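
The regrouping described above can be imitated by hand in a single thread. The following is a minimal sketch, not what the compiler actually generates: the program name regrouped_sum, the count nparts = 4 and the way the range is split are assumptions made for illustration, and the errors it prints need not match the figures posted in this thread. It sums 1..n three ways (one accumulator, contiguous blocks, interleaved accumulators) and reports how far each lands from the exact n*(n+1)/2 computed in 64-bit integer arithmetic. It assumes IEEE double precision and a build without optimizations that reorder floating-point operations, otherwise the first loop may itself be regrouped.

```fortran
! Sketch only: mimics by hand, in one thread, the regrouping that
! threading and vectorization apply to a sum reduction.
program regrouped_sum
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)  ! double precision
  integer, parameter :: i8 = selected_int_kind(18)        ! 64-bit integer
  integer, parameter :: n = 210000000   ! loop length of the smaller test
  integer, parameter :: nparts = 4      ! assumed number of partial sums
  real(dp) :: serial, blocked, strided, part(nparts)
  integer(i8) :: exact
  integer :: i, k, lo, hi

  ! Exact reference n*(n+1)/2, computed in 64-bit integer arithmetic.
  exact = int(n, i8) * (int(n, i8) + 1_i8) / 2_i8

  ! 1) One accumulator, terms added in loop order.
  serial = 0.0_dp
  do i = 1, n
     serial = serial + real(i, dp)
  end do

  ! 2) Contiguous blocks, one per pretend thread, combined at the end.
  part = 0.0_dp
  do k = 1, nparts
     lo = (k - 1) * (n / nparts) + 1
     hi = k * (n / nparts)
     if (k == nparts) hi = n
     do i = lo, hi
        part(k) = part(k) + real(i, dp)
     end do
  end do
  blocked = 0.0_dp
  do k = 1, nparts
     blocked = blocked + part(k)
  end do

  ! 3) Interleaved accumulators, as a vectorized loop would keep them.
  part = 0.0_dp
  do i = 1, n
     k = mod(i - 1, nparts) + 1
     part(k) = part(k) + real(i, dp)
  end do
  strided = 0.0_dp
  do k = 1, nparts
     strided = strided + part(k)
  end do

  ! Every sum here is a whole number below 2**63, so int() converts it exactly.
  print '(a, i20)', 'exact         ', exact
  print '(a, i20)', 'serial error  ', int(serial, i8) - exact
  print '(a, i20)', 'blocked error ', int(blocked, i8) - exact
  print '(a, i20)', 'strided error ', int(strided, i8) - exact
end program regrouped_sum
```

The three loops add exactly the same terms, so any difference between the three error lines comes only from the order in which the additions are rounded.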

lombardig2 · 05-09-2003

Your explanation seems to be correct, because if I try a loop of 210000000 iterations (1/10 of the previous test) I get:
exact: 0.2205000010500000D+17
not parallel: 0.2205000006710887D+17
parallel: 0.2205000010346576D+17
But WHY?
The integer type is a 32-bit type, and the double precision type is a 64-bit type with an 11-bit exponent and a 52-bit mantissa.
So I shouldn't get an overflow of the variables, because I use 1/10 of the maximum representable integer, and the double type has more bits to represent the number.

Thank you, Guido

Jugoslav_Dujic · 05-09-2003

Yeah, but 210,000,000 has 28 significant bits, so the sum, which is approx. n*n/2, needs about 55 bits, and that cannot fit into the 52-bit mantissa without loss of precision.

Jugoslav
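
The bit counts quoted above can be checked with standard Fortran inquiry intrinsics. This is a small sketch assuming IEEE double precision; the program name bits_needed and the variable names are made up for illustration. It computes the exact sum in a 64-bit integer, then asks how many bits it occupies and how far apart neighbouring doubles are at that magnitude.

```fortran
program bits_needed
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)  ! double precision
  integer, parameter :: i8 = selected_int_kind(18)        ! 64-bit integer
  integer(i8), parameter :: n = 210000000_i8
  integer(i8) :: s
  real(dp) :: x

  s = n * (n + 1_i8) / 2_i8        ! exact sum 1 + 2 + ... + n
  x = real(s, dp)

  ! exponent(x) is e in x = f * 2**e with 0.5 <= f < 1, which for a
  ! positive whole number is its length in bits.
  print '(a, i4)', 'bits in n:            ', exponent(real(n, dp))  ! 28
  print '(a, i4)', 'bits in the sum:      ', exponent(x)            ! 55
  ! digits() counts the significand bits, including the implicit leading one.
  print '(a, i4)', 'significand bits:     ', digits(x)              ! 53
  ! spacing(x) is the gap between adjacent doubles near x.
  print '(a, f6.1)', 'spacing near the sum: ', spacing(x)           ! 4.0
end program bits_needed
```

The 52 stored mantissa bits plus the implicit leading bit give 53 significant bits. With 55 bits needed, the two lowest bits are lost, so near the final total only multiples of 4 are representable and each term added late in the loop is rounded accordingly.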