After thinking about my post

Lars-goran_Ofversted · ‎01-17-2017

I'm running the latest ifort 2017.1 and found this unwanted behavior. It's illustrated by a short program and the result after two different compilations.

> cat test_sum.f
      PROGRAM TEST_SUM
      INTEGER(8),PARAMETER :: N = 1024 * 1024 * 1024
      INTEGER(8) :: SEED(1) = (/12345678/)
      REAL(4) :: A(N)
      REAL(8) :: B(N)

!     Fill the array A with random numbers in the range [0 - 1].
      CALL RANDOM_SEED(PUT=SEED)
      CALL RANDOM_NUMBER( A )

! Copy the values to the double precision array B.
B(:) = A(:)

!     Expected sum = N * 0.5 = 536870912 = 5.3687091E+08
!     The sum may differ a bit due to the random numbers drawn.
      PRINT *, 'Sum(A):', REAL(SUM(A))
      PRINT *, 'Sum(B):', REAL(SUM(B))

END PROGRAM TEST_SUM
>
> ifort -i8 test_sum.f
> a.out
Sum(A): 5.3684355E+08
Sum(B): 5.3687296E+08
>
> ifort -i8 -fp-model precise test_sum.f
> a.out
Sum(A): 1.6777216E+07
Sum(B): 5.3687296E+08

So the real*8 array was summed up correctly, but not the real*4 array when compiled with the -fp-model precise option. The result was the same if the arrays A and B were allocated arrays.

jimdempseyatthecove · ‎01-18-2017

>>Expected sum = N * 0.5 = 536870912 = 5.3687091E+08

Worst case difference could be N lsb's as you accumulate each partial sum. Average may be N*0.5 lsb's

Float/Single has 7.22 digits of precision. 10^7.22 = 16595869 ~= 1.66E+7

When the partial sum of the float approaches and exceeds 1.66E7, the additional additions effectively add 0.0.
The summation of the float array is off by (5.3687091E+08 - 1.66E+7) ~= 5.2E+8. The actual difference will depend on when the partial sum approaches and exceeds 8.3E+6. The aove is thumbnail sketch, a mathematician can provide a proper statistical analysis.

The results appear realistic.

Jim Dempsey

jimdempseyatthecove · ‎01-18-2017

Try merging this into your program and report back the results:

REAL(4) :: T(N/2) ! N must be even (it is)
REAL(4) :: SUMA
INTEGER(8) :: I
...
T = A(1:N/2) + A(N/2+1:N)
I = N / 2
DO WHILE(I .gt. 1)
  T(1:I/2) = T(1:I/2) + T(I/2+1:I)
  I = I / 2
END DO
SUMA = T(1)

Jim Dempsey

TimP · ‎01-18-2017

In case someone wonders why -fp-precise might be responsible for such a difference, that option requires SUM to be implemented as a single chain of serial addition. Without that option, ifort can choose to optimize performance [ by splitting the sum into several partial sums (typically 8 individual 4- or 8- wide parallel sums, depending on target ISA), and this would delay the effect Jim described.

jimdempseyatthecove · ‎01-19-2017

After thinking about my post #3, it would have worked as intended if N were a power of 2. The following removes any restrictions on N (other than it be positive non-zero and fit in memory).

REAL(4) :: T(N/2+1) ! No restrictions on N
REAL(4) :: SUMA
INTEGER(8) :: I,J
...
J = N
I = J / 2
T = A(1:I) + A(I+1:I*2) ! In event J is odd, sum all but last
IF(I*2 .NE J) THEN ! (if N (now J) were odd)
  I = I+1
  T(I) = T(J) ! copy last to after partial sum
ENDIF 
DO WHILE(I .gt. 1)
  J = I
  I = J / 2
  T(1:I/2) = T(1:I/2) + T(I/2+1:I*2)
  IF(I*2 .NE J) THEN
    I = I+1
    T(I) = T(J)
  ENDIF 
END DO
SUMA = T(1)

Performing the same thing on the double arrays would yield a better result.

Jim Dempsey

Bug in intrinsic function sum() compiled with -fp-model precise