Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
Welcome to the Intel Community. If you get an answer you like, please mark it as an Accepted Solution to help others. Thank you!
26749 Discussions

Bug in intrinsic function sum() compiled with -fp-model precise

Lars-goran_Ofversted
112 Views

I'm running the latest ifort 2017.1 and found this unwanted behavior. It's illustrated by a short program and the result after two different compilations.

> cat test_sum.f
      PROGRAM TEST_SUM
      INTEGER(8),PARAMETER :: N = 1024 * 1024 * 1024
      INTEGER(8) :: SEED(1) = (/12345678/)
      REAL(4) :: A(N)
      REAL(8) :: B(N)

!     Fill the array A with random numbers in the range [0 - 1].
      CALL RANDOM_SEED(PUT=SEED)
      CALL RANDOM_NUMBER( A )

!     Copy the values to the double precision array B.
      B(:) = A(:)

!     Expected sum = N * 0.5 = 536870912 = 5.3687091E+08
!     The sum may differ a bit due to the random numbers drawn.
      PRINT *, 'Sum(A):', REAL(SUM(A))
      PRINT *, 'Sum(B):', REAL(SUM(B))

      END PROGRAM TEST_SUM
>
> ifort -i8 test_sum.f
> a.out
 Sum(A):  5.3684355E+08
 Sum(B):  5.3687296E+08
>
> ifort -i8 -fp-model precise test_sum.f
> a.out
 Sum(A):  1.6777216E+07
 Sum(B):  5.3687296E+08

So the real*8 array was summed up correctly, but not the real*4 array when compiled with the -fp-model precise option. The result was the same if the arrays A and B were allocated arrays.

0 Kudos
4 Replies
jimdempseyatthecove
Black Belt
112 Views

>>Expected sum = N * 0.5 = 536870912 = 5.3687091E+08

Worst case difference could be N lsb's as you accumulate each partial sum. Average may be N*0.5 lsb's

Float/Single has 7.22 digits of precision. 10^7.22 = 16595869 ~= 1.66E+7

When the partial sum of the float approaches and exceeds 1.66E7, the additional additions effectively add 0.0.
The summation of the float array is off by (5.3687091E+08 - 1.66E+7) ~= 5.2E+8. The actual difference will depend on when the partial sum approaches and exceeds 8.3E+6. The aove is thumbnail sketch, a mathematician can provide a proper statistical analysis.

The results appear realistic.

Jim Dempsey

jimdempseyatthecove
Black Belt
112 Views

Try merging this into your program and report back the results:

REAL(4) :: T(N/2) ! N must be even (it is)
REAL(4) :: SUMA
INTEGER(8) :: I
...
T = A(1:N/2) + A(N/2+1:N)
I = N / 2
DO WHILE(I .gt. 1)
  T(1:I/2) = T(1:I/2) + T(I/2+1:I)
  I = I / 2
END DO
SUMA = T(1)

Jim Dempsey

TimP
Black Belt
112 Views

In case someone wonders why -fp-precise might be responsible for such a difference, that option requires SUM to be implemented as a single chain of serial addition.  Without that option, ifort can choose to optimize performance [ by splitting the sum into several partial sums (typically 8 individual 4- or 8- wide parallel sums, depending on target ISA), and this would delay the effect Jim described.

jimdempseyatthecove
Black Belt
112 Views

After thinking about my post #3, it would have worked as intended if N were a power of 2. The following removes any restrictions on N (other than it be positive non-zero and fit in memory).

REAL(4) :: T(N/2+1) ! No restrictions on N
REAL(4) :: SUMA
INTEGER(8) :: I,J
...
J = N
I = J / 2
T = A(1:I) + A(I+1:I*2) ! In event J is odd, sum all but last
IF(I*2 .NE J) THEN ! (if N (now J) were odd)
  I = I+1
  T(I) = T(J) ! copy last to after partial sum
ENDIF 
DO WHILE(I .gt. 1)
  J = I
  I = J / 2
  T(1:I/2) = T(1:I/2) + T(I/2+1:I*2)
  IF(I*2 .NE J) THEN
    I = I+1
    T(I) = T(J)
  ENDIF 
END DO
SUMA = T(1)

Performing the same thing on the double arrays would yield a better result.

Jim Dempsey

Reply