unexpected SUM() output

Roman1 · ‎05-12-2014

I am getting unexpected behavior in the following simple test program. I am expecting the output values of n and s to be identical. When the program is compiled in Release mode, they are identical:
n = 20000000, s = 20000000

However, when the program is compiled in Debug mode, they are different:
n = 20000000, s = 16777216

From what I have read, integers greater than 2**24=16777216, cannot be expressed accurately with single precision variables. Is this what is happening in the code? It is interesting that the Release code can still calculate the correct value.

program test_sum
implicit none

real,allocatable:: a(:)
integer n, s

n = 20000000
allocate( a(n) )
a = 1.0
s = nint(SUM(a))  ! Expect s == n

write(*,*) 'n =', n
write(*,*) 's =', s

stop
end program test_sum

Steven_L_Intel1 · ‎05-12-2014

In release mode, with optimization, the SUM operation is vectorized and it computes several smaller sums which it then adds together and you get lucky. As you note, single precision real loses precision after 2**24, so adding 1 has no effect.

TimP · ‎05-12-2014

I expect that the last time adding 1 has an effect, it will add 2 due to the round-to-even effect, then it will stick. As Steve said, under vectorized sum reduction, there will be multiple partial sums, probably 8 of them for SSE, so the ceiling would be correspondingly larger, but looking for a result which is a multiple of 8 improves your chances. This vectorization usually improves accuracy, but it's difficult to predict by how much. It might vary with data alignment, but the allocatable should be 16-byte aligned, and you have available options including -align:array32byte if you want more.

Roman1 · ‎05-12-2014

Thanks for the replies. It all makes sense now. I did not know that optimized SUM used vectorization.

Roman

John_Campbell · ‎05-12-2014

If the accumulator for SUM stays as a register then there could be retained precision.
As 1.0, which is 1.0_4, has the same effective value (bit pattern) as 1.0_8 and 1.0_16, this will also help.
From Tim's answer, OpenMP should also improve the accuracy as this will accumulate the sum of each process, at the end.
For the alternative of having variable values for A, this would result in increased round-off error accumilation, for the same size n.

If you know the problem, then use real*8,allocatable :: a(:)

John

jimdempseyatthecove · ‎05-13-2014

>>If the accumulator for SUM stays as a register then there could be retained precision.

Not true when registers are SSE/AVX. True if using the x87-style scalar FPU

>>As 1.0, which is 1.0_4, has the same effective value (bit pattern) as 1.0_8 and 1.0_16, this will also help.

It has expressively the same bit expression: zero sign, shift of zero exponent, implicit msb of 1, remainder of bits of mantissa of 0's

This observation is a good learning example to be kept in mind when crafting your algorithms. On way of correction is to use larger accumulators (REAL(8), ...) but this may (or may not) have undesirable effects or requirements (longer computation time and/or more storage). What you can do is look at how the results accumulate and modify how you accumulate the result.

In the case of SUM of array of similar sized numbers you could use something like this

module modSUM
    integer, parameter :: CutOff = 10000
contains
    real recursive function fnSUM(A, N) result(RET)
        implicit none
        integer :: N
        real :: A(N)
        integer :: I
        if(N .GT. CutOff) then
            RET = fnSUM(A(1:N/2),N/2) + fnSUM(A(N/2+1:N), N - N/2)
        else
            RET = A(1)
            do I=2,N
                RET = RET + A(I)
            enddo
        endif
    end function fnSUM
end module modSUM

program test_sum
    use modSUM
    implicit none

    real,allocatable:: a(:)
    integer n, s

    n = 20000000
    allocate( a(n) )
    a = 1.0
    s = nint(SUM(a))  ! Expect s == n

    write(*,*) 'n =', n
    write(*,*) 's =', s
    s = nint(fnSUM(a,n))  ! Expect s == n
    write(*,*) 's =', s

    stop
end program test_sum


 n =    20000000
 s =    16777216
 s =    20000000
Press any key to continue . . .

Jim Dempsey