That FP arithmetic can be tricky in a multi-threaded environment is well known. However, I am getting strange results depending on whether or not I use -O2 optimization, even though the code itself is serial. In particular, in the enclosed sample code, four complex arrays, z1, z2, z3, and z4, are initialized with random numbers so that their value ranges are pairwise at different scales. Then the operation z1*z2 + z3*z4 is performed elementwise and stored in an array z. Finally, sum(z) is computed. If I compile with the -O2 flag, the result depends on whether the operands appear in the above order or as z2*z1 + z4*z3 (controlled at runtime by a command-line argument). If I compile with -g, or with -fp-model precise, the two results are identical. Can anyone explain why this could be happening?
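For reference, a minimal sketch (mine, not the original attachment; the array size and scale factors are my own guesses) of the kind of test described above:

[bash]! ztest.f90 -- hypothetical reconstruction of the experiment described
program ztest
    implicit none
    integer, parameter :: N = 1000000
    complex, allocatable :: z1(:), z2(:), z3(:), z4(:), z(:)
    real, allocatable :: re(:), im(:)
    character(8) :: arg

    allocate(z1(N), z2(N), z3(N), z4(N), z(N), re(N), im(N))

    ! pairwise different scales (the factors here are illustrative)
    call random_number(re); call random_number(im); z1 = cmplx(re, im)
    call random_number(re); call random_number(im); z2 = cmplx(re, im) * 1.0e4
    call random_number(re); call random_number(im); z3 = cmplx(re, im) * 1.0e-4
    call random_number(re); call random_number(im); z4 = cmplx(re, im) * 1.0e8

    call get_command_argument(1, arg)
    if (arg == 'swap') then
        z = z2*z1 + z4*z3      ! operands commuted at runtime
    else
        z = z1*z2 + z3*z4
    end if
    write(*,*) sum(z)
end program ztest[/bash]
With a compiler that seeds random_number deterministically, both invocations see identical data, so any difference in sum(z) comes from code generation, not from the input.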
Thanks,
George
25 Replies
George,
When accumulating (adding or subtracting) FP numbers whose resultant value's mantissa (non-exponent part) requires more bits than the system's FP mantissa holds, rounding occurs. If you change the order of the accumulations, you also change the points at which the rounding occurs. This can change the result (usually by a small amount).
While arranging the code (options or statements) so that the same sequence and same precisions are used will produce reproducibility across your runs, it will not necessarily assure the highest precision. Higher precision (of accumulations) will occur if you accumulate the smallest values first and the largest last.
I.e., adding a very small number to a very large number, where the difference in magnitudes exceeds the precision of the mantissa, is equivalent to accumulating 0's. Whereas adding all the small numbers together first might produce a result whose difference in magnitude from the larger numbers is less than the precision of the mantissa. Coding this way runs slower, since it may require a sort first.
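A minimal sketch (mine, not from the thread) of this absorption effect in single precision:

[bash]! absorb.f90 -- small addends lost against a large accumulator
program absorb
    implicit none
    real :: s1, s2
    integer :: i
    s1 = 1.0e8                 ! ulp at 1.0e8 is 8.0 in single precision
    do i = 1, 100
        s1 = s1 + 1.0          ! each 1.0 rounds away; s1 stays 1.0e8
    end do
    s2 = 0.0
    do i = 1, 100
        s2 = s2 + 1.0          ! small values first accumulate exactly to 100.0
    end do
    s2 = s2 + 1.0e8            ! one rounding instead of one hundred
    write(*,*) s1, s2          ! s2 ends up closer to the true 1.000001e8
end program absorb[/bash]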
Jim Dempsey
George,
To illustrate the point:
[bash]! rantest.f90
program rantest
    implicit none
    intrinsic random_number
    real, allocatable :: array(:)
    real :: sum1, sum2, clipLow, clipHigh
    real(8) :: dsum
    integer, parameter :: N = 100000
    integer :: I

    allocate(array(N))
    call random_number(array)

    ! straight summation, single and double precision
    sum1 = 0.0
    dsum = 0.0_8
    do I=1,N
        sum1 = sum1 + array(I)
        dsum = dsum + dble(array(I))
    end do

    ! summation by approximate order of magnitude (smallest bins first)
    sum2 = 0.0
    clipLow = 0.0
    clipHigh = 1.0E-30
    do while(clipLow .lt. 1.0)
        do I=1,N
            if((array(I) .le. clipHigh) .and. (array(I) .gt. clipLow)) sum2 = sum2 + array(I)
        end do
        clipLow = clipHigh
        clipHigh = clipHigh * 2.0
        if(clipHigh .gt. 1.0) clipHigh = 1.0
    end do

    write(*,100) sum1, sum2, dsum
100 format(x,Z8,x,Z8,x,Z16)
    write(*,200) sum1, sum2, dsum
200 format(x,E15.10,x,E15.10,x,D15.10)
end program rantest

Output:
 4743AF59 4743AF77 40E875F115F2F480
 .5009534766E+05 .5009546484E+05 .5009553393D+05
[/bash]The difference between the first two outputs illustrates how the accumulation of disparate numbers behaves like truncation.
The first float is the summation without regard to size. This is less than...
The second float is the summation by approximate order of size. This produces a slightly larger number that is closer to the true result.
The third number (the double) is larger yet and, except for formatting/printing issues, represents the correct (or near-correct) result.
Jim Dempsey
Follow-up
I extended the above program to tally dsum1 and dsum2 alongside sum1 and sum2. The results indicate:
[bash] 4743AF59 4743AF77 40E875F115F2F480 40E875F115F2F480
 .5009534766E+05 .5009546484E+05 .5009553393D+05 .5009553393D+05
[/bash]
Note the Z formatting of the two doubles (dsum1 and dsum2) is equal, which should be the case (for this program) since array(N) was single precision. Conversion from single to double is exact, and for this program the summation of the converted reals did not saturate the mantissa of the doubles (dsum1, dsum2).
George has to answer for himself: do you want consistent results or accurate results (or accurate, consistent results)?
Jim Dempsey
The original examples don't raise the well-known issues with sum accumulation; as no "fused multiply add" was under discussion, the expectation of commutativity was reasonable, aside from possibilities related to inconsistent use of x87 extra precision.
>> the expectation of commutativity was reasonable
The expectations are reasonable, but the practicalities are that it is not always possible. "Fused multiply add" is not at issue here, but it may be one of the players in the issue when used by the compiler. The root cause is the sequence in which the accumulations are performed using values whose summation(s) do not fit within the precision of the calculator (in this case the FPU, FPU temporaries, and SSE registers all differ). Additionally, when the precision is insufficient, roundoff rules apply, and the sequence of operations will affect outcomes.
This expectation is based on the requirement of evaluations made with exact precision. Compiler optimizations, in particular common sub-expression elimination, combined with instruction/statement reordering to improve common sub-expression elimination, can affect the sequence in which the accumulations occur. When these numbers cannot maintain exact precision, commutativity cannot be assured.
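A minimal sketch (mine) of how reassociation alone changes a single-precision result:

[bash]! reassoc.f90 -- same three operands, different grouping
program reassoc
    implicit none
    real :: a, b, c
    a = 1.0e8
    b = -1.0e8
    c = 1.0
    write(*,*) (a + b) + c     ! cancellation first: exact 1.0
    write(*,*) a + (b + c)     ! b + c rounds to -1.0e8, so the sum is 0.0
end program reassoc[/bash]
An optimizer that reorders these accumulations is, in effect, choosing between the two lines above.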
Jim Dempsey
