Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

strange bug

kissin__yoel
Beginner
1,126 Views

 Hi

I am encountering a strange bug that gives me a slightly wrong/different output for a matrix transformation which I have simplified below. The difference is not present if I run with -debug, or, strangely, if I uncomment a print statement in the loop. Or even if I allocate the mat_out array with the number 67 without using the variable dim. It's clear that there are no out of bounds references or other errors in this simple code. The difference is very slight, but it is nonetheless annoying/worrying me.

This bug does not happen if I use gfortran compiler.


 

 program sample
implicit none
 integer :: dim, i, j, k
 real*8:: b
 real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
 dim=67
allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
read (100) mat_a,mat_b
j=66
  do i=65,64,-1
      b=0. 
 do k=1, i 
   b=b+mat_a(i,k)*mat_b(k,j)
  enddo
mat_out(i,j)=b
!print*, mat_out(i,j)
  enddo
 print'(1d25.16)',mat_out(64,66)

 end program sample

 

0 Kudos
25 Replies
Arjen_Markus
Honored Contributor I
941 Views

You do not describe the nature of the bug - is it in the outcome? A run-time error? Note that the two matrices are read from an external file that you did not include, so we can not easily rerun your program. Does the bug, whatever its nature, occur with specific data?

0 Kudos
kissin__yoel
Beginner
941 Views

Hi

Thanks for responding!

The problem is that the matrix output changes slightly -- an example of this change is printed in the code. (This change does not happen if dim is fixed as a parameter or if a print statement is inserted as mentioned previously.)

I've attached the input file with the required matrices (had to add .txt on the end to upload).

Thanks for your help!

0 Kudos
mecej4
Honored Contributor III
941 Views

Here are the results from various compilers on Windows 10:

Gfortran 7.2, Cygwin-64, -O2, NAG 6.2, 64-bit, Lahey 7.1 (32-bit), Ifort 19U1 /Od : 0.2808552152170796D-02

Silverfrost 8.3, 64 bit, /opt : 0.2808552152170795D-02

Ifort 16U8, 64 bit, /O2 /QxSSE3 : 0.2808552152170796D-02

Ifort 19U1, 64 bit, /O2 /QxSSE2 : 0.2808552152171018D-02

Ifort 19U1, 64 bit, /O2 /QxSSE3 : 0.2808552152170796D-02

Ifort 16U8, 64 bit, /O2 /QxAVX, Ifort 19U1, 64 bit, /O2   : 0.2808552152171018D-02

I agree that the differences are worth reconciling. With Ifort 16U8, going from QxSSE3 to QxAVX the change is striking.

With Ifort 19U1, 64 bit /O2, /QxSSE3 seems to do fine, but /QxSSE2 and /QxAVX not so.

All this was on an i7-2720QM.

0 Kudos
kissin__yoel
Beginner
941 Views

Thanks for that confirmation!

Any insight into what might be causing this? The code seems very straightforward so there appears to be something strange going on..

0 Kudos
jimdempseyatthecove
Honored Contributor III
941 Views

This is a case of not fully comprehending computational floating point arithmetic as it pertains to a sum of products.

In debug build, Fused Multiply and Add would not be used. On a system with FMA, the sum can attain additional precision (small that it be). Examine the following changes to your program and the output:

program strange_bug
    implicit none
    integer :: dim, i, j, k
    real*8:: b, b_before
    real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
    character(len=2) :: tag
    dim=67
    allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
    read (100) mat_a,mat_b
    j=66
    do i=65,64,-1
        b=0. 
        do k=1, i
            if(mat_a(i,k)*mat_b(k,j) /= 0.0_8) then
                b_before = b
                b=b+mat_a(i,k)*mat_b(k,j)
                if(b==b_before) then
                    tag = 'nc'
                else
                    tag = ' '
                endif
                print '(d25.16,"+",d25.16,"=",d25.16,A2)', b_before,mat_a(i,k)*mat_b(k,j),b, tag
            endif
        enddo
        mat_out(i,j)=b
        !print*, mat_out(i,j)
    enddo
    print *
    print'(1d25.16)',mat_out(64,66)
end program strange_bug

  0.0000000000000000D+00+  -0.9040747749095477D-12=  -0.9040747749095477D-12
 -0.9040747749095477D-12+   0.1629814680063232D-06=   0.1629805639315483D-06
  0.1629805639315483D-06+  -0.2168059516891366D-03=  -0.2166429711252051D-03
 -0.2166429711252051D-03+   0.8038972926188621D-02=   0.7822329955063415D-02
  0.7822329955063415D-02+  -0.1153842811949880D+00=  -0.1075619512399246D+00
 -0.1075619512399246D+00+   0.7828215526216622D+00=   0.6752596013817376D+00
  0.6752596013817376D+00+  -0.2465685962947973D+01=  -0.1790426361566236D+01
 -0.1790426361566236D+01+   0.2628712487297249D+01=   0.8382861257310130D+00
  0.8382861257310130D+00+   0.1045203778178807D+01=   0.1883489903909820D+01
  0.0000000000000000D+00+   0.1111973010847838D-11=   0.1111973010847838D-11
  0.1111973010847838D-11+  -0.1972206715662725D-06=  -0.1972195595932617D-06
 -0.1972195595932617D-06+   0.2556697565971491D-03=   0.2554725370375558D-03
  0.2554725370375558D-03+  -0.9107872783702130D-02=  -0.8852400246664574D-02
 -0.8852400246664574D-02+   0.1224030974802360D+00=   0.1135506972335714D+00
  0.1135506972335714D+00+  -0.7420646068905499D+00=  -0.6285139096569785D+00
 -0.6285139096569785D+00+   0.1897216758767210D+01=   0.1268702849110231D+01
  0.1268702849110231D+01+  -0.1265894296958060D+01=   0.2808552152170796D-02

  0.2808552152170796D-02

Note that the largest magnitude number is ...D+1 and the smallest is ...D-12, for a magnitude range of D+14

Decimal precision of real*8 is D+15.95.

Depending on the order of the summation, say of the smallest magnitude number (D-12) with a large magnitude number (D+1) you could drop D+15.95 - (D+15.95 - D+14) = 14 of the significant digits (for that specific addition).

The result, therefore, is dependent on two factors:

1) the order in the summation with respect to the magnitude of the number being added to the partial sum
2) If a precision difference is introduced with hardware FMA

Barring promoting the calculation of the dot product to real*16, the only other alternative is to perform the sum of the products from least magnitude to greatest magnitude, thus maintaining greatest precision. However, this would require an additional array, and a sort or binning:

program strange_bug
    implicit none
    integer :: dim, i, j, k
    real*8:: b, temp
    real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
    
    integer, parameter :: smallest_magnitude = exponent(tiny(0.0_8))
    integer, parameter :: largest_magnitude = exponent(huge(0.0_8))
! this errors out on IVF 19.0u2
!    real*8 :: magnitude_bins(smallest_magnitude:largest_magnitude)
    real*8 :: magnitude_bins(-1021:1024)
    integer :: product_magnitude
    
    dim=67
    allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
    read (100) mat_a,mat_b
    j=66
    do i=65,64,-1
        magnitude_bins=0.0_8
        do k=1, i
            temp = mat_a(i,k)*mat_b(k,j)
            product_magnitude = exponent(temp)
            magnitude_bins(product_magnitude) = magnitude_bins(product_magnitude) + temp
        enddo
        mat_out(i,j)=sum(magnitude_bins)
        !print*, mat_out(i,j)
    enddo
    print *
    print'(1d25.16)',mat_out(64,66)
end program strange_bug

** however, now you traded of performance (vector, and/or vector FMA) for, closer accuracy.

Jim Dempsey

0 Kudos
kissin__yoel
Beginner
941 Views

Thanks for that idea!

I'm not sure that this is the actual explanation in this instance: a) the error disappears when the print statement is active, or when the array dimension is a parameter - this seems to indicate some unexpected behaviour rather than how the summation works.

Also, if we print out the order of the original summation and compare it with the summation ordered by magnitude, we get : 

 

program strange_bug
    implicit none
    integer :: dim, i, j, k
    real*8:: b, temp
    real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
    
    integer, parameter :: smallest_magnitude = exponent(tiny(0.0_8))
    integer, parameter :: largest_magnitude = exponent(huge(0.0_8))
! this errors out on IVF 19.0u2
!    real*8 :: magnitude_bins(smallest_magnitude:largest_magnitude)
    real*8 :: magnitude_bins(-1021:1024), jj(1)
    integer :: product_magnitude,l
    
    dim=67
    allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
    read (100) mat_a,mat_b
    j=66
    do i=65,64,-1
        magnitude_bins=0.0_8

        do k=1, i
            temp = mat_a(i,k)*mat_b(k,j)
            product_magnitude = exponent(temp)
           magnitude_bins(product_magnitude) = magnitude_bins(product_magnitude) + temp
          if (temp/=0.0_8) write (*,*) temp,magnitude_bins(product_magnitude),product_magnitude
                        
        enddo
!        mat_out(i,j)=sum(magnitude_bins)
        do l=-1021,1024
       ! mat_out(i,j)=sum(magnitude_bins)
        mat_out(i,j)=mat_out(i,j)+magnitude_bins(l)
        if (magnitude_bins(l)/=0.0_8) write (*,*) l, magnitude_bins(l)
       
        !print*, mat_out(i,j)
        enddo
    enddo
    print *
    print'(1d25.16)',mat_out(64,66)
 end program strange_bug

we get 

  1.111973010847838E-012  1.111973010847838E-012         -39
 -1.972206715662725E-007 -1.972206715662725E-007         -22
  2.556697565971491E-004  2.556697565971491E-004         -11
 -9.107872783702130E-003 -9.107872783702130E-003          -6
  0.122403097480236       0.122403097480236               -3
 -0.742064606890550      -0.742064606890550                0
   1.89721675876721        1.89721675876721                1
  -1.26589429695806       0.631322461809149                1
         -39  1.111973010847838E-012
         -22 -1.972206715662725E-007
         -11  2.556697565971491E-004
          -6 -9.107872783702130E-003
          -3  0.122403097480236
           0 -0.742064606890550
           1  0.631322461809149

   0.2808552152170796D-02

so the order of summation is not really changed.

The reason I think that your code worked is because the accumulated product is stored in an array - magnitude_bins - instead of a real variable. Indeed, if we run the following: 

program strange_bug
    implicit none
    integer :: dim, i, j, k
    real*8:: b, temp
    real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
    
    integer, parameter :: smallest_magnitude = exponent(tiny(0.0_8))
    integer, parameter :: largest_magnitude = exponent(huge(0.0_8))
! this errors out on IVF 19.0u2
!    real*8 :: magnitude_bins(smallest_magnitude:largest_magnitude)
    !real*8 :: magnitude_bins(-1021:1024)
    real*8 :: magnitude_bins(1:1)
    integer :: product_magnitude,l
    
    dim=67
    allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
    read (100) mat_a,mat_b
    j=66
    do i=65,64,-1
        magnitude_bins=0.0_8
        do k=1, i
            temp = mat_a(i,k)*mat_b(k,j)
           !product_magnitude = exponent(temp) 
            product_magnitude = 1
           magnitude_bins(product_magnitude) = magnitude_bins(product_magnitude) + temp
          enddo
        mat_out(i,j)=magnitude_bins(1)
        enddo
    print *
    print'(1d25.16)',mat_out(64,66)
 end program strange_bug

gives the right answer while

program strange_bug
    implicit none
    integer :: dim, i, j, k
    real*8:: b, temp
    real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
    
    integer, parameter :: smallest_magnitude = exponent(tiny(0.0_8))
    integer, parameter :: largest_magnitude = exponent(huge(0.0_8))
! this errors out on IVF 19.0u2
!    real*8 :: magnitude_bins(smallest_magnitude:largest_magnitude)
    !real*8 :: magnitude_bins(-1021:1024)
    real*8 :: magnitude_bins(1:1)
    integer :: product_magnitude,l
    
    dim=67
    allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
    read (100) mat_a,mat_b
    j=66
    do i=65,64,-1
    b=0.0_8
        do k=1, i
            temp = mat_a(i,k)*mat_b(k,j)
            b = b + temp                        
        enddo
      mat_out(i,j)=b
        enddo
    print *
    print'(1d25.16)',mat_out(64,66)
 end program strange_bug

gives the wrong answer! Something quite strange going on.

0 Kudos
jimdempseyatthecove
Honored Contributor III
941 Views

Let me be clear. This isn't a case of a right answer verses a wrong answer.
Rather it is a case on one approximation verses a different approximation.

You might want to take some time to read: Improving Numerical Reproducibility in C/C++/Fortran
by Steve Lionel

The last version of the program in #7, release build, will elide (remove) the temp and use vectors and possibly FMA if available, Whereas the magnitude version will not use vectors (IOW similar behavior to Debug build)

Additional notes:

If you decide to use the magnitude_bins (slower but better precision), you could reduce the size of the array (and thus the time to perform the sum) by dividing the indicies to fit the range of exponent(temp)/10, or something like that (expand the range by 1 on each side to inhibit index out of range).

You should also consider

real*16 :: b
...
b = 0.0_16
...
   do k=1, I
     b = b + real(mat_a(I,k), kind(b)) * real(mat_b(k,j), kind(b))
   end do
   mat_out = b  ! or if you wish real(b, kind(mat_out))
...

The conversions to real*16 and additions may be less time than using the magnitude_bins and sum. You can experiment.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
941 Views

The magnitude_bins could potentially have (worst case) an error of log2(10) bits past the lsb end of the mantissa times the length of the dot product. This is for the addition operation. The multiplication part of the dot product can loose up to the number of bits in the mantissa (52) past the lsb end of the mantissa. IIF (strongly worded if and only if) you (the cpu) does not incorporate the lost bits, you can expect an uncertainty of (up to) 1/2 bit per operation * the length of the dot product. 2x1/2x67 for the working program, or 2x1/2x65 for the reduced test program. Or approximately 6 and a fraction bits for random distribution. *** a worst case scenario would be where the first product were sufficiently large, such that should the remaining 66 products be extremely small (less than 1D15.9'th of the first product), that the summations result in adding 0.0's. This isn't necessarily a problem for small matrices, or even large matrices provided that the magnitudes of the products are reasonably close. You have to know the data characteristics as well as the abstract solution, then apply the knowledge to dictate how you write your code.

Jim Dempsey

0 Kudos
John_Campbell
New Contributor II
941 Views

I tested the program using gFortran 8.2.0 and tested -mavx vs -msse3 vs -msse2. All these gave the same results

However in all cases when testing all with -fast-math, I get  b = 0.2808552152171018D-02

Without -fast-math, I get b = 0.2808552152170796D-02

Is there an implied switch in ifort 19.1 /O2 /QxAVX or /QxSSE2 for -fast-math ? ( but not /QxSSE3 ! )

These tests on i5-2300

0 Kudos
kissin__yoel
Beginner
941 Views

Ah that's interesting.

If that is correct then how can I disable the -fast -math option? Would that impact the performance? Also, why is the result accurate with the print statement? Doesn't that point to there being some error perhaps in the memory allocation?

Using real*16 precision still doesn't give the desired answer unfortunately..

0 Kudos
Arjen_Markus
Honored Contributor I
941 Views

A print statement may change the generated object code completely. Quite often, if you have a so-called Heisenbug, a print statement makes it go away, or better the bug is hidden because you are trying to look at it via a print statement (that is where the name comes from). This is probably due to such a print statement upsetting optimisation opportunities

0 Kudos
jimdempseyatthecove
Honored Contributor III
941 Views

>>Using real*16 precision still doesn't give the desired answer unfortunately..

Ummm... it will give a closer answer to the correct (infinite precision) answer.

As Humpty Dumpty once said, per Lewis Carroll: “When I use a word, it means just what I choose it to mean, neither more nor less.”

Seeing that Debug build produces the answer you want, this implies no optimizations (specifically vectorization). Then, isolate the function that contains this procedure (looks like your own matrix multiplication), and compile this function with -O0.

Please review the article link in #8

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
941 Views

Maybe this will shed a little more light on the issue and give you a better understanding of "order makes a difference". I made a minor addition to your program. The program is built in debug (iow no vectorization). The inner loop dot product is now written three ways. Once iterating in original low index to high index (k=1,i) and another iterating from high index to low (k=i,1,-1), and lastly a scalar ersatz implementation of AVX2. The results of the dot product for cell (64,66) being printed out:

EDIT...

program strange_bug
    implicit none
    integer :: dim, i, j, k
    real*8:: b, b0, b1, b2, b3
    real*8 ,allocatable:: mat_a(:,:), mat_b(:,:), mat_out(:,:)
    
    dim=67
    allocate (mat_out(dim,dim),mat_a(dim,dim),mat_b(dim,dim))
    read (100) mat_a,mat_b
    j=66
    do i=65,64,-1
        ! lo papa hi um
        b=0.0_8
        do k=1, i   ! iterate low to high
            b = b + mat_a(i,k)*mat_b(k,j)
        enddo
        mat_out(i,j)=b
        if(i==64 .and. j==66) print'(1d25.16)',b
        
        ! hi papa lo um
        b=0.0_8
        do k=i, 1, -1   ! iterate high to low
            b = b + mat_a(i,k)*mat_b(k,j)
        enddo
        if(i==64 .and. j==66) print'(1d25.16)',b
        
        ! scalar simulation of vector dot product using AVX (4-wide real*8)
        ! AVX2 processes 4-wide slices represented in scalars b0, b1, b2, b3
        b0=0.0_8
        b1=0.0_8
        b2=0.0_8
        b3=0.0_8
        do k=1, i,4   ! iterate low to high at vector width
            b0 = b0 + mat_a(i,k+0)*mat_b(k+0,j)
            if(k+1 <= i) b1 = b1 + mat_a(i,k+1)*mat_b(k+1,j)
            if(k+2 <= i) b2 = b2 + mat_a(i,k+2)*mat_b(k+2,j)
            if(k+3 <= i) b3 = b3 + mat_a(i,k+3)*mat_b(k+3,j)
        enddo
        ! first horizontal add
        b0 = b0 + b1
        b2 = b2 + b3
        ! second horizontal add
        b = b0 + b2
        if(i==64 .and. j==66) print'(1d25.16)',b
       
    enddo
end program strange_bug

 0.2808552152170796D-02
 0.2808552152170850D-02
 0.2808552152170796D-02

You can now clearly see that the sequential (scalar) order of producing the product sums makes a difference in the results.

Which of the three have the correct answer, they all have the correct answer within the precision capability of the CPU.

** note, while the first and third results are the same. It should be noted that due to the summation order difference the sameness is more by chance than by design.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
941 Views

Additional note.

If you absolutely require reproducibility, then you will be required to stick with the method (scalar, 2-wide, 4-wide, 8-wide, with/without FMA), together with any dependent libraries smvl, mkl, immintrin, ..., including optimization level,  as was used in the code that generated your reference code.

An alternative is to re-certify the results of using advanced (newer) capabilities of the processors.

A better process, IMHO, is to define the certification values with tolarances equal to the estimated error..

Jim Dempsey

0 Kudos
John_Campbell
New Contributor II
941 Views

Very interesting; this is the first time I have seen a significant difference by using -ffast-math, although I can't see why.
I changed the inner loop to print out the accumulation of b, in a way to minimise change to the loop operation. The results still vary with the use of -ffast-math.

! computation
  jj=66
  ii=64
!  do i=65,64,-1
  do i=1,64
    b=0. 
    do k=1, i 
      b = b + mat_a(ii,k)*mat_b(k,jj)
    end do
    mat_out(ii,jj)=b
    k = i
    print*, i,mat_a(ii,k),mat_b(k,jj),mat_out(ii,jj)
  end do
  print'(1d25.16)',mat_out(64,66)
  write (*,*) '  0.2808552152170796D-02  expected'

The change is: With and without the -ffast-math, the abbreviated results are listed below. I am listing this as the mat_a, mat_b values don't indicate to me why -ffast-math would produce different results, although this could be in the optimisation. Does anyone have a suggestion. (b(1:55,j) = 0)

================================================================ 
  It is now Saturday, 23 February 2019 at 13:29:11.803
 Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
[AUDIT Ver 1.21] Saturday, 23 February 2019 at 13:29:11.850
 Version : GCC version 8.2.0
 Options : -msse3 -mtune=generic -march=x86-64 -O3
 n=       71824  (       71824  expected)
           0  values differ in mat_a
        2278  non-zero values in mat_a
...
          56   5.2942005353994777        0.0000000000000000        0.0000000000000000     
          57  -6.9772361336749711       -1.5937156053541101E-013   1.1119730108478377E-012
          58   9.3635064939922223       -2.1062693948342160E-008  -1.9721955959326167E-007
          59  -12.932698708069680       -1.9769250205884531E-005   2.5547253703755583E-004
          60   18.678115037019669       -4.8762269456262042E-004  -8.8524002466645740E-003
          61  -23.570349441287728       -5.1930964275745881E-003  0.11355069723357142     
          62   24.873687256803521       -2.9833317402009962E-002 -0.62851390965697851     
          63  -20.132074971377321       -9.4238510509451617E-002   1.2687028491102310     
          64   9.8428051603894851      -0.12861113029570201        2.8085521521707957E-003
   0.2808552152170796D-02
   0.2808552152170796D-02  expected
[AUDIT Ver 1.21] elapse        0.123 seconds: Saturday, 23 February 2019 at 13:29:11.975        1.017
================================================================ 
  It is now Saturday, 23 February 2019 at 13:29:42.512
 Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
[AUDIT Ver 1.21] Saturday, 23 February 2019 at 13:29:42.558
 Version : GCC version 8.2.0
 Options : -msse3 -mtune=generic -march=x86-64 -O3 -ffast-math
 n=       71824  (       71824  expected)
           0  values differ in mat_a
        2278  non-zero values in mat_a
...
          56   5.2942005353994777        0.0000000000000000        0.0000000000000000     
          57  -6.9772361336749711       -1.5937156053541101E-013   1.1119730108478377E-012
          58   9.3635064939922223       -2.1062693948342160E-008  -1.9721955959326167E-007
          59  -12.932698708069680       -1.9769250205884531E-005   2.5547253703755583E-004
          60   18.678115037019669       -4.8762269456262042E-004  -8.8524002466645740E-003
          61  -23.570349441287728       -5.1930964275745881E-003  0.11355069723357142     
          62   24.873687256803521       -2.9833317402009962E-002 -0.62851390965697851     
          63  -20.132074971377321       -9.4238510509451617E-002   1.2687028491102310     
          64   9.8428051603894851      -0.12861113029570201        2.8085521521710177E-003
   0.2808552152171018D-02
   0.2808552152170796D-02  expected
[AUDIT Ver 1.21] elapse        0.075 seconds: Saturday, 23 February 2019 at 13:29:42.636        1.042

edit: original post modified for use of ii,jj

As Jim has found, it is difficult to tell which result is closer to the better answer ( with real*16 b >  0.2808552152170851D-02 )

0 Kudos
mecej4
Honored Contributor III
941 Views

Perhaps, a different viewpoint will help at this juncture. Let us set aside the effects of compiler options and FPU instruction sets, and concentrate on (i) loss of precision as a result of subtractive cancellation (ii) decimal representation versus internal representation of real numbers.

It so happens that only eight of the pairs of numbers being multiplied in OP's program+data are both non-zero. I took those eight pairs, shortened the numbers, and wrote the following simplified program.

program sample
implicit none
integer :: k, ib(2)
real*8:: b,t
real*8 :: u(8) = (/ -7d0, 9.4d0, -12.9d0,18.7d0, -23.6d0,24.8d0,-20.1d0,9.8d0 /), &
          v(8) = (/ 2d-13,2d-8,2d-5,5d-4,5d-3,3d-2,9d-2,1.198d-1 /)
b=0.d0
print '(A)','  i          u(i)                     v(i)                u(i)*v(i)           CumSum'
do k=1,8
   t = u(k)*v(k)
   b = b+t
   print '(i3,2es25.14,2f19.15)',k,u(k),v(k),t,b
end do
ib=transfer(b,ib)
print'(/,A,es25.15,1x,A,4x,2Z8,1x,A)',' Result = ',b,'(Decimal)',ib(2),ib(1),'(IEEE-64)'

end program sample

The output:

  i           u(i)                     v(i)               u(i)*v(i)           CumSum
  1   -7.00000000000000E+00    2.00000000000000E-13 -0.000000000001400 -0.000000000001400
  2    9.40000000000000E+00    2.00000000000000E-08  0.000000188000000  0.000000187998600
  3   -1.29000000000000E+01    2.00000000000000E-05 -0.000258000000000 -0.000257812001400
  4    1.87000000000000E+01    5.00000000000000E-04  0.009350000000000  0.009092187998600
  5   -2.36000000000000E+01    5.00000000000000E-03 -0.118000000000000 -0.108907812001400
  6    2.48000000000000E+01    3.00000000000000E-02  0.744000000000000  0.635092187998600
  7   -2.01000000000000E+01    9.00000000000000E-02 -1.809000000000000 -1.173907812001400
  8    9.80000000000000E+00    1.19800000000000E-01  1.174040000000000  0.000132187998600

 Result =     1.321879986000418E-04 (Decimal)    3F21537E4306C000 (IEEE-64)

Note that the all the numbers in the last column terminate in '00', as does the internal representation of the result, '...C000'. The '418' at the end of the decimal result is simply the result of conversion from the internal representation to decimal. Conversely, the internal representation of the decimal number 0.000132187998600 (which is actually the exact result of doing all the calculations using decimal arithmetic) is 3F21537E4306B9F9, which does not end in '00'.

All but the last two of the products being summed have magnitudes in ascending order. The last addition involves two nearly equal numbers, and about 4 digits of precision are lost as a result of that addition.

0 Kudos
andrew_4619
Honored Contributor II
941 Views

kissin, yoel wrote:
Using real*16 precision still doesn't give the desired answer unfortunately..

Using real*16 must surely give a more precise answer. Is your "desired" answer the one that some combination of compiler and options has  produced in the past that has been defined as the "correct" answer? If that is the case you need to reconsider your QA process.

0 Kudos
John_Campbell
New Contributor II
941 Views

What should the desired answer be ?

Given that:
  mat_a(64,63) = -20.132074971377321   
  mat_b(63,66) =  -0.094238510509451617
 their product =   1.8972167587672100
and considering the accuracy of these provided values,
the resulting variation in accumulated b should be inside the tolerance for the "desired answer" using 8-byte reals.
(At this stage of the calculation, b could not be more accurate than 2.D-16, as epsilon = 2.2D-16. The difference between the two possible values of b is actually 2.2d-16)

To argue any differently would require greater accuracy of the values in mat_a and mat_b.
I doubt if their accuracy can be improved.

For me, I have reviewed another example where -ffast-math gives acceptable accuracy.

0 Kudos
gib
New Contributor II
941 Views

The various methods will all write out the same result with a suitably chosen format (i.e. one without too many decimal places).

0 Kudos
jimdempseyatthecove
Honored Contributor III
811 Views

The (a potential) problem the OP has is he may be up against is having his code meeting "certified correct results". Where the certification requires the approximation of the results data from some an approved prior work, to match the approximation of the results data from a newly derived (compiled) work... AS IF the original results data were absolutely correct. IOW to reach a dogmatic result as opposed to equivalent or even better result.

Jim Dempsey

0 Kudos
Reply