Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

matmul can give wrong results when code is compiled with -O3

Alexander_B_2
Beginner
673 Views

The following code produces the correct result when compiled with -O2, but the result can be wrong with -O3.

$ ifort  -O2 test_bug.F90 && ./a.out
 size(H,2),n           10          10 T
 HPH   100.0000       100.0000       100.0000       100.0000


$ ifort  -O3 test_bug.F90 && ./a.out
 size(H,2),n           10          10 T
 HPH  0.0000000E+00  0.0000000E+00  0.0000000E+00  0.0000000E+00

I am using ifort 14.0.1 20131008 but a colleague confirms that this error also affects the more current version with the version string 2015.5.223_ilp64.

The provided code is a minimal program which produces the error. Seemingly unrelated statements affect whether the result is correct or not.

Thank you for your help!

program test_rrsqrt
 implicit none
  integer, parameter :: m = 5
  integer, parameter :: n = 10  
  real :: H(m,n)
  H = 1
  call testing_local_analysis_covar(H)
contains

 subroutine testing_local_analysis_covar(H)
  implicit none
  real, intent(in) :: H(:,:)
  real :: Pf(size(H,2),size(H,2))
  real :: HPH(2,2)
  integer :: mloc
  real, allocatable :: Hloc(:,:)
  real :: A(n,2), B(2,2)

  Pf = 1

  ! bug is not triggered if one of these two lines is commented out
  A = matmul(Pf, transpose(H))
  B = matmul(H,matmul(Pf,transpose(H)))

  mloc = 2
  write(6,*) 'size(H,2),n ',size(H,2), n, size(H,2) == n

  allocate(Hloc(mloc,size(H,2))) ! triggers bug
!  allocate(Hloc(mloc,n))  ! does not trigger bug

  Pf = 1
  Hloc = 1

  HPH = matmul(Hloc,matmul(Pf,transpose(Hloc))) ! -> does not work, unless allocate(Hloc(mloc,n))
!  HPH = matmul(matmul(Hloc,Pf),transpose(Hloc)) ! -> works!
  write(6,*) 'HPH', HPH
  deallocate(Hloc)
 end subroutine testing_local_analysis_covar
end program test_rrsqrt


 

9 Replies
Steven_L_Intel1
Employee

Thanks - I can reproduce this and we'll investigate further.

adel_s_1
Beginner

Possible Google Search

Steven_L_Intel1
Employee

Escalated as issue DPD200407800. I will update this thread when I learn more.

Steven_L_Intel1
Employee

The developers tell me that one of the optimization phases hits an internal limit and gives up, leaving the internal representation in an unstable state. Until this is fixed, you can use the (undocumented) option -qoverride-limits as a workaround to allow the phase to complete. I tested this and it does work for your example (without taking noticeably more time to compile).

Alexander_B_2
Beginner

Thank you very much for your helpful insight! Is there a chance that future versions of ifort will accept this code directly with -O3?

Steven_L_Intel1
Employee

Yes, I certainly hope so! That you get wrong code at -O3 is a bug. I don't know how it will get fixed, but it will get fixed. I just wanted to give you a workaround for now. When I hear more from the developers, I will let you know here.

Steven_L_Intel1
Employee

This has been fixed for Update 3, due in May.

Alexander_B_2
Beginner

Great! Thank you very much for resolving this issue!

kgore4
Beginner

"-fp-model precise" seems to work around it too.

$ ifort -mkl -O3 -fp-model precise mm.f90 && ./a.out
 size(H,2),n           10          10 T
 HPH   100.0000       100.0000       100.0000       100.0000
$ ifort -mkl -O3 mm.f90 && ./a.out
 size(H,2),n           10          10 T
 HPH  0.0000000E+00  0.0000000E+00  0.0000000E+00  0.0000000E+00
$ ifort -mkl -O2 mm.f90 && ./a.out
 size(H,2),n           10          10 T
 HPH   100.0000       100.0000       100.0000       100.0000
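
The flag combinations reported in this thread can be compared in one pass. Below is a minimal sketch of such a check, not an official test. It assumes the reproducer is saved as test_bug.F90 (the file name used in the first post) and that the program prints an "HPH" line in the format shown above. The names hph_ok, SRC and hph_test are illustrative only. The builds are skipped if ifort or the source file is not found.

```shell
#!/bin/sh
# Sketch: compare the flag sets mentioned in this thread.
# hph_ok, SRC and hph_test are illustrative names, not part of the original report.

SRC=${SRC:-test_bug.F90}   # reproducer source file (name taken from the first post)

# Succeed only if an "HPH" line is present and every value on it equals 100.
hph_ok() {
  awk '$1 == "HPH" { found = 1; for (i = 2; i <= NF; i++) if ($i + 0 != 100) bad = 1 }
       END { if (found && !bad) exit 0; exit 1 }'
}

# Only attempt the builds when ifort and the source file are available.
if command -v ifort >/dev/null 2>&1 && [ -f "$SRC" ]; then
  for flags in "-O2" "-O3" "-O3 -qoverride-limits" "-O3 -fp-model precise"; do
    # $flags is intentionally unquoted so that it splits into separate options.
    if ifort $flags "$SRC" -o hph_test && ./hph_test | hph_ok; then
      echo "PASS: $flags"
    else
      echo "FAIL: $flags"
    fi
  done
fi
```

On an affected compiler version, the reports above suggest that only the plain -O3 build would be flagged as FAIL. The expected value of 100 follows from the reproducer itself: Pf is 10x10 and Hloc is 2x10, both filled with ones, so each element of HPH is 10 * 10.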

 
