Alexander_B_2 · ‎03-04-2016

The following code produces the correct result when compiled with -O2, but the result is wrong with -O3.

$ ifort -O2 test_bug.F90 && ./a.out
size(H,2),n 10 10 T
HPH 100.0000 100.0000 100.0000 100.0000

$ ifort -O3 test_bug.F90 && ./a.out
size(H,2),n 10 10 T
HPH 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00

I am using ifort 14.0.1 20131008 but a colleague confirms that this error also affects the more current version with the version string 2015.5.223_ilp64.

The provided code is a minimal program that produces the error. Seemingly unrelated statements affect whether the result is correct or not.

Thank you for your help!

program test_rrsqrt
 implicit none
  integer, parameter :: m = 5
  integer, parameter :: n = 10  
  real :: H(m,n)
  H = 1
  call testing_local_analysis_covar(H)
contains

 subroutine testing_local_analysis_covar(H)
  implicit none
  real, intent(in) :: H(:,:)
  real :: Pf(size(H,2),size(H,2))
  real :: HPH(2,2)
  integer :: mloc
  real, allocatable :: Hloc(:,:)
  real :: A(n,2), B(2,2)

  Pf = 1

  ! bug is not triggered if either of these two lines is commented out
  A = matmul(Pf, transpose(H))
  B = matmul(H,matmul(Pf,transpose(H)))

  mloc = 2
  write(6,*) 'size(H,2),n ',size(H,2), n, size(H,2) == n

  allocate(Hloc(mloc,size(H,2))) ! triggers bug
!  allocate(Hloc(mloc,n))  ! does not trigger bug

  Pf = 1
  Hloc = 1

  HPH = matmul(Hloc,matmul(Pf,transpose(Hloc))) ! -> does not work, unless allocate(Hloc(mloc,n))
!  HPH = matmul(matmul(Hloc,Pf),transpose(Hloc)) ! -> works!
  write(6,*) 'HPH', HPH
  deallocate(Hloc)
 end subroutine testing_local_analysis_covar
end program test_rrsqrt

Steven_L_Intel1 · ‎03-04-2016

Thanks - I can reproduce this and we'll investigate further.

adel_s_1 · ‎03-04-2016

Possible Google Search

Steven_L_Intel1 · ‎03-04-2016

Escalated as issue DPD200407800. I will update this thread when I learn more.

Steven_L_Intel1 · ‎03-22-2016

The developers tell me that one of the optimization phases hits an internal limit and gives up, leaving the internal representation in an unstable state. Until this is fixed, you can use the (undocumented) option -qoverride-limits as a workaround to allow the phase to complete. I tested this and it does work for your example (without taking noticeably more time to compile).

Alexander_B_2 · ‎03-22-2016

Thank you very much for your helpful insight! Is there a chance that future versions of ifort will accept this code directly with -O3?

Steven_L_Intel1 · ‎03-22-2016

Yes, I certainly hope so! That you get wrong code at -O3 is a bug. I don't know how it will get fixed, but it will get fixed. I just wanted to give you a workaround for now. When I hear more from the developers, I will let you know here.

Steven_L_Intel1 · ‎03-29-2016

This has been fixed for Update 3, due in May.

Alexander_B_2 · ‎03-29-2016

Great! Thank you very much for resolving this issue!

kgore4 · ‎03-29-2016

"-fp-model precise" seems to work around it too.

$ ifort -mkl -O3 -fp-model precise mm.f90 && ./a.out
size(H,2),n           10          10 T
HPH   100.0000       100.0000       100.0000       100.0000
$ ifort -mkl -O3 mm.f90 && ./a.out
size(H,2),n           10          10 T
HPH 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00
$ ifort -mkl -O2 mm.f90 && ./a.out
size(H,2),n           10          10 T
HPH   100.0000       100.0000       100.0000       100.0000
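
When comparing builds with different flags like this, it can help to have the program check itself instead of relying on reading the output. The following is only a sketch of that idea: it computes the same product with matmul and with an explicit-loop reference and stops with a nonzero status if they disagree. The names check_matmul and ref_matmul and the tolerance 1e-4 are made up for illustration and do not come from the original test case. This sketch is not expected to reproduce the bug by itself (the original trigger depends on the surrounding statements), and an optimizer is free to transform the loops as well, so treat it as a sanity check and not as proof.

```fortran
! Sketch: compare matmul against an explicit-loop reference.
! check_matmul, ref_matmul and the tolerance are illustrative assumptions.
program check_matmul
  implicit none
  integer, parameter :: n = 10, mloc = 2
  real :: Pf(n,n), HPH(mloc,mloc), HPH_ref(mloc,mloc)
  real, allocatable :: Hloc(:,:)

  allocate(Hloc(mloc,n))
  Pf = 1
  Hloc = 1

  ! Same expression as in the original report
  HPH = matmul(Hloc, matmul(Pf, transpose(Hloc)))
  ! Reference result computed with plain loops
  HPH_ref = ref_matmul(Hloc, ref_matmul(Pf, transpose(Hloc)))

  write(6,*) 'HPH    ', HPH
  write(6,*) 'HPH_ref', HPH_ref

  if (maxval(abs(HPH - HPH_ref)) > 1e-4) then
    write(6,*) 'MISMATCH: matmul disagrees with the loop reference'
    stop 1
  end if
  deallocate(Hloc)

contains

  ! Straightforward triple-loop matrix product C = A*B
  function ref_matmul(A, B) result(C)
    real, intent(in) :: A(:,:), B(:,:)
    real :: C(size(A,1), size(B,2))
    integer :: i, j, k
    C = 0
    do j = 1, size(B,2)
      do k = 1, size(A,2)
        do i = 1, size(A,1)
          C(i,j) = C(i,j) + A(i,k) * B(k,j)
        end do
      end do
    end do
  end function ref_matmul
end program check_matmul
```

Building this once per flag set (for example -O2, -O3, and -O3 -fp-model precise) and checking the exit status of ./a.out gives a quick pass/fail comparison.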

matmul can give wrong results when code is compiled with -O3