Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29278 Discussions

inconsistent results with -O2 as compared to -O1

m_sulc
Beginner
497 Views
Hello,
I have encountered following problem with ifort compiler v 12.1.3 on x86_64 GNU/Linux.
When I compile the following minimal example with the options "-O2 -implicitnone" I get
different results for the two loops in the "compute_matrix_H" subroutine (i.e. the two DO loops,
where the first one has as upper bound the variable N, while a fixed value is used in the second one).
It seems that with "-O1", or adding "-g" to "-O2" resolves this issue as well as allocating the working
array mp_v3 dynamically as indicated in the subroutine. Also not allocating the (unused) array mat_SS
seems to have a positive effect.

The included module "parameters" is stored in an external file and contains solely the declaration

INTEGER, PARAMETER :: adequate = KIND(1D0)
INTEGER :: NDim
INTEGER :: NGWPs

It seems that the problem is also resolved when this declaration is moved into the
main program and the inclusion of the "parameters" module is therefore avoided.

Any ideas why this occurs?

Thanks in advance,
M.



[fortran]!------------------------------------------------------------------------------- PROGRAM gdr USE parameters ! INTEGER, PARAMETER :: adequate = KIND(1D0) ! INTEGER :: NDim ! INTEGER :: NGWPs COMPLEX(adequate), ALLOCATABLE :: mat_SS(:, :) NDim = 2 NGWPs = 16 ALLOCATE(mat_SS(NGWPs, NGWPs)) CALL compute_matrix_H(NDim) DEALLOCATE(mat_SS) CONTAINS SUBROUTINE compute_matrix_H(N) INTEGER, INTENT(IN) :: N COMPLEX(adequate) :: N_mat(1:N, 1:N), mp_v3(1:N, 1:N, 1:N) REAL(adequate) :: x, y, beta INTEGER :: M, l, k ! COMPLEX(adequate), ALLOCATABLE :: N_mat(:, :), mp_v3(:, :, :) ! ALLOCATE(n_mat(N, N), mp_v3(N, N, N)) WRITE(*, *) "size of mp_v3 = ", SIZE(mp_v3) x = 3 y = 2 beta = 0.01_adequate mp_v3(1, 1, 1) = 6*x*beta mp_v3(2, 1, 1) = 2*y mp_v3(1, 2, 1) = 2*y mp_v3(2, 2, 1) = 2*x mp_v3(1, 1, 2) = 2*y mp_v3(2, 1, 2) = 2*x mp_v3(1, 2, 2) = 2*x mp_v3(2, 2, 2) = 6*y*beta N_mat = 0 DO k = 1, N DO l = 1, N N_mat(l, k) = mp_v3(l, l, k) END DO END DO WRITE(*, '(2(2F22.15,/))') ABS(N_mat) N_mat = 0 DO k = 1, 2 DO l = 1, 2 N_mat(l, k) = mp_v3(l, l, k) END DO END DO WRITE(*, '(2(2F22.15,/))') ABS(N_mat) END SUBROUTINE END PROGRAM[/fortran]
0 Kudos
2 Replies
Anonymous66
Valued Contributor I
497 Views
I get the same results at -O2 as at -O0.

It is possible for optimization to affect the results of some programs though due to changes in the order of operations. If you want to avoid any optimizations that can affect the results,compile with the option-fp-model precise.
0 Kudos
mecej4
Honored Contributor III
497 Views
There is definitely an optimizer bug here.

The loop

[fortran] DO k = 1, N DO l = 1, N N_mat(l, k) = mp_v3(l, l, k) END DO END DO [/fortran] is compiled to

[bash]..B1.13: # Preds ..B1.13 ..B1.12 movaps (%rcx,%rdx), %xmm0 #15.12 incq %rdi #15.12 movaps %xmm0, (%rsi,%rbx) #15.12 addq $16, %rsi #15.12 addq %r14, %rcx #15.12 cmpq %rax, %rdi #15.12 jb ..B1.13 # Prob 96% #15.12 [/bash]
with the 64 bit version of IFort 12.1.4.319 Build 20120410 (and 12.1.4.325 Build 20120410 on Windows 7) when the options used are -O2 -QxHOST (OS: Suse Linux 12.1, CPU: C2D E8400). On Windows, the same errors occur if -O0 is used. On both platforms, no errors occurs when -O1 or -O3 are used.

Similar errors occur with the 32-bit version of the same release, and the corresponding WIndows compilers, although of course the assembler outputs are different:

The fault is in the instruction [bash] addq $16, %rsi #15.12[/bash]In effect, the compiled code is storing into elements (1,1,1), (1,2,1), (1,1,2), (1,2,2) of array mp_v3, because of the uniform stride in the address calculation in the compiled code. The proper elements that should have been stored into are (1,1,1), (2,2,1), (1,1,2), (2,2,2), and the corresponding increments that should have been added to the index in %rsi should have been 24, 8 and 24 bytes, instead of the uniform 16 byte increment used.

The use of "-fp precise" has no relevance to this bug, which is a serious optimizer bug and should be noted.

Curiously, if the declaration, allocation and deallocation of the array mat_SS, which is otherwise not used in the rest of the program, are taken out, the bug disappears.

I have made up a simpler "reproducer" by removing complex variables and portions of the code that may be removed without making the bug go away, and posted it in a new thread, Optimizer Bug.

[UPDATE: Ron Green of Intel has responded with:

The bug report ID is DPD200232347

]
0 Kudos
Reply