problem with forall construct when optimization -O2 is used

jerzyg · ‎08-06-2010

hello,

I observed a strange behavior of ifort 11.1 when compiling some codes with forall loops and array stripes. Please consider the following example:

module mx
contains

pure function ocross_product_dp(a,b)
implicit none
double precision,dimension(:),intent(in) :: a,b
double precision,dimension(size(a),size(b)) :: ocross_product_dp
integer :: i
forall (i = 1:size(a))
ocross_product_dp(i,:i) = a(i) * b(:i)
endforall
end function

pure function ocross_product_xp(a,b)
implicit none
double precision,dimension(5),intent(in) :: a,b
double precision,dimension(5,5) :: ocross_product_xp
integer :: i
forall (i = 1:5)
ocross_product_xp(i,:i) = a(i) * b(:i)
endforall
end function

end module

program fngTest
use mx
implicit none
integer :: i
double precision,dimension(5,5) :: M = 0.D0
double precision,dimension(5) :: U = 0.D0
double precision,dimension(5) :: V = 0.D0

do i=1,5
U(i) = dble(i)
V(i) = 10.D0*i
enddo

write(*,45) 'u', U
write(*,45) 'v', V
print *, 'M_raw before:'
write(*,*) M
M = ocross_product_dp(U,V)
write(*,*) 'M='
write(*,44) (M(i,1:5) , i=1,5)

print *, 'M_raw after:'
write(*,*) M

44 format(5(F12.6,X))
45 format(A,X,':',X,5(F12.6,X))
end program

The functions ocross_product_dp and ocross_product_xp in module mx are supposed to calculate lower-triangular matrix. The forall construct is used that iterates over stripes of output matrix and stripes of input vectors.

If the code listed above is compiled with either -O2 or -O3 optimization switch, the output from the program is as follows (please look at the first and second row after "M="):

u : 1.000000 2.000000 3.000000 4.000000 5.000000
v : 10.000000 20.000000 30.000000 40.000000 50.000000
M_raw before:
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000
M=
10.000000 0.000000 0.000000 0.000000 1.000000
20.000000 40.000000 0.000000 1.000000 0.000000
30.000000 60.000000 90.000000 0.000000 0.000000
40.000000 80.000000 120.000000 160.000000 0.000000
50.000000 100.000000 150.000000 200.000000 250.000000
M_raw after:
10.0000000000000 20.0000000000000 30.0000000000000
40.0000000000000 50.0000000000000 6.953252086138744E-310
40.0000000000000 60.0000000000000 80.0000000000000
100.000000000000 0.000000000000000E+000 0.000000000000000E+000
90.0000000000000 120.000000000000 150.000000000000
0.000000000000000E+000 1.00000000000000 6.953252086109890E-310
160.000000000000 200.000000000000 1.00000000000000
6.953252086138744E-310 0.000000000000000E+000 0.000000000000000E+000
250.000000000000

The elements (1,5) and (2,4) of matrix M are clearly incorrect, they should be both zero.

If I replace call to function ocross_product_dp with a call to function ocross_product_xp or if I turn off optimization by specifying -O0 compiler switch, the output from the program is exactly as expected:

u : 1.000000 2.000000 3.000000 4.000000 5.000000
v : 10.000000 20.000000 30.000000 40.000000 50.000000
M_raw before:
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000
M=
10.000000 0.000000 0.000000 0.000000 0.000000
20.000000 40.000000 0.000000 0.000000 0.000000
30.000000 60.000000 90.000000 0.000000 0.000000
40.000000 80.000000 120.000000 160.000000 0.000000
50.000000 100.000000 150.000000 200.000000 250.000000
M_raw after:
10.0000000000000 20.0000000000000 30.0000000000000
40.0000000000000 50.0000000000000 6.953271318849528E-310
40.0000000000000 60.0000000000000 80.0000000000000
100.000000000000 6.902014741602593E-310 6.902014741603581E-310
90.0000000000000 120.000000000000 150.000000000000
2.077534183186942E-317 6.953271318813165E-310 6.953271318806446E-310
160.000000000000 200.000000000000 6.953271318815537E-310
6.953271318849528E-310 0.000000000000000E+000 1.570995158424529E-316
250.000000000000

Could you point out where I made a mistake in this code? Perhaps it is a result of compiler bug?

I use ifort 11.1 20100414, the program above was tested on two Intel emt64 systems.
I also compiled the code with gfortran, the results I obtained are correct regardless optimization is turned on or off.

Thanks in advance for help.

Best regards,
Jerzy

Kevin_D_Intel · ‎08-06-2010

Thank you for the nice reproducer. This appears to be a defect (as you note at opt-levels -O2 and higher) with using SIZE() in the declaration for ocross_product_dp. The use in the forall triplet-spec is working. I'll forward this to Development and update the post when I learn more.

jerzyg · ‎08-06-2010

Thanks for your response.

I may add that according to tests I performed, size and shape of parameters that are received by ocross_product_dp appears to be correct.

Moreover, if I replace initialization of U and V arrays as shown in my previous post:

double precision,dimension(5) :: U = 0.D0
double precision,dimension(5) :: V = 0.D0
do i=1,5
U(i) = dble(i)
V(i) = 10.D0*i
enddo

by initialization on declatation:

double precision,dimension(5) :: U = (/ ( dble(i), i=1,5 ) /), V = (/ (10.D0*i, i=1,5) /)

then the problem with incorrect result from ocross_product_dp misteriously disappears and the content of the output matrix becomes correct.

best regards,
Jerzy

Kevin_D_Intel · ‎08-13-2010

Hi Jerzy,

Sorry for the delay getting back to you. After investigating further, I believe there is a programming error here but I'm awaiting Development's opinion too.

First, I overlooked this fact initially but notice in your posted results that "M_raw after" shows pollution of array M with garbage values for both your ocross_product_xp and ocross_product_dp array-valued functions.

The programming error is that the array-valued functions return the entire array and not just the array section calculated, therefore, the garbage values collect in array M as a result of not initializing the array inside the array-valued function.

I believe the solution is to add initialization statements inside each function, something like this:

ocross_product_dp = 0.D0

Kevin_D_Intel · ‎08-13-2010

Hi Jerzy,

Development confirmed the array-valued function returns the entire array. They alsoindicated for across-product you probably wanted to do this:

ocross_product_dp(i,:size(b)) = a(i) * b(:size(b))

instead of this:

ocross_product_dp(i,:i) = a(i) * b(:i)

That modification fills the entire array and M after becomes:

M=
10.000000 20.000000 30.000000 40.000000 50.000000
20.000000 40.000000 60.000000 80.000000 100.000000
30.000000 60.000000 90.000000 120.000000 150.000000
40.000000 80.000000 120.000000 160.000000 200.000000
50.000000 100.000000 150.000000 200.000000 250.000000

M_raw after:
10.0000000000000 20.0000000000000 30.0000000000000
40.0000000000000 50.0000000000000 20.0000000000000
40.0000000000000 60.0000000000000 80.0000000000000
100.000000000000 30.0000000000000 60.0000000000000
90.0000000000000 120.000000000000 150.000000000000
40.0000000000000 80.0000000000000 120.000000000000
160.000000000000 200.000000000000 50.0000000000000
100.000000000000 150.000000000000 200.000000000000
250.000000000000

Hope that helps.

jerzyg · ‎08-17-2010

Hi Kevin,

thank you for pointing out the origin of the problem. Indeed, lack of array initialization was a flaw of this code.

Thanks for your help!

best regards,
Jerzy