Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs have moved to the Altera Community. Existing Intel Community members can sign in with their current credentials.
29306 Discussions

problem with forall construct when optimization -O2 is used

jerzyg
Beginner
779 Views
hello,

I observed a strange behavior of ifort 11.1 when compiling some codes with forall loops and array stripes. Please consider the following example:


module mx
contains

pure function ocross_product_dp(a,b)
implicit none
double precision,dimension(:),intent(in) :: a,b
double precision,dimension(size(a),size(b)) :: ocross_product_dp
integer :: i
forall (i = 1:size(a))
ocross_product_dp(i,:i) = a(i) * b(:i)
endforall
end function

pure function ocross_product_xp(a,b)
implicit none
double precision,dimension(5),intent(in) :: a,b
double precision,dimension(5,5) :: ocross_product_xp
integer :: i
forall (i = 1:5)
ocross_product_xp(i,:i) = a(i) * b(:i)
endforall
end function

end module

program fngTest
use mx
implicit none
integer :: i
double precision,dimension(5,5) :: M = 0.D0
double precision,dimension(5) :: U = 0.D0
double precision,dimension(5) :: V = 0.D0

do i=1,5
U(i) = dble(i)
V(i) = 10.D0*i
enddo

write(*,45) 'u', U
write(*,45) 'v', V
print *, 'M_raw before:'
write(*,*) M
M = ocross_product_dp(U,V)
write(*,*) 'M='
write(*,44) (M(i,1:5) , i=1,5)

print *, 'M_raw after:'
write(*,*) M

44 format(5(F12.6,X))
45 format(A,X,':',X,5(F12.6,X))
end program


The functions ocross_product_dp and ocross_product_xp in module mx are supposed to calculate lower-triangular matrix. The forall construct is used that iterates over stripes of output matrix and stripes of input vectors.

If the code listed above is compiled with either -O2 or -O3 optimization switch, the output from the program is as follows (please look at the first and second row after "M="):


u : 1.000000 2.000000 3.000000 4.000000 5.000000
v : 10.000000 20.000000 30.000000 40.000000 50.000000
M_raw before:
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000
M=
10.000000 0.000000 0.000000 0.000000 1.000000
20.000000 40.000000 0.000000 1.000000 0.000000
30.000000 60.000000 90.000000 0.000000 0.000000
40.000000 80.000000 120.000000 160.000000 0.000000
50.000000 100.000000 150.000000 200.000000 250.000000
M_raw after:
10.0000000000000 20.0000000000000 30.0000000000000
40.0000000000000 50.0000000000000 6.953252086138744E-310
40.0000000000000 60.0000000000000 80.0000000000000
100.000000000000 0.000000000000000E+000 0.000000000000000E+000
90.0000000000000 120.000000000000 150.000000000000
0.000000000000000E+000 1.00000000000000 6.953252086109890E-310
160.000000000000 200.000000000000 1.00000000000000
6.953252086138744E-310 0.000000000000000E+000 0.000000000000000E+000
250.000000000000


The elements (1,5) and (2,4) of matrix M are clearly incorrect, they should be both zero.

If I replace call to function ocross_product_dp with a call to function ocross_product_xp or if I turn off optimization by specifying -O0 compiler switch, the output from the program is exactly as expected:


u : 1.000000 2.000000 3.000000 4.000000 5.000000
v : 10.000000 20.000000 30.000000 40.000000 50.000000
M_raw before:
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
0.000000000000000E+000
M=
10.000000 0.000000 0.000000 0.000000 0.000000
20.000000 40.000000 0.000000 0.000000 0.000000
30.000000 60.000000 90.000000 0.000000 0.000000
40.000000 80.000000 120.000000 160.000000 0.000000
50.000000 100.000000 150.000000 200.000000 250.000000
M_raw after:
10.0000000000000 20.0000000000000 30.0000000000000
40.0000000000000 50.0000000000000 6.953271318849528E-310
40.0000000000000 60.0000000000000 80.0000000000000
100.000000000000 6.902014741602593E-310 6.902014741603581E-310
90.0000000000000 120.000000000000 150.000000000000
2.077534183186942E-317 6.953271318813165E-310 6.953271318806446E-310
160.000000000000 200.000000000000 6.953271318815537E-310
6.953271318849528E-310 0.000000000000000E+000 1.570995158424529E-316
250.000000000000


Could you point out where I made a mistake in this code? Perhaps it is a result of compiler bug?

I use ifort 11.1 20100414, the program above was tested on two Intel emt64 systems.
I also compiled the code with gfortran, the results I obtained are correct regardless optimization is turned on or off.

Thanks in advance for help.

Best regards,
Jerzy

0 Kudos
5 Replies
Kevin_D_Intel
Employee
779 Views
Thank you for the nice reproducer. This appears to be a defect (as you note at opt-levels -O2 and higher) with using SIZE() in the declaration for ocross_product_dp. The use in the forall triplet-spec is working. I'll forward this to Development and update the post when I learn more.
0 Kudos
jerzyg
Beginner
779 Views
Thanks for your response.

I may add that according to tests I performed, size and shape of parameters that are received by ocross_product_dp appears to be correct.

Moreover, if I replace initialization of U and V arrays as shown in my previous post:

double precision,dimension(5) :: U = 0.D0
double precision,dimension(5) :: V = 0.D0
do i=1,5
U(i) = dble(i)
V(i) = 10.D0*i
enddo


by initialization on declatation:


double precision,dimension(5) :: U = (/ ( dble(i), i=1,5 ) /), V = (/ (10.D0*i, i=1,5) /)


then the problem with incorrect result from ocross_product_dp misteriously disappears and the content of the output matrix becomes correct.

best regards,
Jerzy
0 Kudos
Kevin_D_Intel
Employee
779 Views

Hi Jerzy,

Sorry for the delay getting back to you. After investigating further, I believe there is a programming error here but I'm awaiting Development's opinion too.

First, I overlooked this fact initially but notice in your posted results that "M_raw after" shows pollution of array M with garbage values for both your ocross_product_xp and ocross_product_dp array-valued functions.

The programming error is that the array-valued functions return the entire array and not just the array section calculated, therefore, the garbage values collect in array M as a result of not initializing the array inside the array-valued function.

I believe the solution is to add initialization statements inside each function, something like this:

ocross_product_dp = 0.D0

0 Kudos
Kevin_D_Intel
Employee
779 Views
Hi Jerzy,

Development confirmed the array-valued function returns the entire array. They alsoindicated for across-product you probably wanted to do this:

ocross_product_dp(i,:size(b)) = a(i) * b(:size(b))

instead of this:

ocross_product_dp(i,:i) = a(i) * b(:i)

That modification fills the entire array and M after becomes:

M=
10.000000 20.000000 30.000000 40.000000 50.000000
20.000000 40.000000 60.000000 80.000000 100.000000
30.000000 60.000000 90.000000 120.000000 150.000000
40.000000 80.000000 120.000000 160.000000 200.000000
50.000000 100.000000 150.000000 200.000000 250.000000


M_raw after:
10.0000000000000 20.0000000000000 30.0000000000000
40.0000000000000 50.0000000000000 20.0000000000000
40.0000000000000 60.0000000000000 80.0000000000000
100.000000000000 30.0000000000000 60.0000000000000
90.0000000000000 120.000000000000 150.000000000000
40.0000000000000 80.0000000000000 120.000000000000
160.000000000000 200.000000000000 50.0000000000000
100.000000000000 150.000000000000 200.000000000000
250.000000000000

Hope that helps.

0 Kudos
jerzyg
Beginner
779 Views
Hi Kevin,

thank you for pointing out the origin of the problem. Indeed, lack of array initialization was a flaw of this code.

Thanks for your help!

best regards,
Jerzy
0 Kudos
Reply