Vectorization of SUM and array products

cngilbreth · 07-27-2008

Hi,

I'm trying to get an array expression inside a loop to vectorize, but am having problems. The loop is:

do j = 1, nstate
   do i = 1, nstate
      mu = abs(mmap(j) - mmap(i))

      lb = kpa_min(i,j)
      ub = kpa_max(i,j)
      ! Would like the following two statements to vectorize:
      sum1 = sum(sig(mu,it)%a(lb:ub) * upsilonQ(i,j)%a)
      sum2 = sum(tau(mu,it)%a(lb:ub) * upsilonP(i,j)%a)

      ham(i, j) = sum1 + sum2
   end do
end do

(I hope this makes clear what I'm trying to do.)
Here %a refers to an allocatable array contained in a derived type whose only purpose is to hold such an array. sig, tau, upsilonQ and upsilonP are arrays of such structures. In each multiplication, the %a on the right-hand side is complex, while the one on the left-hand side is real.

The messages from -vec-report3 are:

evolve.f90(236): (col. 18) remark: loop was not vectorized: existence of vector dependence.
evolve.f90(236): (col. 18) remark: vector dependence: proven ANTI dependence between (unknown) line 236, and (unknown) line 236.
evolve.f90(236): (col. 18) remark: vector dependence: proven FLOW dependence between (unknown) line 236, and (unknown) line 236.

I get this for both lines, as well as a bunch of duplicates of the ANTI and FLOW messages.

Any tips on how to get these statements to vectorize? I initially tried dot_product but got the same result.

Thanks very much,
Chris

TimP · 07-28-2008

It would be clearer if you gave an example that could be compiled, along with your compile options. In any case, at least -xO -fp-model fast would be required to vectorize a complex sum reduction. From your example, I don't understand why you don't use either
dot_product(sig(mu,it)%a(lb:ub), upsilonQ(i,j)%a)
or
sum(sig(mu,it)%a(lb:ub)) * upsilonQ(i,j)%a
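One detail matters when replacing SUM of an elementwise product with DOT_PRODUCT in this mixed real/complex case: the Fortran standard defines dot_product(x, y) as sum(x*y) when x is real, but as sum(conjg(x)*y) when x is complex. The minimal sketch below therefore assumes the real array is passed first; the module and function names are hypothetical.

```fortran
! Hypothetical helper; assumes the real array is the FIRST argument.
! For a real first argument dot_product(x, y) == sum(x*y) (no conjugation).
! With a complex first argument the result would be sum(conjg(x)*y) instead.
module dot_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function real_complex_dot(x, y) result(s)
    real(dp), intent(in) :: x(:)      ! e.g. a slice like sig(mu,it)%a(lb:ub)
    complex(dp), intent(in) :: y(:)   ! e.g. upsilonQ(i,j)%a, same length as x
    complex(dp) :: s
    s = dot_product(x, y)
  end function real_complex_dot
end module dot_sketch
```

Swapping the arguments, as in dot_product(y, x), conjugates the complex vector and silently changes the imaginary part of the result.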

cngilbreth · 07-28-2008

OK, hopefully this is clearer:

module array
   type darray
      real*8, allocatable :: a(:)
   end type darray

   type carray
      complex*16, allocatable :: a(:)
   end type carray
end module array

program test
   use array
   implicit none

   type(darray), allocatable :: darrays(:,:)
   type(carray), allocatable :: carrays(:,:)
   integer :: i,j
   complex*16 :: sum1

   allocate(darrays(4,4))
   allocate(carrays(4,4))

   do j=1,4
      do i=1,4
         allocate(darrays(i,j)%a(10))
         allocate(carrays(i,j)%a(10))

         ! loop not vectorized
         ! proven FLOW dependence...
         ! proven ANTI dependence...
         darrays(i,j)%a = 1.0d0
         carrays(i,j)%a = (1.0d0,0.0d0)
      end do
   end do

   ! No message at all!
   sum1 = sum(carrays(1,2)%a * darrays(2,2)%a)
end program test

Incidentally, even an example like:

type(darray) :: da
type(carray) :: ca
! ...
sum1 = sum(da%a * ca%a)

doesn't vectorize.

As for my compile options, I'm using -O3 -xT -fp-model fast (the latter at your suggestion).

Thanks,
Chris

TimP · 07-28-2008

OK, here are my comments:

The inner loops of length 10 may be too short to benefit from vectorization, since the compiler may not be able to eliminate all the overhead of handling the general case. The vector dependence report doesn't make sense to me either.

The final loop is optimized away, since its result is clearly never used. I suppose you could argue that you would like an option to issue a diagnostic when this happens, but people who write benchmarks like this to see whether the compiler can skip redundant operations generally don't ask for such messages.

With fairly trivial changes to your example, that line can be made to compile (rather than being optimized away), and it then gives the message "not vectorized: unsupported data type", which seems bogus, as vectorized code is in fact generated, in spite of the non-unit stride. The effect of your dot_product would be expressed more clearly if you selected the real part of the complex array explicitly.
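As one possible reading of "select the real part explicitly", the sketch below splits the mixed real/complex reduction into two purely real reductions, taking the real and imaginary parts of the complex array with REAL and AIMAG. It is a sketch under stated assumptions (equal-length arrays, double precision), with hypothetical names, and it is not guaranteed to vectorize better with any particular compiler.

```fortran
! Hypothetical sketch: assumes d and c have the same length.
! The complex reduction sum(d*c) is rewritten as two real reductions,
! one over the real parts of c and one over the imaginary parts.
module split_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function split_real_complex_sum(d, c) result(s)
    real(dp), intent(in) :: d(:)
    complex(dp), intent(in) :: c(:)
    complex(dp) :: s
    real(dp) :: re_sum, im_sum

    re_sum = sum(d * real(c, kind=dp))   ! real part, selected explicitly
    im_sum = sum(d * aimag(c))           ! imaginary part
    s = cmplx(re_sum, im_sum, kind=dp)
  end function split_real_complex_sum
end module split_sketch
```

In a small timing test, using the result afterwards (for example with print *, s) keeps the compiler from discarding the reduction as dead code, as described above for the final sum in the test program.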