Intel® Fortran Compiler

Index offset leads to failure of vectorizing simple loops

Wentao_Z_
Beginner

Hi,

I have a quick question (maybe a compiler bug?) regarding vectorization. I am experimenting with the following code fragment:

Declaration of variables:

1415     real(kind=8), allocatable :: F(:), dF(:)
1416     real(kind=8) :: value

Loop that I am trying to vectorize:

1486         do k = ks, ke
1487           do ii = iis, iie
1488             value = vals(ii)
1489             do j = js, je
1490               ind_offset = ( (k-1)*N2 + (j-1) ) * N1g
1491               ioffset = ii + ind_offset
1492               do i = is, ie
1493                 dF(i + ind_offset) = dF(i + ind_offset) + value * F(i + ioffset)
1494               end do
1495             end do
1496           end do
1497         end do

Vectorization report for the inner loop (Line 1492 to Line 1494):

src/ModDeriv.f90(1492): (col. 15) remark: loop was not vectorized: existence of vector dependence.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 1493 and  line 1493.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed FLOW dependence between  line 1493 and  line 1493.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed FLOW dependence between  line 1493 and  line 1493.
src/ModDeriv.f90(1493): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 1493 and  line 1493.

 

I compiled the code using ifort 13.1.0 with -O2 -vec-report6. According to the vectorization report, the compiler could not vectorize the inner loop (Line 1492 to Line 1494). It fails to recognize that ind_offset and ioffset are invariant within the inner loop, so the loop has the same form as

dF( i ) = dF( i ) + v * F( i )

which should be no different than 

A( i ) = B( i )  + c * D( i )

Both forms are vectorizable. Any suggestions?

Thanks for your time and help.

Best regards,
    Wentao

jimdempseyatthecove
Honored Contributor III

In front of your DO i loop insert:
!DEC$ SIMD

Jim Dempsey

Wentao_Z_
Beginner

jimdempseyatthecove wrote:

In front of your DO i loop insert:
!DEC$ SIMD

Jim Dempsey

Hi Jim,

Thanks for your reply. I was able to vectorize the loop by adding !DIR$ SIMD in front of the DO i loop. Actually, I am just curious why the compiler is so conservative in the face of such a simple loop :-)

Best regards,
   Wentao
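
For reference, here is a minimal sketch of where the directive goes in the loop nest. The array sizes, loop bounds, and the contents of vals are made-up illustration values (the original post does not give them), and the program name is hypothetical. The directive is Intel-specific; other compilers see it as an ordinary comment, so the program should still run and self-check there, just without the vectorization hint.

```fortran
program simd_offset_demo
  implicit none
  ! All sizes and bounds are assumed values chosen only so the example runs.
  integer, parameter :: N1g = 16, N2 = 4, N3 = 3
  integer, parameter :: is = 1, ie = 13, js = 1, je = N2, ks = 1, ke = N3
  integer, parameter :: iis = 1, iie = 3
  real(kind=8), allocatable :: F(:), dF(:), ref(:)
  real(kind=8) :: vals(iis:iie), value
  integer :: i, j, k, ii, ind_offset, ioffset

  allocate(F(N1g*N2*N3), dF(N1g*N2*N3), ref(N1g*N2*N3))
  do i = 1, size(F)
    F(i) = real(i, kind=8)
  end do
  vals = [1.0d0, 2.0d0, 3.0d0]
  dF  = 0.0d0
  ref = 0.0d0

  ! Reference result: the same nest without any directive.
  do k = ks, ke
    do ii = iis, iie
      value = vals(ii)
      do j = js, je
        ind_offset = ((k-1)*N2 + (j-1)) * N1g
        ioffset = ii + ind_offset
        do i = is, ie
          ref(i + ind_offset) = ref(i + ind_offset) + value * F(i + ioffset)
        end do
      end do
    end do
  end do

  ! Same nest with the directive placed immediately before the DO i loop.
  do k = ks, ke
    do ii = iis, iie
      value = vals(ii)
      do j = js, je
        ind_offset = ((k-1)*N2 + (j-1)) * N1g
        ioffset = ii + ind_offset
        !DIR$ SIMD
        do i = is, ie
          dF(i + ind_offset) = dF(i + ind_offset) + value * F(i + ioffset)
        end do
      end do
    end do
  end do

  ! All values are small integers, so the two results must match exactly.
  if (any(dF /= ref)) error stop 'results differ'
  print *, 'results match'
end program simd_offset_demo
```

Note that the directive tells the compiler to vectorize without proving independence, so it is only safe when the iterations of the DO i loop really do not depend on each other, as is the case here.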

jimdempseyatthecove
Honored Contributor III

Not sure, probably an oversight by the compiler optimization gurus.

TimP mentioned in a different thread that the optimizer will at times re-order and/or collapse the loop nesting when it thinks the performance would be better. This is a good test case where it is not.

Jim Dempsey

Wentao_Z_
Beginner

jimdempseyatthecove wrote:

Not sure, probably an oversight by the compiler optimization gurus.

TimP mentioned in a different thread that the optimizer will at times re-order and/or collapse the loop nesting when it thinks the performance would be better. This is a good test case where it is not.

Jim Dempsey

Hi Jim,

Thanks for your reply. I found that with only the single inner loop (no enclosing loop nest), the compiler vectorizes the code without !DIR$ SIMD:

 

1486         k = ks
1487           ii = iis
1488             value = vals(ii)
1489             j = js
1490               ind_offset = ( (k-1)*N2 + (j-1) ) * N1g
1491               ioffset = ii + ind_offset
1492               do i = is, ie
1493                 dF(i + ind_offset) = dF(i + ind_offset) + value * F(i + ioffset)
1494               end do

 

So it should be the loop nesting that led to this issue. Thanks!

Best regards,
    Wentao
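
Since the nesting appears to be what trips up the dependence analysis, one possible alternative to the directive is to write the innermost loop as an array-section assignment. This is a different technique from anything suggested in the thread, and it has not been checked against ifort 13.1, so treat it as a sketch to try rather than a confirmed fix. The sizes, bounds, and vals contents are again assumed illustration values, and the program name is hypothetical.

```fortran
program section_offset_demo
  implicit none
  ! All sizes and bounds are assumed values chosen only so the example runs.
  integer, parameter :: N1g = 16, N2 = 4, N3 = 3
  integer, parameter :: is = 1, ie = 13, js = 1, je = N2, ks = 1, ke = N3
  integer, parameter :: iis = 1, iie = 3
  real(kind=8), allocatable :: F(:), dF(:), ref(:)
  real(kind=8) :: vals(iis:iie), value
  integer :: i, j, k, ii, ind_offset, ioffset, lo, hi

  allocate(F(N1g*N2*N3), dF(N1g*N2*N3), ref(N1g*N2*N3))
  do i = 1, size(F)
    F(i) = real(i, kind=8)
  end do
  vals = [1.0d0, 2.0d0, 3.0d0]
  dF  = 0.0d0
  ref = 0.0d0

  do k = ks, ke
    do ii = iis, iie
      value = vals(ii)
      do j = js, je
        ind_offset = ((k-1)*N2 + (j-1)) * N1g
        ioffset = ii + ind_offset

        ! Reference: explicit inner loop, as in the original post.
        do i = is, ie
          ref(i + ind_offset) = ref(i + ind_offset) + value * F(i + ioffset)
        end do

        ! Alternative: the same update written as an array-section assignment.
        lo = is + ind_offset
        hi = ie + ind_offset
        dF(lo:hi) = dF(lo:hi) + value * F(is + ioffset : ie + ioffset)
      end do
    end do
  end do

  ! All values are small integers, so the two results must match exactly.
  if (any(dF /= ref)) error stop 'results differ'
  print *, 'results match'
end program section_offset_demo
```

Whether this form vectorizes is something the vectorization report (-vec-report6, as used above) would have to confirm for the compiler version in question.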
