Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Loop vectorization / Complicated array access

NsK
Novice

Hi,
I have some issues with loop vectorization:
[fortran]
module global
    implicit none
  
    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A
 
end module global

program test
    use global
    implicit none
    type(type_A), target, allocatable, dimension(:) :: A
    type(type_A), target, allocatable               :: AA
    real(kind=4), pointer, dimension(:) :: ptr
    integer :: i
  
    !---
    allocate(AA)
    allocate(AA%val(10000))
    AA%val = 1.0
    
    ptr => AA%val
 
    do i = 1, 100
        !
        ptr(i) = exp(-ptr(i) + 1.0)
        !
    end do
    !---
    
    !---
    allocate(A(1))
    allocate(A(1)%val(10000))
    A(1)%val = 1.0
    
    ptr => A(1)%val
  
    do i = 1, 100
        !
        ptr(i) = exp(-ptr(i) + 1.0)
        !
    end do
    !---    
    
    write(*,*) ptr(500)

end program test
[/fortran]

Compiled with /Qvec-report3, it produces:

1>main.f90(20): (col. 5) remark: loop was not vectorized: unsupported loop structure.
1>main.f90(24): (col. 5) remark: LOOP WAS VECTORIZED.
1>main.f90(32): (col. 5) remark: loop was not vectorized: unsupported loop structure.
1>main.f90(34): (col. 5) remark: loop was not vectorized: unsupported loop structure.
1>main.f90(38): (col. 5) remark: loop was not vectorized: existence of vector dependence.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.

The only way to get the second loop vectorized seems to be adding the !dir$ ivdep directive before it.
From an old post by Steve (Mon, 02/06/2006 - 18:33):

Steve Lionel (Intel) wrote:

[...]
The compiler does not try to vectorize loops where the array access is complicated.[...]
It is a fact that arrays that are components of derived types, especially in conjunction with pointer or allocatable, complicate life for the compiler and as such some optimization opportunities may be missed.
[...]


My understanding is that my issue is related to this complicated array access. Is there a way to make the access pattern clear to the compiler without adding the vectorization directive to every loop in the code?
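For reference, the directive workaround mentioned above looks roughly like this. It is a minimal sketch reusing the type from the listing; the program name `test_ivdep` and the initial value 0.5 are only illustrative. `!dir$ ivdep` is an Intel-specific directive that asks the compiler to ignore assumed (unproven) dependences in the loop that follows, so it is only safe when the loop really has no loop-carried dependence. Other compilers treat the line as a plain comment.

```fortran
module global
    implicit none

    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A

end module global

program test_ivdep
    use global
    implicit none
    type(type_A), target, allocatable, dimension(:) :: A
    real(kind=4), pointer, dimension(:) :: ptr
    integer :: i

    allocate(A(1))
    allocate(A(1)%val(10000))
    A(1)%val = 0.5   ! illustrative value, not from the original post

    ptr => A(1)%val

    ! Intel-specific hint: ignore assumed dependences in the next loop.
    ! Each iteration reads and writes only ptr(i), so this is safe here.
    !dir$ ivdep
    do i = 1, 100
        ptr(i) = exp(-ptr(i) + 1.0)
    end do

    write(*,*) ptr(100), ptr(500)

end program test_ivdep
```

The drawback is the one raised in the question: the directive has to be repeated in front of every loop that goes through the pointer.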
Cheers,

Nick

Steven_L_Intel1
Employee

The compiler has changed a lot since 2006, and processors have changed to include new instructions that can help with vectorization. For example, I tried your code with the 14.0 compiler and got this:

C:\Projects\U480019.f90(20): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(24): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(34): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(38): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(32): (col. 5) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate

Looks pretty good to me.

NsK
Novice

Indeed, I had not realized how many versions have been released since XE 2011 (12.1.3526.2010, February 2012).
My bad luck strikes again.

Nick

John_Campbell
New Contributor II

A lot has changed in Fortran!
I don't understand the need for such a complex data structure. Any of the 4 effective loop structures in the following code vectorises, without resorting to the more complex data structures of the original post. I really don't know what can be achieved by your coding approach.
My suggestion is KISS: keep it simple.

[fortran]
module global
    implicit none
    real(kind=4), allocatable, dimension(:) :: A_val, AA_val
end module global

program test
    use global
    implicit none
    integer :: i
!---
    allocate (AA_val(10000))
    AA_val = 0.5
    AA_val(1:100) = exp(-AA_val(1:100) + 1.0)
    write(*,*) AA_val(100), AA_val(500)
!---
    allocate (A_val(10000))
    A_val = 1.0
    do i = 1, 100
       A_val(i) = exp(-A_val(i) + 1.0)
    end do
!---
    write(*,*) A_val(500)
end program test
[/fortran]

NsK
Novice

Well,
Indeed, I had not realized how many versions of the compiler had been released since my XE 2011 12.1.3526.2010 (February 2012), but this has nothing to do with changes in Fortran itself.
The more complex data structure of the original post is there precisely to simplify the coding: the pointer approach makes the derived data type (and the number of objects, which is a runtime parameter) transparent to the developers and the algorithm.
Unfortunately, not all codes face the same requirements.
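To make the pattern concrete, here is a rough sketch of what is meant, under stated assumptions: `nobj` is a hypothetical stand-in for the runtime number of objects (hard-coded here so the example is self-contained), and the initial value 0.5 is arbitrary. The inner loop only ever sees `ptr`, so the algorithm does not need to know about the derived type or how many objects exist.

```fortran
module global
    implicit none

    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A

end module global

program test_objects
    use global
    implicit none
    type(type_A), target, allocatable, dimension(:) :: A
    real(kind=4), pointer, dimension(:) :: ptr
    integer :: i, k, nobj

    nobj = 3   ! hypothetical; in the real code this is only known at runtime

    allocate(A(nobj))
    do k = 1, nobj
        allocate(A(k)%val(10000))
        A(k)%val = 0.5   ! arbitrary illustrative value
    end do

    do k = 1, nobj
        ptr => A(k)%val   ! the algorithm below works on ptr only
        do i = 1, 100
            ptr(i) = exp(-ptr(i) + 1.0)
        end do
    end do

    write(*,*) A(nobj)%val(100), A(nobj)%val(500)

end program test_objects
```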

jimdempseyatthecove
Honored Contributor III

John,

NsK produced a small sample code that exhibited his issue. In his case he had an array of arrays. This type of structure can be used for sparse arrays, among other things. Use of a pointer can sometimes cause optimization issues, because the compiler must allow for possible aliasing and non-unit stride. If NsK's compiler is new enough to have ASSOCIATE, he might try that instead of the pointer.
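A minimal sketch of the ASSOCIATE idea, assuming a compiler that supports the Fortran 2003 ASSOCIATE construct. The program name `test_associate`, the associate name `v`, and the initial value 0.5 are illustrative only. Whether this changes the vectorization report on a particular compiler version has not been checked here.

```fortran
module global
    implicit none

    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A

end module global

program test_associate
    use global
    implicit none
    type(type_A), allocatable, dimension(:) :: A
    integer :: i

    allocate(A(1))
    allocate(A(1)%val(10000))
    A(1)%val = 0.5   ! illustrative value

    ! v is a local name for A(1)%val inside the construct.
    ! No pointer and no target attribute are needed.
    associate (v => A(1)%val)
        do i = 1, 100
            v(i) = exp(-v(i) + 1.0)
        end do
    end associate

    write(*,*) A(1)%val(100), A(1)%val(500)

end program test_associate
```

Because `v` is bound to a whole allocatable array rather than reached through a pointer, the compiler has less reason to assume aliasing or a non-unit stride.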

Jim Dempsey
