
Automatic vectorization of a simple loop nest


Hi, everyone.

I have a problem with automatic vectorization in ifort (Version Build 20170411, which comes with Parallel Studio XE release 2017.4). I have this simple module with two subroutines:

  • test1 takes an intent(inout) 2D array, assumes it is 64-byte aligned, and doubles the entries
  • test2 takes an intent(in) 2D array, assumes it is 64-byte aligned, allocates a second, 64-byte aligned array, and copies the entries from the argument into the newly allocated array

The source (included below) uses the assume_aligned and attributes align directives. As far as I know, this should result in aligned memory access in both cases. This is the report I get for test1:

      remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,11) ]
      remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,22) ]
      remark #15305: vectorization support: vector length 8
      remark #15399: vectorization support: unroll factor set to 2
      remark #15309: vectorization support: normalized vectorization overhead 1.000
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 1 
      remark #15449: unmasked aligned unit stride stores: 1 
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 6 
      remark #15477: vector cost: 0.620 
      remark #15478: estimated potential speedup: 6.870 
      remark #15488: --- end vector cost summary ---

And this is the report for test2:

      remark #15388: vectorization support: reference F(i,j) has aligned access   [ loops.f90(39,11) ]
      remark #15389: vectorization support: reference fsm(i,j) has unaligned access   [ loops.f90(39,11) ]
      remark #15381: vectorization support: unaligned access used inside loop body
      remark #15305: vectorization support: vector length 8
      remark #15309: vectorization support: normalized vectorization overhead 1.444
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15449: unmasked aligned unit stride stores: 1 
      remark #15450: unmasked unaligned unit stride loads: 1 
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 4 
      remark #15477: vector cost: 1.120 
      remark #15478: estimated potential speedup: 3.110 
      remark #15488: --- end vector cost summary ---

In the second loop, the argument array fsm is reported as having unaligned access, which I think is wrong. Moreover, the report is the same whether or not I include the assume_aligned and attributes align directives; they don't seem to have any impact. What am I missing here?

FYI, I compile with

ifort -O3 -mavx -fopenmp -qopt-report-phase=vec,loop -qopt-report=5 -qopt-streaming-stores never -mcmodel=medium -c loops.f90

And the code follows:

module loops

  public :: test1, test2

contains

  subroutine test1(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(inout) :: fsm
    integer, intent(in) :: im, jm
    integer i, j

    !dir$ assume_aligned fsm:64
    do j = 1,jm
       do i = 1,im
          fsm(i,j) = fsm(i,j)*2
       end do
    end do
  end subroutine test1

  subroutine test2(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(in) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    real, dimension(:,:), allocatable :: f

    !dir$ assume_aligned fsm:64
    !dir$ attributes align: 64:: f
    allocate(f(im, jm))
    do j = 1,jm
       do i = 1,im
          f(i,j) = fsm(i,j)
       end do
    end do
  end subroutine test2
end module loops
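
For comparison, here is a sketch of a variant of test2 that states the alignment assumptions more explicitly. It is not a confirmed fix, and I have not checked it against the compiler build above. The names loops_sketch, copy_aligned and driver are hypothetical. It assumes the caller passes 64-byte aligned arrays, that reals are the default 4 bytes, that im is a multiple of 16 (so every column starts on a 64-byte boundary), and that the compiler supports the ASSUME directive. Compilers that do not recognize the !dir$ lines treat them as comments.

```fortran
! Sketch only: loops_sketch, copy_aligned and driver are hypothetical names.
module loops_sketch
  implicit none
  private
  public :: copy_aligned

contains

  ! Variant of test2 with explicit-shape dummy arguments, so the arrays
  ! are contiguous and the leading dimension is known to be im.
  subroutine copy_aligned(fsm, f, im, jm)
    integer, intent(in) :: im, jm
    real, intent(in)    :: fsm(im, jm)
    real, intent(out)   :: f(im, jm)
    integer :: i, j

    ! Assumption: both arrays start on a 64-byte boundary, and im is a
    ! multiple of 16 (16 default reals = 64 bytes), so each column of
    ! fsm and f also starts on a 64-byte boundary.
    !dir$ assume_aligned fsm:64, f:64
    !dir$ assume (mod(im, 16) == 0)
    do j = 1, jm
       do i = 1, im
          f(i, j) = fsm(i, j)
       end do
    end do
  end subroutine copy_aligned

end module loops_sketch

program driver
  use loops_sketch, only: copy_aligned
  implicit none
  integer, parameter :: im = 64, jm = 8   ! im chosen as a multiple of 16
  real, allocatable :: a(:,:), b(:,:)
  ! Request 64-byte alignment for the allocations.
  !dir$ attributes align: 64 :: a, b

  allocate(a(im, jm), b(im, jm))
  a = 1.0
  call copy_aligned(a, b, im, jm)

  ! Stop with an error if the copy did not reproduce the input.
  if (any(b /= a)) error stop "copy mismatch"
  print *, "copy ok"
end program driver
```

The point of the explicit-shape arguments and the assume on im is that assume_aligned by itself only describes the start of the array; with an arbitrary leading dimension, the later columns need not be 64-byte aligned. Whether this changes the vectorization report for the original case is something I would still need to check.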

Could anyone shed some light on this?

Thanks a lot!

