Intel® Fortran Compiler

Automatic vectorization of a simple loop nest

Marcin_K_

Hi, everyone.

I have a problem with automatic vectorization in ifort (version 17.0.4.196, build 20170411, shipped with Parallel Studio XE 2017.4). I have a simple module with two subroutines:

  • test1 takes an intent(inout) 2D array, assumes it is 64-byte aligned, and doubles its entries
  • test2 takes an intent(in) 2D array, assumes it is 64-byte aligned, allocates a second 64-byte aligned array, and copies the entries from the argument into the newly allocated array

The source (attached) uses the assume_aligned and attributes align directives. AFAIK, this should result in aligned memory access in both cases. This is the report I get for test1:

      remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,11) ]
      remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,22) ]
      remark #15305: vectorization support: vector length 8
      remark #15399: vectorization support: unroll factor set to 2
      remark #15309: vectorization support: normalized vectorization overhead 1.000
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15448: unmasked aligned unit stride loads: 1 
      remark #15449: unmasked aligned unit stride stores: 1 
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 6 
      remark #15477: vector cost: 0.620 
      remark #15478: estimated potential speedup: 6.870 
      remark #15488: --- end vector cost summary ---

And this is the report for test2:

      remark #15388: vectorization support: reference F(i,j) has aligned access   [ loops.f90(39,11) ]
      remark #15389: vectorization support: reference fsm(i,j) has unaligned access   [ loops.f90(39,11) ]
      remark #15381: vectorization support: unaligned access used inside loop body
      remark #15305: vectorization support: vector length 8
      remark #15309: vectorization support: normalized vectorization overhead 1.444
      remark #15300: LOOP WAS VECTORIZED
      remark #15442: entire loop may be executed in remainder
      remark #15449: unmasked aligned unit stride stores: 1 
      remark #15450: unmasked unaligned unit stride loads: 1 
      remark #15475: --- begin vector cost summary ---
      remark #15476: scalar cost: 4 
      remark #15477: vector cost: 1.120 
      remark #15478: estimated potential speedup: 3.110 
      remark #15488: --- end vector cost summary ---

In the second loop, the access to the argument array fsm is reported as unaligned, which I think is wrong. Moreover, the report is the same whether or not I include the assume_aligned and align directives; they don't seem to have any effect. What am I missing here?

FYI, I compile with

ifort -O3 -mavx -fopenmp -qopt-report-phase=vec,loop -qopt-report=5 -qopt-streaming-stores never -mcmodel=medium -c loops.f90

And the code follows:

module loops

  public :: test1, test2

contains

  subroutine test1(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(inout) :: fsm
    integer, intent(in) :: im, jm
    integer i, j

    !dir$ assume_aligned fsm:64
    do j = 1,jm
       do i = 1,im
          fsm(i,j) = fsm(i,j)*2
       end do
    end do
  end subroutine test1

  subroutine test2(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(in) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    real, dimension(:,:), allocatable :: f

    !dir$ assume_aligned fsm:64
    !dir$ attributes align: 64:: f
    allocate(f(im, jm))
    
    do j = 1,jm
       do i = 1,im
          f(i,j) = fsm(i,j)
       end do
    end do
  end subroutine test2
  
end module loops
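
Independently of what the vectorization report says, it may help to check at run time where the data actually sits in memory. The following is only a minimal sketch, not part of loops.f90: the program name check_alignment and the sizes im = 100, jm = 4 are made up for illustration, and it assumes default 4-byte reals. It prints the address of the first element of each column modulo 64.

```fortran
! Hypothetical run-time check; not part of the original loops.f90.
program check_alignment
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  integer, parameter :: im = 100, jm = 4   ! assumed sizes, for illustration only
  real, allocatable, target :: fsm(:,:)
  ! Intel-specific request for a 64-byte aligned allocation;
  ! other compilers treat this directive as an ordinary comment.
  !dir$ attributes align: 64 :: fsm
  integer(c_intptr_t) :: addr
  integer :: j

  allocate(fsm(im, jm))
  fsm = 1.0

  do j = 1, jm
     ! Reinterpret the C address of the first element of column j as an integer.
     addr = transfer(c_loc(fsm(1, j)), addr)
     print '(a,i0,a,i0)', 'column ', j, ': address mod 64 = ', &
          mod(addr, 64_c_intptr_t)
  end do
end program check_alignment
```

Note that an aligned base address does not by itself make every column aligned: with im = 100 and 4-byte reals, consecutive columns are 400 bytes apart, and 400 is not a multiple of 64, so only some columns can start on a 64-byte boundary. Whether this is related to the report for test2 is not something this sketch can establish.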

Could anyone shed some light on this?

Thanks a lot!

Marcin
