Marcin_K_

Beginner

08-17-2017
06:01 AM

Automatic vectorization of a simple loop nest

Hi, everyone.

I have a problem using automatic vectorization in ifort (Version 17.0.4.196 Build 20170411, which comes with Parallel Studio XE release 2017.4). I have this simple module with two subroutines:

- test1 takes an intent(inout) 2D array, assumes it is 64-byte aligned, and doubles the entries
- test2 takes an intent(in) 2D array, assumes it is 64-byte aligned, allocates a second, 64-byte aligned array, and copies the entries from the argument into the newly allocated array

The source (attached) uses assume_aligned and align directives. AFAIK, this should in both cases result in aligned memory access. This is the report I get for test1:
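Whether the assumption behind assume_aligned actually holds for the array that is passed in can be checked at run time. The following is only a minimal sketch and is not part of the attached source: the module name alignment_check and the helpers byte_offset and is_aligned are hypothetical, and it assumes that default real has the same kind as c_float and that the array is non-empty.

```fortran
module alignment_check
  use, intrinsic :: iso_c_binding, only: c_float, c_intptr_t, c_loc
  implicit none
  private
  public :: byte_offset, is_aligned

contains

  ! Address of the first element of a, modulo `boundary` bytes.
  ! A result of 0 means the array starts on that boundary.
  ! Assumption: a has at least one element, so a(1,1) exists.
  function byte_offset(a, boundary) result(offset)
    real(c_float), intent(in), target :: a(:,:)
    integer, intent(in) :: boundary
    integer(c_intptr_t) :: offset
    integer(c_intptr_t) :: address

    ! Reinterpret the C address of the first element as an integer.
    address = transfer(c_loc(a(1,1)), address)
    offset = modulo(address, int(boundary, c_intptr_t))
  end function byte_offset

  ! True if the first element of a lies on a `boundary`-byte boundary.
  function is_aligned(a, boundary) result(aligned)
    real(c_float), intent(in), target :: a(:,:)
    integer, intent(in) :: boundary
    logical :: aligned

    aligned = byte_offset(a, boundary) == 0
  end function is_aligned

end module alignment_check
```

Calling is_aligned(fsm, 64) in the caller, just before the call to test1 or test2, would show whether the actual argument really starts on a 64-byte boundary. It only inspects the first element and says nothing about what the compiler is able to prove at compile time.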

remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,11) ]
remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,22) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.000
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 6
remark #15477: vector cost: 0.620
remark #15478: estimated potential speedup: 6.870
remark #15488: --- end vector cost summary ---

And this is the report for test2:

remark #15388: vectorization support: reference F(i,j) has aligned access   [ loops.f90(39,11) ]
remark #15389: vectorization support: reference fsm(i,j) has unaligned access   [ loops.f90(39,11) ]
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.444
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 4
remark #15477: vector cost: 1.120
remark #15478: estimated potential speedup: 3.110
remark #15488: --- end vector cost summary ---

In the second loop the argument matrix reports unaligned access, which I think is wrong. Moreover, the report is the same whether or not I include the assume_aligned and align directives - they don't seem to have any impact. What am I missing here?
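A side note on the arithmetic behind "aligned access" in a loop nest like this one: the inner loop runs over i for a fixed j, so each inner loop starts at the first element of column j. For a contiguous, column-major array whose first element is aligned, column j starts (j-1)*im*elem_bytes bytes after it. The sketch below only works through that arithmetic. The module name column_alignment and both function names are hypothetical, it assumes the sizes are small enough not to overflow a default integer, and it is not meant as an explanation of the difference between the two reports.

```fortran
module column_alignment
  implicit none
  private
  public :: column_offset, all_columns_aligned

contains

  ! Byte offset, modulo `boundary`, of the first element of column j,
  ! measured from the first element of a contiguous array with im rows.
  pure function column_offset(im, j, elem_bytes, boundary) result(offset)
    integer, intent(in) :: im, j, elem_bytes, boundary
    integer :: offset

    offset = modulo((j - 1) * im * elem_bytes, boundary)
  end function column_offset

  ! True if every one of the jm column starts falls on the boundary,
  ! given that the first element of the array does.
  pure function all_columns_aligned(im, jm, elem_bytes, boundary) result(ok)
    integer, intent(in) :: im, jm, elem_bytes, boundary
    logical :: ok
    integer :: j

    ok = .true.
    do j = 1, jm
      if (column_offset(im, j, elem_bytes, boundary) /= 0) then
        ok = .false.
        return
      end if
    end do
  end function all_columns_aligned

end module column_alignment
```

With 4-byte reals and a 64-byte boundary, im = 16 keeps every column start on the boundary, while im = 10 puts the start of column 2 at a 40-byte offset.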

FYI, I compile with

ifort -O3 -mavx -fopenmp -qopt-report-phase=vec,loop -qopt-report=5 -qopt-streaming-stores never -mcmodel=medium -c loops.f90

And the code follows:

module loops

  public :: test1, test2

contains

  subroutine test1(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(inout) :: fsm
    integer, intent(in) :: im, jm
    integer i, j

    !dir$ assume_aligned fsm:64
    do j = 1,jm
      do i = 1,im
        fsm(i,j) = fsm(i,j)*2
      end do
    end do
  end subroutine test1

  subroutine test2(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(in) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    real, dimension(:,:), allocatable :: f

    !dir$ assume_aligned fsm:64
    !dir$ attributes align: 64:: f
    allocate(f(im, jm))
    do j = 1,jm
      do i = 1,im
        f(i,j) = fsm(i,j)
      end do
    end do
  end subroutine test2

end module loops

Could anyone shed some light on this?

Thanks a lot!

Marcin