Hi, everyone.
I have a problem with automatic vectorization in ifort (Version 17.0.4.196 Build 20170411, which comes with Parallel Studio XE release 2017.4). I have this simple module with two subroutines:
- test1 takes an intent(inout) 2D array, assumes it is 64-byte aligned, and doubles its entries
- test2 takes an intent(in) 2D array, assumes it is 64-byte aligned, allocates a second, 64-byte aligned array, and copies the entries of the argument into it
The source (included below) uses the assume_aligned and attributes align directives. As far as I know, this should result in aligned memory access in both cases. This is the report I get for test1:
```
remark #15388: vectorization support: reference fsm(i,j) has aligned access [ loops.f90(20,11) ]
remark #15388: vectorization support: reference fsm(i,j) has aligned access [ loops.f90(20,22) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.000
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 6
remark #15477: vector cost: 0.620
remark #15478: estimated potential speedup: 6.870
remark #15488: --- end vector cost summary ---
```
And this is the report for test2:
```
remark #15388: vectorization support: reference F(i,j) has aligned access [ loops.f90(39,11) ]
remark #15389: vectorization support: reference fsm(i,j) has unaligned access [ loops.f90(39,11) ]
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.444
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 4
remark #15477: vector cost: 1.120
remark #15478: estimated potential speedup: 3.110
remark #15488: --- end vector cost summary ---
```
In the second loop, the report says the argument array fsm has unaligned access, which I think is wrong. Moreover, the report is the same whether or not I include the assume_aligned and align directives; they don't seem to have any impact. What am I missing here?
FYI, I compile with:
```
ifort -O3 -mavx -fopenmp -qopt-report-phase=vec,loop -qopt-report=5 -qopt-streaming-stores never -mcmodel=medium -c loops.f90
```
And here is the code:
```fortran
module loops
  public :: test1, test2
contains
  subroutine test1(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(inout) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    !dir$ assume_aligned fsm:64
    do j = 1,jm
      do i = 1,im
        fsm(i,j) = fsm(i,j)*2
      end do
    end do
  end subroutine test1

  subroutine test2(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(in) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    real, dimension(:,:), allocatable :: f
    !dir$ assume_aligned fsm:64
    !dir$ attributes align: 64:: f
    allocate(f(im, jm))
    do j = 1,jm
      do i = 1,im
        f(i,j) = fsm(i,j)
      end do
    end do
  end subroutine test2
end module loops
```
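For reference, the attributes align directive applies to the start of an array. In column-major storage, column j begins (j-1)*im*4 bytes after f(1,1) for default 4-byte reals, so later columns fall on a 64-byte boundary only when im*4 is a multiple of 64. The standalone program below is a hypothetical sketch, not part of the original code: the program name and the sizes im = 20, jm = 3 are assumptions. It prints each column's start address modulo 64, using c_loc from iso_c_binding to obtain the addresses. Only ifort honors the !dir$ directive; other compilers treat it as a comment, so no alignment is implied there.

```fortran
! Hypothetical standalone check (not from the original post).
! Assumed names/values: program check_alignment, im = 20, jm = 3.
program check_alignment
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  integer, parameter :: im = 20, jm = 3
  real, dimension(:,:), allocatable, target :: f
  ! Intel directive: align the start of f on a 64-byte boundary
  !dir$ attributes align: 64 :: f
  integer(c_intptr_t) :: addr
  integer :: j

  allocate(f(im, jm))
  do j = 1, jm
    ! Reinterpret the C address of the first element of column j as an integer
    addr = transfer(c_loc(f(1, j)), addr)
    print '(a, i0, a, i0)', 'column ', j, ': start address mod 64 = ', &
          int(mod(addr, 64_c_intptr_t))
  end do
  deallocate(f)
end program check_alignment
```

If the directive is honored, column 1 would be expected to print 0, while columns 2 and 3 start 80 and 160 bytes later and therefore would not be on a 64-byte boundary.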
Could anyone shed some light on this?
Thanks a lot!
Marcin