Software Tuning, Performance Optimization & Platform Monitoring

unaligned data access for -xHost option on Intel-AVX platform

felixdietzsch

Hi,

I am currently developing a performance-critical project and I am seeing some behavior I don't understand. To illustrate, please consider the following module file:

!-----------------------------------------------------------------------
module dcs_c7h16_reitz_mod
!-----------------------------------------------------------------------
implicit none
!-----------------------------------------------------------------------
private
!-----------------------------------------------------------------------
!-----------------------------------------------------------------------
integer, parameter :: dp = selected_real_kind(15,307)
!-----------------------------------------------------------------------
    real(kind=dp), dimension(:), allocatable :: temp
!DEC$ ATTRIBUTES ALIGN: 32 :: temp
    real(kind=dp), dimension(:), allocatable, target :: lt
!DEC$ ATTRIBUTES ALIGN: 32 :: lt
!
contains
!-----------------------------------------------------------------------
!-----------------------------------------------------------------------
subroutine dcs_update_c7h16_reitz(ngridpoints,temperature)

    implicit none

    integer, intent(in) :: ngridpoints
    real(kind=dp)   , intent(in) :: temperature(ngridpoints)
    integer :: i
    !DIR$ ASSUME_ALIGNED temperature: 32

    temp = temperature

    lt = log(temperature)

end subroutine dcs_update_c7h16_reitz

end module dcs_c7h16_reitz_mod

When I compile this module on my machine (Intel(R) Core(TM) i7-3770, with AVX) using

ifort -c -vec-report6 -xHost -align array32byte test.f90

I get the following vectorization report, which surprises me:

test.f90(28): (col. 2) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_temp_ has aligned access
test.f90(28): (col. 2) remark: vectorization support: reference temperature has unaligned access
test.f90(28): (col. 2) remark: vectorization support: unaligned access used inside loop body
test.f90(28): (col. 2) remark: LOOP WAS VECTORIZED
test.f90(28): (col. 2) remark: loop was not vectorized: not inner loop
test.f90(30): (col. 10) remark: vectorization support: reference temperature has unaligned access
test.f90(30): (col. 5) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_lt_ has aligned access
test.f90(30): (col. 5) remark: vectorization support: unaligned access used inside loop body
test.f90(30): (col. 5) remark: LOOP WAS VECTORIZED

What confuses me is that the access to temperature is reported as unaligned in both lines that reference it. After reading the web pages "Data Alignment to Assist Vectorization" and "Fortran Array Data and Arguments and Vectorization", which are very good by the way, I experimented with the ASSUME_ALIGNED directive and the -align compiler option. However, the problem persists.

After quite some time I found out that when I do not use the -xHost option, the compiler seems to produce aligned data access:

ifort -c -vec-report6 -align array32byte test.f90
test.f90(28): (col. 2) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_temp_ has aligned access
test.f90(28): (col. 2) remark: vectorization support: reference temperature has aligned access
test.f90(28): (col. 2) remark: LOOP WAS VECTORIZED
test.f90(28): (col. 2) remark: loop was not vectorized: not inner loop
test.f90(30): (col. 10) remark: vectorization support: reference temperature has aligned access
test.f90(30): (col. 5) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_lt_ has aligned access
test.f90(30): (col. 5) remark: LOOP WAS VECTORIZED

I should mention that the unaligned access is also reported when I use the -xAVX option instead of -xHost. My question is: what is wrong with the above code that makes the compiler assume unaligned access when compiling for an Intel AVX platform? I would also be very thankful for any references for further reading on this topic, since the project I am working on is really performance-critical.

Thanks a lot in advance

Felix

TimP

I've occasionally run into cases where the compiler didn't take advantage of asserted 32-byte alignment. This may or may not have a significant impact on performance. Without -xHost the compiler generates 128-bit SSE code by default, so only 16-byte alignment affects code generation and the report shows aligned access; the 32-byte alignment may still prove beneficial in that case.
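
Since ASSUME_ALIGNED is a promise made by the programmer rather than something the compiler verifies, it can be worth checking at run time whether the array actually passed in starts on a 32-byte boundary. The following is only a minimal sketch of such a check, not part of the original code: the module name alignment_check_mod, the function offset_from_boundary and the array size 1000 are hypothetical, and the address is obtained with the standard c_loc from iso_c_binding combined with transfer, which works with common compilers but relies on a C pointer and c_intptr_t having the same size.

```fortran
module alignment_check_mod
    use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
    implicit none
    private
    public :: dp, offset_from_boundary
    integer, parameter :: dp = selected_real_kind(15,307)
contains
    ! Hypothetical helper: returns the starting address of a(1:n)
    ! modulo `boundary`. A result of 0 means the data is aligned.
    function offset_from_boundary(n, a, boundary) result(offset)
        integer, intent(in) :: n
        real(kind=dp), intent(in), target :: a(n)
        integer, intent(in) :: boundary
        integer :: offset
        integer(c_intptr_t) :: addr

        ! Reinterpret the C address of the first element as an integer.
        addr = transfer(c_loc(a), 0_c_intptr_t)
        offset = int(modulo(addr, int(boundary, c_intptr_t)))
    end function offset_from_boundary
end module alignment_check_mod

program check_alignment
    use alignment_check_mod
    implicit none
    real(kind=dp), allocatable, target :: temperature(:)

    allocate(temperature(1000))

    ! Whole array: starts wherever the allocator placed it.
    print '(a,i0)', 'whole array, address mod 32:       ', &
        offset_from_boundary(1000, temperature, 32)

    ! Section starting at element 2: shifted by 8 bytes relative to
    ! the start of the whole array.
    print '(a,i0)', 'section from element 2, mod 32:    ', &
        offset_from_boundary(999, temperature(2:), 32)
end program check_alignment
```

A nonzero value for the array that is actually handed to dcs_update_c7h16_reitz would mean the ASSUME_ALIGNED assertion does not hold for that call, independent of what the vectorization report says.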
