<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic unaligned data access for -xHost option on Intel-AVX platform in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/unaligned-data-access-for-xHost-option-on-Intel-AVX-platform/m-p/1039856#M4502</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am currently developing a performance-critical project, and I am running into behavior I don't understand. To illustrate, please consider the following module file:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!-----------------------------------------------------------------------
module dcs_c7h16_reitz_mod
!-----------------------------------------------------------------------
implicit none
!-----------------------------------------------------------------------
private
!-----------------------------------------------------------------------
!-----------------------------------------------------------------------
integer, parameter :: dp = selected_real_kind(15,307)
!-----------------------------------------------------------------------
    real(kind=dp), dimension(:), allocatable :: temp
!DEC$ ATTRIBUTES ALIGN: 32 :: temp
    real(kind=dp), dimension(:), allocatable, target :: lt
!DEC$ ATTRIBUTES ALIGN: 32 :: lt
!
contains
!-----------------------------------------------------------------------
!-----------------------------------------------------------------------
subroutine dcs_update_c7h16_reitz(ngridpoints,temperature)

    implicit none

    integer, intent(in) :: ngridpoints
    real(kind=dp)   , intent(in) :: temperature(ngridpoints)
    integer :: i
	!DIR$ ASSUME_ALIGNED temperature: 32

	temp = temperature

    lt = log(temperature)

end subroutine dcs_update_c7h16_reitz

end module dcs_c7h16_reitz_mod&lt;/PRE&gt;

&lt;P&gt;When I compile this module on my machine (Intel(R) Core(TM) i7-3770, with AVX) using&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;ifort -c -vec-report6 -xHost -align array32byte test.f90&lt;/PRE&gt;

&lt;P&gt;I got the following vectorization report, which surprises me:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;test.f90(28): (col. 2) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_temp_ has aligned access
test.f90(28): (col. 2) remark: vectorization support: reference temperature has unaligned access
test.f90(28): (col. 2) remark: vectorization support: unaligned access used inside loop body
test.f90(28): (col. 2) remark: LOOP WAS VECTORIZED
test.f90(28): (col. 2) remark: loop was not vectorized: not inner loop
test.f90(30): (col. 10) remark: vectorization support: reference temperature has unaligned access
test.f90(30): (col. 5) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_lt_ has aligned access
test.f90(30): (col. 5) remark: vectorization support: unaligned access used inside loop body
test.f90(30): (col. 5) remark: LOOP WAS VECTORIZED&lt;/PRE&gt;

&lt;P&gt;What confuses me is that the access to temperature is reported as unaligned in both lines that reference it. After reading the web pages "Data Alignment to Assist Vectorization" and "Fortran Array Data and Arguments and Vectorization" (which are very good, by the way), I experimented with the ASSUME_ALIGNED directive and the -align compiler option. However, the problem persists.&lt;/P&gt;
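
&lt;P&gt;Because ASSUME_ALIGNED is only a promise to the compiler, it can help to confirm at run time that the data really starts on a 32-byte boundary, for example to check that -align array32byte took effect for the array actually passed as temperature. The following is a minimal sketch that uses only the standard ISO_C_BINDING module. The program name and array size are illustrative assumptions, and c_loc requires the array to have the TARGET attribute.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;! Minimal sketch: check whether an allocatable array really starts on a
! 32-byte boundary before promising that with !DIR$ ASSUME_ALIGNED.
program check_alignment
    use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
    implicit none
    integer, parameter :: dp = selected_real_kind(15,307)
    real(kind=dp), dimension(:), allocatable, target :: temperature
    integer(c_intptr_t) :: addr

    allocate(temperature(1000))
    temperature = 300.0_dp

    ! Reinterpret the C address of the first element as an integer
    addr = transfer(c_loc(temperature(1)), addr)

    ! 0  : first element is on a 32-byte boundary (what 32-byte AVX code wants)
    ! 16 : only 16-byte aligned (enough for 16-byte SSE vectors)
    ! other values : less than 16-byte alignment
    print '(a,i0)', 'address mod 32 = ', mod(addr, 32_c_intptr_t)

    deallocate(temperature)
end program check_alignment&lt;/PRE&gt;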

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;After quite some time I found out that when I am not using the -xHost option the compiler seems to produced aligned data access&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;ifort -c -vec-report6 -align array32byte test.f90&lt;/PRE&gt;

&lt;PRE class="brush:plain;"&gt;test.f90(28): (col. 2) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_temp_ has aligned access
test.f90(28): (col. 2) remark: vectorization support: reference temperature has aligned access
test.f90(28): (col. 2) remark: LOOP WAS VECTORIZED
test.f90(28): (col. 2) remark: loop was not vectorized: not inner loop
test.f90(30): (col. 10) remark: vectorization support: reference temperature has aligned access
test.f90(30): (col. 5) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_lt_ has aligned access
test.f90(30): (col. 5) remark: LOOP WAS VECTORIZED&lt;/PRE&gt;

&lt;P&gt;I should mention that the problem of "no aligned access" also occurs when I am using the -xAVX option. My question is now what is possibly wrong with the above code that the compiler assumes unaligned access when compiling the code on an Intel-AVX platform. I would also be very thankful if somebody has a reference for further reading on that topic since the code project I am working on is really performance critical.&lt;/P&gt;

&lt;P&gt;Thanks a lot in advance&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
    <pubDate>Fri, 13 Jun 2014 12:41:25 GMT</pubDate>
    <dc:creator>felixdietzsch</dc:creator>
    <dc:date>2014-06-13T12:41:25Z</dc:date>
    <item>
      <title>unaligned data access for -xHost option on Intel-AVX platform</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/unaligned-data-access-for-xHost-option-on-Intel-AVX-platform/m-p/1039856#M4502</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I am currently developing a performance critical project and I am facing some behavior I don't understand. To explain my thoughts please consider the following module file&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!-----------------------------------------------------------------------
module dcs_c7h16_reitz_mod
!-----------------------------------------------------------------------
implicit none
!-----------------------------------------------------------------------
private
!-----------------------------------------------------------------------
!-----------------------------------------------------------------------
integer, parameter :: dp = selected_real_kind(15,307)
!-----------------------------------------------------------------------
    real(kind=dp), dimension(:), allocatable :: temp
!DEC$ ATTRIBUTES ALIGN: 32 :: temp
    real(kind=dp), dimension(:), allocatable, target :: lt
!DEC$ ATTRIBUTES ALIGN: 32 :: lt
!
contains
!-----------------------------------------------------------------------
!-----------------------------------------------------------------------
subroutine dcs_update_c7h16_reitz(ngridpoints,temperature)

    implicit none

    integer, intent(in) :: ngridpoints
    real(kind=dp)   , intent(in) :: temperature(ngridpoints)
    integer :: i
	!DIR$ ASSUME_ALIGNED temperature: 32

	temp = temperature

    lt = log(temperature)

end subroutine dcs_update_c7h16_reitz

end module dcs_c7h16_reitz_mod&lt;/PRE&gt;

&lt;P&gt;When I compile this module on my machine (Intel(R) Core(TM) i7-3770, with AVX) using&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;ifort -c -vec-report6 -xHost -align array32byte test.f90&lt;/PRE&gt;

&lt;P&gt;I got the following vectorization report which surprises me&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;test.f90(28): (col. 2) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_temp_ has aligned access
test.f90(28): (col. 2) remark: vectorization support: reference temperature has unaligned access
test.f90(28): (col. 2) remark: vectorization support: unaligned access used inside loop body
test.f90(28): (col. 2) remark: LOOP WAS VECTORIZED
test.f90(28): (col. 2) remark: loop was not vectorized: not inner loop
test.f90(30): (col. 10) remark: vectorization support: reference temperature has unaligned access
test.f90(30): (col. 5) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_lt_ has aligned access
test.f90(30): (col. 5) remark: vectorization support: unaligned access used inside loop body
test.f90(30): (col. 5) remark: LOOP WAS VECTORIZED&lt;/PRE&gt;

&lt;P&gt;What confuses me is that there is an unaligned access to temperature in both lines referencing it. After reading the web pages&amp;nbsp;&amp;nbsp;"Data Alignment to Assist Vectorization" and "Fortran Array Data and Arguments and Vectorization" , which are very good by the way,&amp;nbsp;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I played around with the "assumed_alligned" directive and the -align compiler option. However, the problem still persists.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;After quite some time I found out that when I am not using the -xHost option the compiler seems to produced aligned data access&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;ifort -c -vec-report6 -align array32byte test.f90&lt;/PRE&gt;

&lt;PRE class="brush:plain;"&gt;test.f90(28): (col. 2) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_temp_ has aligned access
test.f90(28): (col. 2) remark: vectorization support: reference temperature has aligned access
test.f90(28): (col. 2) remark: LOOP WAS VECTORIZED
test.f90(28): (col. 2) remark: loop was not vectorized: not inner loop
test.f90(30): (col. 10) remark: vectorization support: reference temperature has aligned access
test.f90(30): (col. 5) remark: vectorization support: reference dcs_c7h16_reitz_mod_mp_lt_ has aligned access
test.f90(30): (col. 5) remark: LOOP WAS VECTORIZED&lt;/PRE&gt;

&lt;P&gt;I should mention that the problem of "no aligned access" also occurs when I am using the -xAVX option. My question is now what is possibly wrong with the above code that the compiler assumes unaligned access when compiling the code on an Intel-AVX platform. I would also be very thankful if somebody has a reference for further reading on that topic since the code project I am working on is really performance critical.&lt;/P&gt;

&lt;P&gt;Thanks a lot in advance&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jun 2014 12:41:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/unaligned-data-access-for-xHost-option-on-Intel-AVX-platform/m-p/1039856#M4502</guid>
      <dc:creator>felixdietzsch</dc:creator>
      <dc:date>2014-06-13T12:41:25Z</dc:date>
    </item>
    <item>
      <title>I've occasionally run into</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/unaligned-data-access-for-xHost-option-on-Intel-AVX-platform/m-p/1039857#M4503</link>
      <description>&lt;P&gt;I've occasionally run into cases where the compiler didn't take advantage of asserted 32-byte alignment.&amp;nbsp; This may or may not have a significant impact.&amp;nbsp; Without -xHost only the default 16-byte alignment would affect code generation, even though the 32-byte alignment may still prove beneficial.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jun 2014 13:10:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/unaligned-data-access-for-xHost-option-on-Intel-AVX-platform/m-p/1039857#M4503</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-06-13T13:10:58Z</dc:date>
    </item>
  </channel>
</rss>

