<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Vectorisation issues with allocatable array in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1273804#M155544</link>
    <description>&lt;P&gt;I have the following kernel (all arrays are integers):&lt;/P&gt;
&lt;P&gt;where(old(1:M,1:N) /= 0) &amp;amp;&lt;/P&gt;
&lt;P&gt;new(1:M,1:N) = max(old(1:M,1:N), old(0:M-1,1:N), &amp;amp;&lt;BR /&gt;old(2:M+1,1:N), &amp;amp;&lt;BR /&gt;old(1:M,0:N-1), &amp;amp;&lt;BR /&gt;old(1:M,2:N+1) )&lt;/P&gt;
&lt;P&gt;and it is about 3 times slower if I use allocatable arrays rather than just declaring statically (all with dimensions fixed at compile time).&lt;BR /&gt;&lt;BR /&gt;I have an equivalent loop-based C version which also shows the same effect - 3 times slower with malloc'd arrays vs static arrays. However, in C there is a genuine potential pointer-aliasing issue between new and old and this can be fixed with an "ivdep" on the inner loop. In Fortran there is surely no potential aliasing issue even with allocatables so why is the compiler not vectorising? Can I apply "ivdep" to array syntax expressions like the above?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 15 Apr 2021 10:54:22 GMT</pubDate>
    <dc:creator>dhenty</dc:creator>
    <dc:date>2021-04-15T10:54:22Z</dc:date>
    <item>
      <title>Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1273804#M155544</link>
      <description>&lt;P&gt;I have the following kernel (all arrays are integers):&lt;/P&gt;
&lt;P&gt;where(old(1:M,1:N) /= 0) &amp;amp;&lt;/P&gt;
&lt;P&gt;new(1:M,1:N) = max(old(1:M,1:N), old(0:M-1,1:N), &amp;amp;&lt;BR /&gt;old(2:M+1,1:N), &amp;amp;&lt;BR /&gt;old(1:M,0:N-1), &amp;amp;&lt;BR /&gt;old(1:M,2:N+1) )&lt;/P&gt;
&lt;P&gt;and it is about 3 times slower if I use allocatable arrays rather than just declaring statically (all with dimensions fixed at compile time).&lt;BR /&gt;&lt;BR /&gt;I have an equivalent loop-based C version which also shows the same effect - 3 times slower with malloc'd arrays vs static arrays. However, in C there is a genuine potential pointer-aliasing issue between new and old and this can be fixed with an "ivdep" on the inner loop. In Fortran there is surely no potential aliasing issue even with allocatables so why is the compiler not vectorising? Can I apply "ivdep" to array syntax expressions like the above?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Apr 2021 10:54:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1273804#M155544</guid>
      <dc:creator>dhenty</dc:creator>
      <dc:date>2021-04-15T10:54:22Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1273852#M155546</link>
      <description>&lt;P&gt;Please provide a small but compilable test case. I would be interested to see what the optimization report has to say about it. The use of WHERE may also be an issue.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Apr 2021 13:57:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1273852#M155546</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2021-04-15T13:57:35Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274140#M155555</link>
      <description>&lt;P&gt;With the appended code I get about 1.3 seconds with static arrays and 2.7 with allocatables:&lt;/P&gt;
&lt;P&gt;dsh@laptop$ ifort --version&lt;BR /&gt;ifort (IFORT) 2021.2.0 20210228&lt;BR /&gt;Copyright (C) 1985-2021 Intel Corporation. All rights reserved.&lt;/P&gt;
&lt;P&gt;dsh@laptop$ ifort -O3 -o wheretest wheretest.f90&amp;nbsp; # static&lt;BR /&gt;dsh@laptop$ time ./wheretest &lt;BR /&gt;new(1,1) = 575&lt;/P&gt;
&lt;P&gt;real 0m1.265s&lt;BR /&gt;user 0m1.254s&lt;BR /&gt;sys 0m0.008s&lt;BR /&gt;dsh@laptop$ ifort -O3 -o wheretest wheretest.f90&amp;nbsp; # allocatables&lt;BR /&gt;dsh@laptop$ time ./wheretest &lt;BR /&gt;new(1,1) = 575&lt;/P&gt;
&lt;P&gt;real 0m2.727s&lt;BR /&gt;user 0m2.722s&lt;BR /&gt;sys 0m0.005s&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;program wheretest

  implicit none

  integer, parameter :: M = 576, N = 576
  integer :: i

  integer, dimension(0:M+1,0:N+1) :: old, new

!  integer, dimension(:,:), allocatable :: old, new
!  allocate(old(0:M+1,0:N+1), new(0:M+1,0:N+1) )

  old(:,:) =  reshape( [ (mod(i,M), i=1,(M+2)*(N+2)) ], shape(old) )

  do i = 1, 4000

     where(old(1:M,1:N) /= 0) &amp;amp;

          new(1:M,1:N) = max(old(1:M,1:N), old(0:M-1,1:N), &amp;amp;
                                           old(2:M+1,1:N), &amp;amp;
                                           old(1:M,0:N-1), &amp;amp;
                                           old(1:M,2:N+1)    )

     old(1:M,1:N) = new(1:M,1:N)
     
  end do

  write(*,*) "new(1,1) = ", new(1,1)
  
end program wheretest&lt;/LI-CODE&gt;
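&lt;P&gt;To rule out allocation and initialisation cost, the timing can be restricted to the loop itself. The following is only a sketch of that idea for the allocatable variant: the program name timetest, the variables t0, t1 and rate, and the zero initialisation of new are illustrative assumptions, not part of the test case above.&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program timetest

  use, intrinsic :: iso_fortran_env, only: int64

  implicit none

  integer, parameter :: M = 576, N = 576
  integer :: i
  integer(int64) :: t0, t1, rate   ! hypothetical timer variables

  integer, dimension(:,:), allocatable :: old, new

  allocate(old(0:M+1,0:N+1), new(0:M+1,0:N+1))

  old(:,:) = reshape( [ (mod(i,M), i=1,(M+2)*(N+2)) ], shape(old) )
  new(:,:) = 0   ! assumption: define new before the masked assignment

  ! Start the clock only after allocation and initialisation are done
  call system_clock(t0, rate)

  do i = 1, 4000

     where(old(1:M,1:N) /= 0) &amp;
          new(1:M,1:N) = max(old(1:M,1:N), old(0:M-1,1:N), &amp;
                                           old(2:M+1,1:N), &amp;
                                           old(1:M,0:N-1), &amp;
                                           old(1:M,2:N+1)    )

     old(1:M,1:N) = new(1:M,1:N)

  end do

  call system_clock(t1)

  write(*,*) "new(1,1) = ", new(1,1)
  write(*,*) "loop time (s) = ", real(t1 - t0) / real(rate)

end program timetest&lt;/LI-CODE&gt;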
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Apr 2021 08:55:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274140#M155555</guid>
      <dc:creator>dhenty</dc:creator>
      <dc:date>2021-04-16T08:55:04Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274149#M155556</link>
      <description>&lt;P&gt;Are you timing the whole program? Is the time taken to allocate significant? Timing only the work loop might be more informative.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Apr 2021 09:02:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274149#M155556</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2021-04-16T09:02:24Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274150#M155557</link>
      <description>&lt;P&gt;Initialisation is insignificant compared to the 4000 iterations of the "do" loop - doubling the trip count to 8000 doubles the elapsed time.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Apr 2021 09:04:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274150#M155557</guid>
      <dc:creator>dhenty</dc:creator>
      <dc:date>2021-04-16T09:04:19Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274554#M155577</link>
      <description>&lt;P&gt;Your program has a bug in it.&lt;/P&gt;
&lt;P&gt;Line 24 copies undefined values of new into old at every index where old contained 0, because the WHERE mask never assigns those elements of new.&lt;/P&gt;
&lt;P&gt;I suggest you use:&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;...
  do i = 1, 4000

     new(1:M,1:N) = max(old(1:M,1:N), old(0:M-1,1:N), &amp;amp;
                                           old(2:M+1,1:N), &amp;amp;
                                           old(1:M,0:N-1), &amp;amp;
                                           old(1:M,2:N+1)    )

     where(old(1:M,1:N) /= 0) old(1:M,1:N) = new(1:M,1:N)
     
  end do
...&lt;/LI-CODE&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 19 Apr 2021 01:54:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274554#M155577</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2021-04-19T01:54:55Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274640#M155580</link>
      <description>&lt;P&gt;When I hastily ripped this kernel from the main program I forgot the initialisation of new, which should be set to zero outside the main loop. However, this doesn't significantly affect the result: the loop is still almost twice as fast with static arrays as with allocatables.&lt;/P&gt;
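&lt;P&gt;For illustration, a cut-down sketch of a single step with new zeroed first. The small sizes, the program name initsketch and the final write are assumptions made only to keep the example short; the kernel is the same as in the test case.&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program initsketch

  implicit none

  integer, parameter :: M = 4, N = 4   ! small sizes, for illustration only
  integer :: i

  integer, dimension(0:M+1,0:N+1) :: old, new

  old(:,:) = reshape( [ (mod(i,M), i=1,(M+2)*(N+2)) ], shape(old) )

  ! Define every element of new before the masked assignment, so that the
  ! copy back into old never reads an undefined value.
  new(:,:) = 0

  where(old(1:M,1:N) /= 0) &amp;
       new(1:M,1:N) = max(old(1:M,1:N), old(0:M-1,1:N), &amp;
                                        old(2:M+1,1:N), &amp;
                                        old(1:M,0:N-1), &amp;
                                        old(1:M,2:N+1)    )

  ! Elements masked out above now copy a defined 0 rather than garbage
  old(1:M,1:N) = new(1:M,1:N)

  write(*,*) "old(1:M,1) = ", old(1:M,1)

end program initsketch&lt;/LI-CODE&gt;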
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Apr 2021 08:16:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274640#M155580</guid>
      <dc:creator>dhenty</dc:creator>
      <dc:date>2021-04-19T08:16:50Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274754#M155584</link>
      <description>&lt;P&gt;Could you explain what you are trying to achieve? There are reasons to prefer each alternative, but the best choice depends on the rest of the program.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Apr 2021 16:03:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274754#M155584</guid>
      <dc:creator>JohnNichols</dc:creator>
      <dc:date>2021-04-19T16:03:33Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274999#M155599</link>
      <description>&lt;P&gt;My question is: &lt;EM&gt;why does identical code run twice as fast with static arrays as with allocatables?&lt;/EM&gt; What the code does isn't really relevant; it's just representative of simple stencil operations. The difference appears to be due to vectorisation because, in an equivalent C code, adding #pragma ivdep fixes the issue for malloc'd arrays.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 08:03:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1274999#M155599</guid>
      <dc:creator>dhenty</dc:creator>
      <dc:date>2021-04-20T08:03:31Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1275214#M155607</link>
      <description>&lt;P&gt;Did you look at the optimization reports?&amp;nbsp; The static version was vectorized.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 23:22:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1275214#M155607</guid>
      <dc:creator>Barbara_P_Intel</dc:creator>
      <dc:date>2021-04-20T23:22:55Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1275770#M155647</link>
      <description>&lt;P&gt;The report confirms that the static version is being vectorised:&lt;/P&gt;
&lt;PRE&gt;&lt;FONT face="courier new,courier"&gt;LOOP BEGIN at wheretest.f90(20,11)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;Peeled loop for vectorization&amp;gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP END&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP BEGIN at wheretest.f90(20,11)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;remark #15300: LOOP WAS VECTORIZED&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP END&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP BEGIN at wheretest.f90(20,11)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;lt;Remainder loop for vectorization&amp;gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP END&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;but with allocatables it isn't:&lt;/P&gt;
&lt;PRE&gt;&lt;FONT face="courier new,courier"&gt;LOOP BEGIN at wheretest.f90(20,11)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;remark #25460: No loop optimizations reported&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP BEGIN at wheretest.f90(20,11)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;remark #25460: No loop optimizations reported&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP END&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;LOOP END&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;but I'd still like to understand why, and whether there is a directive I could use here to force vectorisation as I was able to do using &lt;FONT face="courier new,courier"&gt;#pragma ivdep&lt;/FONT&gt; in the C version.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Apr 2021 09:44:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1275770#M155647</guid>
      <dc:creator>dhenty</dc:creator>
      <dc:date>2021-04-22T09:44:29Z</dc:date>
    </item>
    <item>
      <title>Re: Vectorisation issues with allocatable array</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1275842#M155657</link>
      <description>&lt;P&gt;Intel Fortran supports:&lt;/P&gt;
&lt;P&gt;!DIR$ IVDEP&lt;/P&gt;
&lt;P&gt;See&amp;nbsp;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference/a-to-z-reference/h-to-i/ivdep.html" target="_blank" rel="noopener"&gt;IVDEP (intel.com)&lt;/A&gt;&lt;/P&gt;
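&lt;P&gt;As a rough sketch of where the directive goes, here is the kernel rewritten as explicit DO loops with the directive immediately before the inner loop. The loop form, the program name ivdep_sketch, the index names and the zero initialisation of new are assumptions made for illustration; whether the directive is also honoured on array-syntax or WHERE statements is not something this sketch establishes. Compilers that do not recognise the directive treat it as an ordinary comment.&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program ivdep_sketch

  implicit none

  integer, parameter :: M = 576, N = 576
  integer :: i, j, iter

  integer, dimension(:,:), allocatable :: old, new

  allocate(old(0:M+1,0:N+1), new(0:M+1,0:N+1))

  old(:,:) = reshape( [ (mod(i,M), i=1,(M+2)*(N+2)) ], shape(old) )
  new(:,:) = 0

  do iter = 1, 4000

     do j = 1, N
        ! Hint that the inner loop carries no assumed dependences between
        ! iterations; it is a hint only and does not force vectorization.
        !DIR$ IVDEP
        do i = 1, M
           if (old(i,j) /= 0) then
              new(i,j) = max(old(i,j), old(i-1,j), old(i+1,j), &amp;
                             old(i,j-1), old(i,j+1))
           end if
        end do
     end do

     old(1:M,1:N) = new(1:M,1:N)

  end do

  write(*,*) "new(1,1) = ", new(1,1)

end program ivdep_sketch&lt;/LI-CODE&gt;
&lt;P&gt;The optimization report for the inner loop is the place to check whether the hint changed anything.&lt;/P&gt;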
&lt;P&gt;IVDEP doesn't "force" vectorization, and even the name is somewhat misleading. There are other directives you can specify that will help the compiler vectorize (&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference/directive-enhanced-compilation/general-compiler-directives/rules-for-general-directives-that-affect-do-loops.html#rules-for-general-directives-that-affect-do-loops" target="_blank" rel="noopener"&gt;Rules for General Directives that Affect DO Loops (intel.com)&lt;/A&gt;). In particular, look at&amp;nbsp;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference/a-to-z-reference/t-to-z/vector-and-novector.html#vector-and-novector" target="_blank"&gt;VECTOR and NOVECTOR (intel.com)&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Apr 2021 13:53:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Vectorisation-issues-with-allocatable-array/m-p/1275842#M155657</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2021-04-22T13:53:53Z</dc:date>
    </item>
  </channel>
</rss>

