<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Thanks again, Tim. in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034296#M111456</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Thanks again, Tim.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Yes, the strides in the first example were different, but not for any particular reason. Now I see, given what you said, that this was perhaps not a happy choice. A good chance to explore the optimizer's behavior, though ;-)&lt;/P&gt;

</description>
    <pubDate>Sun, 21 Jun 2015 22:35:04 GMT</pubDate>
    <dc:creator>e745200</dc:creator>
    <dc:date>2015-06-21T22:35:04Z</dc:date>
    <item>
      <title>is it worth using array section assignment ?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034292#M111452</link>
      <description>&lt;P&gt;Hi, all&lt;/P&gt;

&lt;P&gt;I very much like being able to work with array sections in assignments such as&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;a(k:k+l) = b(j:j+l)&lt;/PRE&gt;

&lt;P&gt;After some tests, however, I am surprised that this notation leads to much less efficient code than the good old equivalent loops:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;do i = 0,l
   a(k+i) = b(j+i)
end do

&lt;/PRE&gt;

&lt;P&gt;I guess that the former creates a temporary before the assignment, and that this extra copy costs time.&lt;/P&gt;

&lt;P&gt;Is there some option, directive or trick that avoids this loss of performance?&lt;/P&gt;

&lt;P&gt;Otherwise, is it worth using this cooler notation if the old construct (in my simple experiment) is 1.5 to 2.3 times faster?&lt;/P&gt;

&lt;P&gt;Thanks in advance for any hint.&lt;/P&gt;

&lt;P&gt;PS. Here is my little example.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;      implicit none
      real(8), allocatable :: a(:)
      real :: t0,t1,t2
      integer :: i, n, m, maxtimes
      maxtimes = 10000
      n = 100000
      do while ( n &amp;lt;= 10000000)
         print *, 'n = ', n
         allocate(a(n))
         if ( .not. allocated(a) ) stop
         ! define the array (and touch its pages) before timing
         a = 1.0d0
         call cpu_time(t0)
         ! explicit loop: reversed second half copied into first half
         do m = 1, maxtimes
            do i = 1,n/2
                a(i) = a(n-i+1)
            end do
         end do
         call cpu_time(t1)
         ! same copy written as an array section assignment
         do m = 1, maxtimes
            a(1:n/2) = a(n:n/2+1:-1)
         end do
         call cpu_time(t2)
         deallocate(a)
         ! columns: n, loop time, section time, section/loop ratio
         write(*,'(I8,3F10.3)') n, t1-t0, t2-t1, (t2-t1)/(t1-t0)
         n = n * 2
      end do
      end
&lt;/PRE&gt;

</description>
      <pubDate>Wed, 10 Jun 2015 14:46:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034292#M111452</guid>
      <dc:creator>e745200</dc:creator>
      <dc:date>2015-06-10T14:46:37Z</dc:date>
    </item>
    <item>
      <title>Unfortunately, as you</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034293#M111453</link>
      <description>&lt;P&gt;Unfortunately, as you surmised, ifort does allocate a temporary and performs a double copy for array section assignments within a given array, and it doesn't perform much analysis to determine whether there is an actual possibility of overlap. Ideally, the allocation and deallocation would fall outside your test loop, so you wouldn't see the time penalty for that.&lt;/P&gt;
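
&lt;P&gt;As a rough illustration of that double copy, the section assignment behaves as if it were written with an explicit temporary. This is only a sketch of the effect, not the code ifort actually generates; the program name, the array size and the name tmp are made up for the example.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program temp_copy_sketch
  implicit none
  integer, parameter :: n = 8
  real(8) :: a(n), tmp(n/2)
  integer :: i

  ! fill a with 1, 2, ..., n
  a = [(real(i, 8), i = 1, n)]

  ! conceptually, a(1:n/2) = a(n:n/2+1:-1) becomes two copies:
  tmp = a(n:n/2+1:-1)   ! copy 1: right-hand side into a temporary
  a(1:n/2) = tmp        ! copy 2: temporary into the destination

  print '(8F5.1)', a
end program temp_copy_sketch&lt;/PRE&gt;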

&lt;P&gt;The code you show wasn't optimized by ifort until fairly recently; AVX2 offers better ISA support for it.&lt;/P&gt;

&lt;P&gt;Prior to ifort 16.0 beta, many such cases required !dir$ simd for optimization.&amp;nbsp; !$omp simd didn't help (and isn't intended to work with array assignments).&amp;nbsp; If you set the directive but there is actual overlap, it could produce wrong results.&lt;/P&gt;
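
&lt;P&gt;A minimal sketch of where such a directive would go, assuming the source and destination sections really do not overlap (here they are the two halves of the same array). The program name and array size are invented for the example, and whether the directive is honoured on an array assignment depends on the compiler version; other compilers treat the line as a comment.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program simd_directive_sketch
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(n)
  integer :: i

  a = [(real(i, 8), i = 1, n)]

  ! a(1:n/2) and a(n/2+1:n) are disjoint, so asserting
  ! independence is safe here; with real overlap it would not be
  !dir$ simd
  a(1:n/2) = a(n:n/2+1:-1)

  print *, a(1), a(n/2)
end program simd_directive_sketch&lt;/PRE&gt;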

&lt;P&gt;If the array section is big enough (at least 4KB), you might look at the optimization report to see whether -O3 gave you streaming stores (you could force that with !dir$ vector nontemporal, which may work even with array assignment).&lt;/P&gt;

&lt;P&gt;Many of us avoid choosing syntax for "coolness", although it is desirable that the most readable code not give up performance.&lt;/P&gt;
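
&lt;P&gt;For illustration only, this is how the nontemporal hint might be placed in front of a whole-array copy. The names and the array size are assumptions made for the sketch, and the optimization report is still the way to confirm whether streaming stores were actually generated.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program nontemporal_sketch
  implicit none
  integer, parameter :: n = 1000000
  real(8), allocatable :: a(:), b(:)

  allocate(a(n), b(n))
  b = 1.0d0

  ! request streaming (nontemporal) stores for the copy below;
  ! compilers that do not know the directive see a plain comment
  !dir$ vector nontemporal
  a = b

  print *, a(1), a(n)
  deallocate(a, b)
end program nontemporal_sketch&lt;/PRE&gt;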

&lt;P&gt;Multi-rank array assignments are notorious for not generating optimum code (besides not working with OpenMP), so I have many rank 1 array assignments inside DO loops.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2015 15:11:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034293#M111453</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2015-06-10T15:11:18Z</dc:date>
    </item>
    <item>
      <title>Thanks a lot, Tim !</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034294#M111454</link>
      <description>&lt;P&gt;Thanks a lot, Tim !&lt;/P&gt;

&lt;P&gt;I'll check whether the directive you suggested can help avoid the performance loss in my case.&lt;/P&gt;

&lt;P&gt;In the meantime, I've found that calling a subroutine which simply assigns whole arrays (as seen from inside the subroutine) gives the best performance:&lt;/P&gt;

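&lt;PRE class="brush:fortran;"&gt;! a(k:k+l) = b(j:j+l)
  call sub1(a(k), b(j), l+1)
  ...

  subroutine sub1(tgt, src, n)
  ! copy n elements from src to tgt; the two arguments must not overlap
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: src(n)
  real(8), intent(out) :: tgt(n)
  tgt = src
  return
  end&lt;/PRE&gt;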

&lt;P&gt;At this point, the final syntax is no different from using the BLAS dcopy, which is what the code I was trying to improve used in the first place ... just one less dependency ...&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2015 16:18:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034294#M111454</guid>
      <dc:creator>e745200</dc:creator>
      <dc:date>2015-06-10T16:18:00Z</dc:date>
    </item>
    <item>
      <title>Now you show cases using the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034295#M111455</link>
      <description>&lt;P&gt;Now you show cases using the same stride for source and destination.&amp;nbsp; These have to be implemented differently from the case with different (including positive vs. negative) strides.&lt;/P&gt;

&lt;P&gt;MKL dcopy can use parallel (threaded) code if you link with mkl:parallel and the case is large enough. That won't necessarily improve performance unless you are running a large enough case on multiple CPUs, and it depends somewhat on memory locality. Last time I looked, dcopy was unrolled more aggressively than the alternatives, to optimize performance for bigger arrays.&lt;/P&gt;
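
&lt;P&gt;For reference, a sketch of the dcopy form of the original assignment a(k:k+l) = b(j:j+l), using the standard BLAS argument order (count, source, source stride, destination, destination stride). The program name and the values of j, k and l are made up for the example, and the program has to be linked against a BLAS library such as MKL.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program dcopy_sketch
  implicit none
  integer, parameter :: n = 10
  real(8) :: a(n), b(n)
  integer :: i, j, k, l
  external :: dcopy

  a = 0.0d0
  b = [(real(i, 8), i = 1, n)]
  j = 3
  k = 1
  l = 4

  ! same effect as a(k:k+l) = b(j:j+l): l+1 elements, unit stride
  call dcopy(l + 1, b(j), 1, a(k), 1)

  print '(10F5.1)', a
end program dcopy_sketch&lt;/PRE&gt;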

&lt;P&gt;When you call the subroutine, you are asserting (according to the Fortran standard) that there is no overlap. The compiler may substitute a call to intel_fast_memcpy, which decides automatically at run time whether to use nontemporal stores. The optimization report would show this (with no report about vectorization).&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2015 18:06:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034295#M111455</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2015-06-10T18:06:03Z</dc:date>
    </item>
    <item>
      <title>Thanks again, Tim.</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034296#M111456</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Thanks again, Tim.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Yes, the strides in the first example were different, but not for any particular reason. Now I see, given what you said, that this was perhaps not a happy choice. A good chance to explore the optimizer's behavior, though ;-)&lt;/P&gt;

</description>
      <pubDate>Sun, 21 Jun 2015 22:35:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/is-it-worth-using-array-section-assignment/m-p/1034296#M111456</guid>
      <dc:creator>e745200</dc:creator>
      <dc:date>2015-06-21T22:35:04Z</dc:date>
    </item>
  </channel>
</rss>

