<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Well option 1 is by far the in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142165#M137489</link>
    <description>&lt;P&gt;Well, option 1 is by far the neatest code and gets my vote. The actual 'saving' by any other method is really going to be very small, so why worry about it? Whenever&amp;nbsp;I have made timing tests of such things, I have had to repeat them a very large number of times to be able to see&amp;nbsp;a meaningful time difference, on the order of the blink of an eye.&lt;/P&gt;&lt;P&gt;A better bet: if you can determine at run time how big&amp;nbsp;intNumMem needs to be, make the array allocatable and of the correct size.&lt;/P&gt;</description>
    <pubDate>Fri, 09 Aug 2019 19:55:55 GMT</pubDate>
    <dc:creator>andrew_4619</dc:creator>
    <dc:date>2019-08-09T19:55:55Z</dc:date>
    <item>
      <title>Fastest Way to Initialize Variables</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142163#M137487</link>
      <description>&lt;P style="margin-left:0in; margin-right:0in"&gt;Most Efficient Way To Initialize Variables.&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Say one has the following variable:&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;doubleprecision, dimension(6,10000) :: dblResults = 0.0d0&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Every analysis cycle, one needs to re-zero the dblResults array. The first dimension of 6 is always used to its full extent. The second dimension is “number of members” and is set at 10000 as an upper limit that should never be reached. Assume in this run of the analysis the first 100 positions of the second dimension are actually used, i.e.&amp;nbsp;intNumMem = 100&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;! Which of the following options runs fastest to re-zero the dblResults array for the next analysis cycle, and which is the preferred way to write the code? (Or is there yet a different and even better way to re-zero the array?)&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&lt;STRONG&gt;! option 1&lt;/STRONG&gt;&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;dblResults = 0.0d0&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&lt;STRONG&gt;! option 2&lt;/STRONG&gt;&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;dblResults(1:6, 1:intNumMem) = 0.0d0&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&lt;STRONG&gt;! option 3&lt;/STRONG&gt;&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;do i = 1, 6&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&amp;nbsp; do j = 1, intNumMem&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dblResults (i, j) = 0.0d0&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&amp;nbsp; end do&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;end do&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&lt;STRONG&gt;! option 4&lt;/STRONG&gt;&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;forall (i=1:6, j=1:intNumMem)&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dblResults(i, j) = 0.0d0&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;end forall&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Thank you very much in advance for your comments.&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Bob&lt;/P&gt;</description>
      <pubDate>Fri, 09 Aug 2019 18:03:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142163#M137487</guid>
      <dc:creator>Robert</dc:creator>
      <dc:date>2019-08-09T18:03:59Z</dc:date>
    </item>
    <item>
      <title>! Option 5  (yet another idea</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142164#M137488</link>
      <description>&lt;P&gt;&lt;STRONG&gt;! Option 5&amp;nbsp; (yet another idea to re-zero the dblResults array for the next use.)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Where (dblResults /= 0.0d0)&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; dblResults = 0.0d0&lt;/P&gt;&lt;P&gt;End Where&lt;/P&gt;</description>
      <pubDate>Fri, 09 Aug 2019 18:52:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142164#M137488</guid>
      <dc:creator>Robert</dc:creator>
      <dc:date>2019-08-09T18:52:41Z</dc:date>
    </item>
    <item>
      <title>Well option 1 is by far the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142165#M137489</link>
      <description>&lt;P&gt;Well, option 1 is by far the neatest code and gets my vote. The actual 'saving' by any other method is really going to be very small, so why worry about it? Whenever&amp;nbsp;I have made timing tests of such things, I have had to repeat them a very large number of times to be able to see&amp;nbsp;a meaningful time difference, on the order of the blink of an eye.&lt;/P&gt;&lt;P&gt;A better bet: if you can determine at run time how big&amp;nbsp;intNumMem needs to be, make the array allocatable and of the correct size.&lt;/P&gt;</description>
      <pubDate>Fri, 09 Aug 2019 19:55:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142165#M137489</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2019-08-09T19:55:55Z</dc:date>
    </item>
    <item>
      <title>andrew_4619&gt;&gt; The actual</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142166#M137490</link>
      <description>&lt;P&gt;andrew_4619&amp;gt;&amp;gt;&amp;nbsp;&lt;EM&gt;The actual 'saving' by any other method is really going to be very small&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Not when&lt;/P&gt;&lt;P&gt;Bob&amp;gt;&amp;gt;&lt;EM&gt;&amp;nbsp;“number of members” and is set at 10000 as an upper limit that should never be reached. Assume in this run of the analysis the first 100 positions of the second dimension are actually used, i.e.&amp;nbsp;intNumMem = 100&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Option 2 should produce the best results (should, though check with VTune).&lt;/P&gt;&lt;P&gt;If the time consumed is really an issue, then experiment with passing the &lt;EM&gt;contiguous &lt;/EM&gt;slice of the 2D array to a subroutine that receives it as a 1D array, with the size 6*intNumMem passed in (in other words, as you may have done in Fortran 77). *** This would only be considered if (when) the compiler optimization did NOT use the AVXnnn instructions (or intel_fast_... intrinsic) to wipe the contiguous array (slice).&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 09 Aug 2019 21:50:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142166#M137490</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-08-09T21:50:17Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142167#M137491</link>
      <description>&lt;BLOCKQUOTE&gt;jimdempseyatthecove (Blackbelt) wrote:&lt;BR /&gt;andrew_4619&amp;gt;&amp;gt;&amp;nbsp;&lt;EM&gt;The actual 'saving' by any other method is really going to be very small&lt;/EM&gt;&lt;P&gt;Not when&lt;/P&gt;&lt;P&gt;Bob&amp;gt;&amp;gt;&lt;EM&gt;&amp;nbsp;“number of members” and is set at 10000 as an upper limit that should never be reached. Assume in this run of the analysis the first 100 positions of the second dimension are actually used, i.e.&amp;nbsp;intNumMem = 100&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;My point was, Jim, that 1000*verylittletime is still verylittletime to a human. If speed is a real issue, using an allocated array of the right size makes more sense to me.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Aug 2019 07:37:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142167#M137491</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2019-08-10T07:37:14Z</dc:date>
    </item>
    <item>
      <title>@Robert,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142168#M137492</link>
      <description>&lt;P&gt;@Robert,&lt;/P&gt;&lt;P&gt;See the excellent responses by Andrew and Jim, both of which can guide you well on two considerations: code readability and maintenance, and&amp;nbsp;performance.&amp;nbsp; As advised, you may want to look into the ALLOCATABLE facility, which can help you work with right-sized datasets.&amp;nbsp; If that is not possible, you can consider your Option 2, or a variant&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;dblResults(:,1:intNumMem) = 0.0d0&lt;/PRE&gt;

&lt;P&gt;which informs a reader of your code (who may be you yourself in a future incarnation!) clearly of the array section that is zeroed out!&lt;/P&gt;</description>
      <pubDate>Sat, 10 Aug 2019 15:52:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142168#M137492</guid>
      <dc:creator>FortranFan</dc:creator>
      <dc:date>2019-08-10T15:52:31Z</dc:date>
    </item>
    <item>
      <title>Thank y’all for the replies.</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142169#M137493</link>
      <description>&lt;P style="margin-left:0in; margin-right:0in"&gt;Thank y’all for the replies. Fortran has several ways to accomplish equivalent tasks, often for legacy code compatibility. It is unclear sometimes how different code affects speed or what is preferred coding practice.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 11 Aug 2019 13:52:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142169#M137493</guid>
      <dc:creator>Robert</dc:creator>
      <dc:date>2019-08-11T13:52:57Z</dc:date>
    </item>
    <item>
      <title>My preference would be for</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142170#M137494</link>
      <description>&lt;P&gt;My preference would be for Option 1. The compiler can implement this very efficiently. Option 2 would make sense only if performance analysis showed this initialization to be a bottleneck, which I very much doubt.&lt;/P&gt;&lt;P&gt;Option 3 should have the loops reversed, though the compiler will probably do that for you. Option 4 is probably next to worst (and FORALL is deprecated), option 5 would be worst.&lt;/P&gt;</description>
      <pubDate>Sun, 11 Aug 2019 16:09:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142170#M137494</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2019-08-11T16:09:25Z</dc:date>
    </item>
    <item>
      <title>Quote:Steve Lionel (Ret.)</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142171#M137495</link>
      <description>&lt;BLOCKQUOTE&gt;Steve Lionel (Ret.) (Blackbelt) wrote:&lt;BR /&gt;&lt;P&gt;My preference would be for Option 1. The compiler can implement this very efficiently. Option 2 would make sense only if performance analysis showed this initialization to be a bottleneck, which I very much doubt.&lt;/P&gt;&lt;P&gt;Option 3 should have the loops reversed, though the compiler will probably do that for you. Option 4 is probably next to worst (and FORALL is deprecated), option 5 would be worst.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;I agree. In my testing, the second way is the slowest (maybe), although the compiler is smart enough to optimize all the cases equally (if possible).&lt;/P&gt;&lt;P&gt;Consider the following four subroutines, compiled with ifort 19.0 on Linux with the -O2 -mavx command-line options:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;pure subroutine fill1(a,x)
    ! whole-array assignment
    real(8), intent(out) :: a(:,:)
    real(8), intent(in) :: x
    a = x
end subroutine
pure subroutine fill2(a,x)
    ! assignment to an explicit full-extent array section
    real(8), intent(out) :: a(:,:)
    real(8), intent(in) :: x
    integer :: n, m
    n = size(a,1)
    m = size(a,2)
    a(1:n,1:m) = x
end subroutine
pure subroutine fill3(a,x)
    ! nested DO loops, first index innermost (column-major order)
    real(8), intent(out) :: a(:,:)
    real(8), intent(in) :: x
    integer :: n, m, i, j
    n = size(a,1)
    m = size(a,2)
    do j=1, m
        do i=1, n
            a(i,j) = x
        end do
    end do
end subroutine
pure subroutine fill4(a,x)
    ! FORALL statement
    real(8), intent(out) :: a(:,:)
    real(8), intent(in) :: x
    integer :: n, m, i, j
    n = size(a,1)
    m = size(a,2)
    forall(j=1:m, i=1:n) a(i,j)=x
end subroutine
&lt;/PRE&gt;

&lt;P&gt;These produce the following timings (in function calls per second):&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;76923.08       58823.53       83333.34       76923.08  &lt;/PRE&gt;

&lt;P&gt;However, the results depend on the order in which the subroutines are called. I have included a burn-in calculation, and I avoid the elimination of unused code by doing something with `a` between calls.&lt;/P&gt;
&lt;P&gt;Online code &amp;amp; compiler here &lt;A href="https://godbolt.org/z/YtNVPn" target="_blank"&gt;https://godbolt.org/z/YtNVPn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2019 12:41:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142171#M137495</guid>
      <dc:creator>JAlexiou</dc:creator>
      <dc:date>2019-08-12T12:41:00Z</dc:date>
    </item>
    <item>
      <title>Quote:JAlexiou wrote:</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142172#M137496</link>
      <description>&lt;BLOCKQUOTE&gt;JAlexiou wrote:&lt;BR /&gt;&lt;P&gt;.. I agree. In my testing ..&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;This test is not at all useful: in most computing environments in this day and age the test, as coded, will simply yield the same values for 'tic' and 'toc' and effectively lead to an output of "Infinity" for the number of function calls per second in all 4 cases.&amp;nbsp; The reader should pay attention to Andrew's point in Quote #3: "Whenever I have made timing tests of such things, I have had to repeat them a very large number of times to be able to see a meaningful time difference, on the order of the blink of an eye."&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2019 14:30:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142172#M137496</guid>
      <dc:creator>FortranFan</dc:creator>
      <dc:date>2019-08-12T14:30:00Z</dc:date>
    </item>
    <item>
      <title>For your test to be somewhat</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142173#M137497</link>
      <description>&lt;P&gt;For your test to be somewhat of value:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;module foo
  doubleprecision, dimension(6,10000) :: dblResults = 0.0d0
  contains
  ! your fill subroutines here
  ...
  ! timing routine (already inside module foo, so no "use foo" is needed)
  subroutine timeIt(a)
    use omp_lib
    doubleprecision :: a(:,:)
    doubleprecision :: T0, T1
    T0 = omp_get_wtime()
    call fill1(a, 1234.5D0)
    T1 = omp_get_wtime()
    print *, size(a,1), size(a,2), T1-T0
    ! ditto for fill2, ..., fill4
    ...
  end subroutine timeIt

end module foo

program timeFills ! the program name is arbitrary
  use foo
  integer :: intNumMem
  intNumMem = 10000
  call timeIt(dblResults(1:6,1:intNumMem)) ! throw away this time
  do intNumMem = 10000,1,-1000
    call timeIt(dblResults(1:6,1:intNumMem))
    ! ensure the optimizer does not elide code
    if(sum(dblResults) == -1.0D0) print *, "Should not print this"
  end do
end program timeFills
....
&lt;/PRE&gt;

&lt;P&gt;The above is a sketch (you debug).&lt;/P&gt;
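&lt;P&gt;For readers who want something that compiles as-is, here is a minimal self-contained variant of the same idea. It is only an illustration under assumptions that are not from this thread: it uses the standard system_clock intrinsic instead of omp_get_wtime, it times a single whole-section assignment as a stand-in for fill1, and it repeats each fill nrep times so the elapsed time is large enough to measure. The names fill_timing, fill_section, time_fill, time_fills and nrep are hypothetical.&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;module fill_timing
  implicit none
  integer, parameter :: dp = kind(0.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! stand-in for fill1: assign a scalar to the whole dummy array
  subroutine fill_section(a, x)
    real(dp), intent(inout) :: a(:,:)
    real(dp), intent(in) :: x
    a = x
  end subroutine fill_section

  ! time nrep fills of the section passed in and print seconds per fill
  subroutine time_fill(a, nrep)
    real(dp), intent(inout) :: a(:,:)
    integer, intent(in) :: nrep
    integer(i8) :: c0, c1, rate
    integer :: k
    real(dp) :: checksum
    checksum = 0.0_dp
    call system_clock(c0, rate)
    do k = 1, nrep
      call fill_section(a, real(k, dp))
      ! read one element back so the stores stay observable
      checksum = checksum + a(1, size(a,2))
    end do
    call system_clock(c1)
    print *, size(a,1), size(a,2), &amp;
             real(c1 - c0, dp) / real(rate, dp) / real(nrep, dp), checksum
  end subroutine time_fill
end module fill_timing

program time_fills
  use fill_timing
  implicit none
  real(dp) :: dblResults(6,10000)
  integer :: intNumMem
  dblResults = 0.0_dp
  call time_fill(dblResults, 1000)   ! warm-up, throw away this time
  do intNumMem = 10000, 1, -1000
    call time_fill(dblResults(:, 1:intNumMem), 10000)
  end do
end program time_fills
&lt;/PRE&gt;
&lt;P&gt;The checksum is only there to discourage the optimizer from dropping the fills; it is not a guarantee, so it is still worth checking the generated code or a VTune profile before trusting the numbers.&lt;/P&gt;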
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2019 16:18:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142173#M137497</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-08-12T16:18:00Z</dc:date>
    </item>
    <item>
      <title>Note, compare the fill1 with</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142174#M137498</link>
      <description>&lt;P&gt;Note: compare fill1 with a(1:6,1:10000) against all the rest (to address the timing question as it relates to post #1).&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2019 16:20:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142174#M137498</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-08-12T16:20:59Z</dc:date>
    </item>
    <item>
      <title>Just my two cents.</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142175#M137499</link>
      <description>&lt;P&gt;Just my two cents.&lt;/P&gt;&lt;P&gt;The best performance will be achieved for the following code:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;real(8), allocatable :: dblResults(:,:)
integer :: intNumMem , j

allocate(dblResults(10000,6))

intNumMem  = 100

do j = 1, 6
    dblResults(1:intNumMem , j) = 0.d0
end do&lt;/PRE&gt;

&lt;P&gt;The key here is to have the largest dimension first, as this is how Fortran lays out the memory: column by column. You want to access your results in the same order as they are stored. If you do everything correctly, the Intel compiler will replace your loops with a library function.&lt;/P&gt;
&lt;P&gt;You possibly could further improve performance by vectorizing your code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Second, somebody had subroutine parameters as&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;real(8), intent(out) :: a(:,:)&lt;/PRE&gt;

&lt;P&gt;Really bad idea. For &lt;STRONG&gt;out &lt;/STRONG&gt;parameters Fortran is allowed to reallocate the memory which becomes a performance issue for large arrays.&lt;/P&gt;
&lt;P&gt;Use &lt;STRONG&gt;in&lt;/STRONG&gt; or &lt;STRONG&gt;inout &lt;/STRONG&gt;for arrays and it will never fail you.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2019 18:16:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142175#M137499</guid>
      <dc:creator>Andriy</dc:creator>
      <dc:date>2019-08-15T18:16:10Z</dc:date>
    </item>
    <item>
      <title>INTENT(OUT) only matters for</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142176#M137500</link>
      <description>&lt;P&gt;INTENT(OUT) only matters for allocatable array dummy arguments, where the array will be deallocated if it is allocated on entry. If your procedure just writes the array, and you don't want (re)allocation, don't use ALLOCATABLE when declaring the dummy argument.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2019 18:36:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142176#M137500</guid>
      <dc:creator>Steve_Lionel</dc:creator>
      <dc:date>2019-08-15T18:36:21Z</dc:date>
    </item>
    <item>
      <title>Andriy,</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142177#M137501</link>
      <description>&lt;P&gt;Andriy,&lt;/P&gt;&lt;P&gt;The first dimension has the closest proximity of cells; the last dimension has the furthest. In other words,&lt;/P&gt;&lt;P&gt;dblResults(I,J) is adjacent to dblResults(I+1,J)&lt;/P&gt;&lt;P&gt;whereas&lt;/P&gt;&lt;P&gt;dblResults(I,J) is separated from dblResults(I,J+1) by the number of elements of size(dblResults,1) * sizeof(dblResults(1,1))&lt;/P&gt;&lt;P&gt;Therefore, for a partially filled array, i.e. part of the 10000 dimension, when this dimension comes last, the used elements are contiguous.&lt;BR /&gt;Should the 10000 dimension come first, with say 100 of 10000 used, the memory usage would be:&lt;/P&gt;&lt;P&gt;100 cells used, 9900 cells skipped, 100 cells used, 9900 cells skipped, 100 cells&amp;nbsp;used, 9900 cells skipped, 100 cells&amp;nbsp;used, 9900 cells skipped, 100 cells used, 9900 cells skipped, 100 cells used&lt;/P&gt;&lt;P&gt;Processing the array in this manner would require:&lt;/P&gt;&lt;P&gt;6x more peel and remainder code segments (the inner and outer loops cannot be fused)&lt;BR /&gt;Additional memory pages accessed: from 2 required&amp;nbsp;to between 6 and 12 required (assuming a 4KB page size)&lt;/P&gt;&lt;P&gt;While the number of pages required might not incur significant overhead in a simple test program, it may be significant in the actual application. A TLB miss has significant overhead.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2019 20:24:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142177#M137501</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-08-15T20:24:51Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142178#M137502</link>
      <description>&lt;BLOCKQUOTE&gt;jimdempseyatthecove (Blackbelt) wrote:&lt;BR /&gt;&lt;P&gt;dblResults(I,J) is adjacent to dblResults(I+1,J)&lt;/P&gt;&lt;P&gt;whereas&lt;/P&gt;&lt;P&gt;dblResults(I,J) is separated from dblResults(I,J+1) by the number of elements of size(dblResults,1) * sizeof(dblResults(1,1))&lt;/P&gt;&lt;P&gt;Therefore, for a partially filled array, i.e. part of the 10000 dimension, when this dimension comes last, the used elements are contiguous.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Either my brain is atrophying, or this doesn't make sense.&amp;nbsp; Surely if dblResults(I,J) is adjacent to dblResults(I+1,J), then dblResults(1:100,J) refers to 100 contiguous locations.&amp;nbsp; Am I out to lunch?&lt;/P&gt;&lt;P&gt;Gib&lt;/P&gt;</description>
      <pubDate>Fri, 16 Aug 2019 05:44:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142178#M137502</guid>
      <dc:creator>gib</dc:creator>
      <dc:date>2019-08-16T05:44:40Z</dc:date>
    </item>
    <item>
      <title>You haven't discussed whether</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142179#M137503</link>
      <description>&lt;P&gt;You haven't discussed whether you want an answer independent of the choice of compiler or optimization flags. If you use ifort with default options, I would expect the better options to&amp;nbsp;invoke a library memset function call, on the assumption that a majority of the array will be reset. The features of this include automatic alignment adjustment and use of non-temporal SSE or AVX instructions. A possible scenario where this may not be optimal would be where you could keep those arrays in cache as you alternate between setting and reusing them. In that case, you might wish to prevent non-temporal stores by directive, although that may not work for the nested loops.&lt;/P&gt;&lt;P&gt;By the way, I have been blocked from reaching the login server from my laptop for weeks now. It's a pain to have access only on the phone. I don't know if that is a feature of the politics of our state (internet access is still political here).&lt;/P&gt;</description>
      <pubDate>Fri, 16 Aug 2019 12:06:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142179#M137503</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2019-08-16T12:06:16Z</dc:date>
    </item>
    <item>
      <title>While dblResults(1:100,J)</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142180#M137504</link>
      <description>&lt;P&gt;While dblResults(1:100,J) refers to 100 contiguous locations,&lt;BR /&gt;dblResults(1:100,J+1) starts 10000-100 cells past the end of dblResults(1:100,J), well into some other one- or two-page region of memory.&lt;/P&gt;&lt;P&gt;Thus, as you iterate over the used portion of the array, you have periodic leaps.&lt;/P&gt;&lt;P&gt;Indexing the other way around (and coding with the leftmost index as the innermost loop) permits any variable portion of the array to be used in a contiguous manner. This will reduce the number of TLB entries required to map the used portion of the array and, when accessing the used portion as a whole, will eliminate the inter-group loop peel and remainder processing. TLBs are Translation Look-aside Buffers. Each CPU design has a limited number of these.&lt;/P&gt;&lt;P&gt;&lt;A href="https://en.wikipedia.org/wiki/Thrashing_(computer_science)#TLB_thrashing"&gt;https://en.wikipedia.org/wiki/Thrashing_(computer_science)#TLB_thrashing&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;TLB thrashing&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Where the translation lookaside buffer (TLB) acting as a cache for the memory management unit (MMU) which translates virtual addresses to physical addresses is too small for the working set of pages. TLB thrashing can occur even if instruction cache or data cache thrashing are not occurring, because these are cached in different sizes. Instructions and data are cached in small blocks (cache lines), not entire pages, but address lookup is done at the page level. Thus even if the code and data working sets fit into cache, if the working sets are fragmented across many pages, the virtual address working set may not fit into TLB, causing TLB thrashing.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;And as an example, the Xeon Gold 6154 (from &lt;A href="http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%206154.html"&gt;http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%206154.html&lt;/A&gt;)&lt;/P&gt;&lt;P&gt;&lt;EM&gt;64-byte Prefetching&lt;BR /&gt;Data TLB: 1-GB pages, 4-way set associative, 4 entries&lt;BR /&gt;Data TLB: 4-KB Pages, 4-way set associative, 64 entries&lt;BR /&gt;Instruction TLB: 4-KByte pages, 8-way set associative, 64 entries&lt;BR /&gt;L2 TLB: 1-MB, 4-way set associative, 64-byte line size&lt;BR /&gt;&lt;STRONG&gt;Shared &lt;/STRONG&gt;2nd-Level TLB: 4-KB / 2-MB pages, 6-way associative, &lt;STRONG&gt;1536 entries&lt;/STRONG&gt;. Plus, 1-GB pages, 4-way, 16 entries&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Shared amongst 18 cores (~85 entries per core)&lt;/P&gt;&lt;P&gt;Older server CPUs had fewer (~64) and desktops even fewer.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Sat, 17 Aug 2019 12:47:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Fastest-Way-to-Initialize-Variables/m-p/1142180#M137504</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-08-17T12:47:50Z</dc:date>
    </item>
  </channel>
</rss>

