<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Array index order in loops not behaving as expected slow/fast. in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937512#M89190</link>
    <description>&lt;P&gt;I realize that F90 gives us some array operations, but I'm just trying to figure this out. Old-school thinking has us loop over the last array index in the outermost loop so that memory is addressed consecutively.&lt;BR /&gt;The results I'm getting are not what I expect. With default optimization I used -opt-report, and for the "slow" code the compiler optimizes and switches the order of the loops. For the "fast" code (where the last index is in the outer loop) it does not, and that version runs *slower*. What is going on? If I set -O0 then I get the expected result: the code below runs faster with j in the outer loop.&lt;/P&gt;
&lt;P&gt;Source code attached.&lt;BR /&gt;&lt;BR /&gt;What do I take away from this? Should we not try to be smart about the index order in loops? Thanks for any insight.&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer ndimi,ndimj,ntimes&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; parameter (ndimi=2000, ndimj=3000, ntimes=1000)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer x(ndimi,ndimj),y(ndimi,ndimj), i,j,k&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer timesec1, timesec2&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call system_clock(timesec1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'time: ', timesec1&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do k = 1,ntimes&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do j=1,ndimj&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do i=1,ndimi&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x(i,j) = 5&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; y(i,j) = 6&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x(i,j) = x(i,j) * y(i,j)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call system_clock(timesec2)&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'time: ', timesec2&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'diff: ', timesec2 - timesec1&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end program&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;ifort (IFORT) 12.1.6 20130222&lt;BR /&gt;ifort -mcmodel=medium -shared-intel -opt-report loopindex_slow.f &amp;gt;&amp;amp; report_slow.txt&lt;BR /&gt;./a.out&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033097649&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033115630&lt;BR /&gt;&amp;nbsp;diff:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 17981&lt;BR /&gt;ifort -mcmodel=medium -shared-intel -opt-report loopindex.f &amp;gt;&amp;amp; report.txt&lt;BR /&gt;./a.out&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033245879&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033338024&lt;BR /&gt;&amp;nbsp;diff:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 92145&lt;BR /&gt;&lt;BR /&gt;report_slow.txt has:&lt;BR /&gt;&amp;lt;loopindex_slow.f;10:10;hlo_linear_trans;MAIN__;0&amp;gt;&lt;BR /&gt;LOOP INTERCHANGE in loops at line: 10 12 13&lt;BR /&gt;Loopnest permutation ( 1 2 3 ) --&amp;gt; ( 3 1 2 )&lt;/P&gt;</description>
    <pubDate>Fri, 28 Jun 2013 18:23:11 GMT</pubDate>
    <dc:creator>jkwi</dc:creator>
    <dc:date>2013-06-28T18:23:11Z</dc:date>
    <item>
      <title>Array index order in loops not behaving as expected slow/fast.</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937512#M89190</link>
      <description>&lt;P&gt;I realize that F90 gives us some array operations, but I'm just trying to figure this out. Old-school thinking has us loop over the last array index in the outermost loop so that memory is addressed consecutively.&lt;BR /&gt;The results I'm getting are not what I expect. With default optimization I used -opt-report, and for the "slow" code the compiler optimizes and switches the order of the loops. For the "fast" code (where the last index is in the outer loop) it does not, and that version runs *slower*. What is going on? If I set -O0 then I get the expected result: the code below runs faster with j in the outer loop.&lt;/P&gt;
&lt;P&gt;Source code attached.&lt;BR /&gt;&lt;BR /&gt;What do I take away from this? Should we not try to be smart about the index order in loops? Thanks for any insight.&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer ndimi,ndimj,ntimes&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; parameter (ndimi=2000, ndimj=3000, ntimes=1000)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer x(ndimi,ndimj),y(ndimi,ndimj), i,j,k&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; integer timesec1, timesec2&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call system_clock(timesec1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'time: ', timesec1&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do k = 1,ntimes&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do j=1,ndimj&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do i=1,ndimi&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x(i,j) = 5&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; y(i,j) = 6&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x(i,j) = x(i,j) * y(i,j)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end do&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call system_clock(timesec2)&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'time: ', timesec2&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; print *, 'diff: ', timesec2 - timesec1&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end program&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;ifort (IFORT) 12.1.6 20130222&lt;BR /&gt;ifort -mcmodel=medium -shared-intel -opt-report loopindex_slow.f &amp;gt;&amp;amp; report_slow.txt&lt;BR /&gt;./a.out&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033097649&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033115630&lt;BR /&gt;&amp;nbsp;diff:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 17981&lt;BR /&gt;ifort -mcmodel=medium -shared-intel -opt-report loopindex.f &amp;gt;&amp;amp; report.txt&lt;BR /&gt;./a.out&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033245879&lt;BR /&gt;&amp;nbsp;time:&amp;nbsp;&amp;nbsp; 2033338024&lt;BR /&gt;&amp;nbsp;diff:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 92145&lt;BR /&gt;&lt;BR /&gt;report_slow.txt has:&lt;BR /&gt;&amp;lt;loopindex_slow.f;10:10;hlo_linear_trans;MAIN__;0&amp;gt;&lt;BR /&gt;LOOP INTERCHANGE in loops at line: 10 12 13&lt;BR /&gt;Loopnest permutation ( 1 2 3 ) --&amp;gt; ( 3 1 2 )&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2013 18:23:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937512#M89190</guid>
      <dc:creator>jkwi</dc:creator>
      <dc:date>2013-06-28T18:23:11Z</dc:date>
    </item>
    <item>
      <title>My first thought is that your</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937513#M89191</link>
      <description>&lt;P&gt;My first thought is that your observed behavior with -O2 or greater has less to do with your loop iteration sequence and more to do with the operations in the loop. Your statements do not depend on k, and although you loop over i and j, the values assigned do not depend on i or j. A better test of the effects you are trying to explore would be a statement that depends on i, j, and k and incorporates references to things like x(i+1,j-1), so that the compiler cannot as easily optimize away your entire loop.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jun 2013 03:39:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937513#M89191</guid>
      <dc:creator>Casey</dc:creator>
      <dc:date>2013-06-29T03:39:00Z</dc:date>
    </item>
    <item>
      <title>Apparently, the outer loop is</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937514#M89192</link>
      <description>&lt;P&gt;Apparently, in the case you intended to be slow, the outer loop is short-circuited and the inner loops are interchanged. As Casey hinted, you should construct a benchmark which focuses on the point you are trying to make.&lt;/P&gt;
&lt;P&gt;You wouldn't need to repeat your benchmark so many times if you declared the system_clock arguments as integer(8). All currently maintained compilers support this much of Fortran 2003 (although it doesn't help on ifort Windows).&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jun 2013 13:03:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Array-index-order-in-loops-not-behaving-as-expected-slow-fast/m-p/937514#M89192</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2013-06-29T13:03:15Z</dc:date>
    </item>
  </channel>
</rss>

