I realize that F90 gives us array operations, but I'm just trying to figure this out. Old-school thinking has us looping over the last array index in the outermost loop so that memory is addressed consecutively.
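Fortran stores arrays in column-major order, so x(i,j) and x(i+1,j) are neighbours in memory and the first index should vary fastest. A minimal sketch that makes this visible (the program name and the 3x2 array are illustrative only): each element is tagged with its indices, and an array constructor `[a]` lists the elements in array element order, which for a plain local array is the order they sit in memory.

```fortran
program storage_order
  implicit none
  integer :: a(3,2), i, j

  ! Tag each element with its indices: a(i,j) = 10*i + j
  do j = 1, 2          ! last index in the outer loop
     do i = 1, 3       ! first index in the inner loop (consecutive in memory)
        a(i,j) = 10*i + j
     end do
  end do

  ! The array constructor lists elements in array element order,
  ! i.e. the first index varies fastest.
  print '(6i4)', [a]
end program storage_order
```

This should print `11  21  31  12  22  32`: the whole first column comes before the second, which is why the question's code puts j in the outer loop and i in the inner loop.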
The results I'm getting are not what I expect. At the default optimization level (-O2) I used -opt-report, and for the "slow" code the compiler optimizes by switching the order of the loops. For the "fast" code (where I loop over the last index first) it does not, and that version runs *slower*. What is going on? If I compile with -O0, I get the expected result: the code below, with j in the outer loop, runs faster.
Source codes attached.
What do I take away from this? Should we not try and be smart about the index order in loops? Thanks for any insight.
```fortran
      integer ndimi,ndimj,ntimes
      parameter (ndimi=2000, ndimj=3000, ntimes=1000)
      integer x(ndimi,ndimj),y(ndimi,ndimj), i,j,k
      integer timesec1, timesec2
      call system_clock(timesec1)
      print *, 'time: ', timesec1
      do k = 1,ntimes
!        last index (j) in the outer loop, first index (i) innermost
         do j=1,ndimj
            do i=1,ndimi
               x(i,j) = 5
               y(i,j) = 6
               x(i,j) = x(i,j) * y(i,j)
            end do
         end do
      end do
      call system_clock(timesec2)
      print *, 'time: ',timesec2
      print *, 'diff: ' ,timesec2 - timesec1
      end program
```
```
ifort (IFORT) 12.1.6 20130222

ifort -mcmodel=medium -shared-intel -opt-report loopindex_slow.f >& report_slow.txt
./a.out
time: 2033097649
time: 2033115630
diff: 17981

ifort -mcmodel=medium -shared-intel -opt-report loopindex.f >& report.txt
./a.out
time: 2033245879
time: 2033338024
diff: 92145
```
report_slow.txt has:
```
<loopindex_slow.f;10:10;hlo_linear_trans;MAIN__;0>
LOOP INTERCHANGE in loops at line: 10 12 13
Loopnest permutation ( 1 2 3 ) --> ( 3 1 2 )
```
My first thought is that the behavior you observe with -O2 or higher has less to do with your loop iteration order and more to do with the operations inside the loop. Your statements do not depend on k, and although you loop over i and j, the values you assign do not depend on i or j either. A better test of the effect you are exploring would use a statement that depends on i, j, and k and incorporates references to neighboring elements such as x(i+1,j-1), so that the compiler cannot as easily optimize away your entire loop nest.
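A sketch along those lines, assuming a free-form source file and a compiler that accepts integer(8) (the program name, array sizes, repeat count, and update formula are illustrative choices, not the original poster's code). It times the same update in both loop orders and prints the results so the work cannot be discarded. The neighbouring element is read from a separate array y rather than from x itself: a self-reference such as x(i+1,j-1) would create a loop-carried dependence, so swapping the loops would change the answer rather than just the speed.

```fortran
program loop_order_bench
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer, parameter :: ni = 2000, nj = 3000, ntimes = 10
  real(dp), allocatable :: x(:,:), y(:,:)
  integer(8) :: t0, t1, rate
  integer :: i, j, k
  real(dp) :: sum_ji, sum_ij

  allocate(x(ni,nj), y(ni,nj))
  call random_number(y)

  ! Order 1: j outer, i inner; the inner loop walks down one column,
  ! touching consecutive memory locations.
  x = 0
  call system_clock(t0, rate)
  do k = 1, ntimes
     do j = 2, nj
        do i = 1, ni - 1
           ! depends on i, j and k, and reads a neighbouring element of y
           x(i,j) = x(i,j) + y(i+1,j-1) * k
        end do
     end do
  end do
  call system_clock(t1)
  sum_ji = sum(x)
  print '(a,f8.3,a)', 'j outer: ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! Order 2: i outer, j inner; consecutive inner iterations are
  ! ni elements apart in memory.
  x = 0
  call system_clock(t0)
  do k = 1, ntimes
     do i = 1, ni - 1
        do j = 2, nj
           x(i,j) = x(i,j) + y(i+1,j-1) * k
        end do
     end do
  end do
  call system_clock(t1)
  sum_ij = sum(x)
  print '(a,f8.3,a)', 'i outer: ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! Using the results keeps the loops alive; both orders perform the same
  ! per-element arithmetic, so the checksums should agree closely.
  print *, 'checksums: ', sum_ji, sum_ij
end program loop_order_bench
```

Because this update has no loop-carried dependence, the compiler is still allowed to interchange the i-outer version, so check -opt-report (or compare against -O0, as in the question) before reading anything into the timings.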
Apparently, in the case you intended to be slow, the outer (k) loop is short-circuited as well as the inner loops being interchanged. As Casey hinted, you should construct a benchmark that focuses on the point you are trying to make.
You wouldn't need to repeat your benchmark so many times if you declared the system_clock arguments as integer(8), which typically gives a finer clock resolution. All currently maintained compilers support this much of Fortran 2003 (although it doesn't help on ifort Windows).
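A minimal sketch of that suggestion (the program name and the harmonic-sum loop are placeholders for whatever is being timed): both the count and count_rate arguments are integer(8), and the tick difference is converted to seconds with count_rate instead of being printed as raw ticks.

```fortran
program clock_demo
  implicit none
  integer(8) :: count0, count1, rate
  integer :: i
  real(kind(1d0)) :: s

  ! integer(8) arguments: on most compilers this selects a finer
  ! count_rate and a much larger count_max than default integers.
  call system_clock(count0, rate)
  s = 0
  do i = 1, 10000000
     s = s + 1d0 / i
  end do
  call system_clock(count1)

  print *, 'count_rate: ', rate, ' ticks per second'
  print *, 'elapsed:    ', real(count1 - count0, kind(1d0)) / real(rate, kind(1d0)), ' s'
  print *, 'result:     ', s   ! printing s keeps the loop from being optimized away
end program clock_demo
```

The reported count_rate differs between compilers and platforms, so dividing by it, rather than comparing raw tick counts, keeps results comparable.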