Automatic Loop Permutation

matiasr83 · ‎02-15-2010

Hi all

I'm compiling and running a simulation code in Fortran 90 using ifort v11.0 (20081108 and 20090318) and ifort v11.1 (20091130). All are non-commercial versions for Linux.

This is how I compile and link:

ifort -c source1.f source2.f

ifort -o executable source1.o source2.o

I've been measuring the simulation wall-clock time using the time command. For the same simulation case, the results were (approximately):

64 minutes, using ifort v11.1 (20091130)

18 minutes, using ifort v11.0 (20081108)

65 minutes, using ifort v11.0 (20090318)

I suppose this behavior is due to automatic loop permutation, but I'm not sure.

Is there any reason to suppose that ifort v11.0 (20081108) uses different heuristics to select and permute the loops?

Thanks for your help,

Matías.

jimdempseyatthecove · ‎02-15-2010

Assuming your input data (simulation parameters) are the same, add the option switches that produce a verbose report to each compilation. The different revisions may have different default options.

A 3x difference in run time could be due to:

a) one compilation manages to keep (most) everything in cache and the other does not;

b) one compilation results in virtual memory paging and the other does not.

The compilation report may point to the root of the difference.

Jim Dempsey

TimP · ‎02-16-2010

A change in loop optimizations would not be surprising over a year of compiler development, particularly if you set -O3 (specifically requesting such optimizations).

However, as Jim suggested, you should at least look at the opt-report output to see whether the newer compilers miss an optimization that the older one reports.

Martyn_C_Intel · ‎02-24-2010

To be more specific:

Add -opt-report-phase hlo to your command line. Then look in the report for messages saying that loops were interchanged (and for any other differences between the compiler versions). If you also add -vec-report2 (or -opt-report-phase hpo), you can see whether some loops were vectorized by the older compiler but are not vectorized by the more recent ones.

If you are able to attach a not-too-large test case that reproduces the behavior, we can take a look at that.