Intel® Fortran Compiler

Optimization

billaustralia
Beginner
I have over 100 source files, so for my release build I want the interprocedural optimizations. The manual says to use /object:filename. When I put this in the compile options box in Visual Fortran, I get a pop-up error box with the message:
'The source files a.for and b.for are both configured to produce the output file w.obj. The project cannot be built.'

What am I doing wrong, or can this only be done from the command line?
3 Replies
Steven_L_Intel1
Employee
The method the manual describes is for the command line only. Here's how to do it in the IDE.

Create a new executable project in the same workspace. In that project, create a new file in the same folder as your other source files and call it all.f90 (or all.f if you use fixed-form source); you can change the name if you want. Make all.f90 a series of INCLUDE lines, one for each of your source files, for example:

include 'firstfile.f90'
include 'secondfile.f90'
...

Note - JUST the INCLUDE lines - no END or anything else. Build this project - the compiler will put everything together.
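With 100+ source files, writing the INCLUDE lines by hand is tedious. A minimal sketch that generates all.f90 automatically, assuming the sources all sit in one directory and use the .f90 extension (adjust the glob for .for or .f files):

```shell
# Sketch: generate all.f90 as one INCLUDE line per source file.
# Assumes free-form .f90 sources in the current directory.
for f in *.f90; do
  [ "$f" = "all.f90" ] && continue   # don't include the aggregate file itself
  printf "include '%s'\n" "$f"
done > all.f90
```

Re-run it whenever files are added or removed so the aggregate stays in sync with the project.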

Two caveats: 1) this will make compiles take a lot longer and require a LOT more virtual memory (you may want to experiment with combining smaller groups of files if this is a problem), and 2) you may not see much benefit. But try it and check the performance both ways.
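The suggestion of combining smaller groups of files can be sketched the same way. Here, hypothetically, batches of 20 source files each become their own aggregate file (group1.f90, group2.f90, ...); the names and batch size are illustrative only:

```shell
# Sketch: split the source list into batches of 20 and write one
# aggregate include file per batch.  Adjust the glob and batch
# size to suit your project.
ls *.f90 | grep -v '^group' | split -l 20 - batch_
n=1
for chunk in batch_*; do
  sed "s|.*|include '&'|" "$chunk" > "group$n.f90"
  rm "$chunk"
  n=$((n+1))
done
```

Each groupN.f90 then becomes one compilation unit, trading some cross-group optimization for shorter compiles and less memory use.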

Steve
billaustralia
Beginner
Thanks Steve, but I hope you are wrong on both counts. A slightly earlier version of the code, compiled with Lahey, takes only 3 minutes to compile all in one lump (50,000 lines or so).
kdkeefer
Beginner
Bill,
If your program is mostly pre-F90 style code, I doubt you'd benefit much from the interprocedural optimizations, especially given the effort. Classic Fortran compiles into lumps of binary, each identified by a single global symbol for a relocating linker; there is very little to optimize in going from lump to lump.
I've had much better luck with some of the other CVF optimization options, particularly inline. Alignment of common is important, too.
The P4 especially is *very* different internally from the machines of even ten years ago. For example, its 20-stage execution pipeline means that a branch misprediction, resulting in a pipeline flush, can cost as much time as a floating-point multiply. The savings from not loading a register with a value it already contains are trivial by comparison.
Regards,
Keith