Intel® Fortran Compiler

ifort takes a very long time (2 days) and then crashes on simple but long source code

gurnemanz
Beginner
2,354 Views
Hi,
I have been coding a simple ODE Runge-Kutta solver. The central routine is the computation of the derivative vector v1 = derivative(v0). Using Wolfram Mathematica, I have generated the source code in the form of a long list of assignments:
v1(1) = a11 * v0(1) + a12 * v0(2) + ...
v1(2) = a21 * v0(1) + a22 * v0(2) + ...
...
Some of the coefficients aij are constants known at compile time, others are parameters, and the very large majority are exactly zero, so they are omitted from the source code.
The dimension of the vectors v1 and v0 is on the order of 10,000.
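For concreteness, the shape of the generated routine is roughly as sketched below (the routine name, kinds, parameter array, and coefficient values here are made up for illustration, not my actual code):

subroutine derivative(v0, v1, p)
  implicit none
  complex(8), intent(in)  :: v0(10000)   ! current state vector
  complex(8), intent(out) :: v1(10000)   ! computed derivative
  complex(8), intent(in)  :: p(100)      ! run-time parameters among the aij (hypothetical)
  ! Straight-line code: one assignment per component; only nonzero terms are emitted.
  v1(1) = (0.5d0, 0.0d0)*v0(1) + p(3)*v0(17)
  v1(2) = p(1)*v0(2) + (2.0d0, -1.0d0)*v0(9)
  ! ... roughly 10000 such assignments ...
end subroutine derivative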
I tried to compile the code using only the option -O3. The result was that compilation took more than two days on a powerful Intel server, using several GB of RAM. Eventually, the compilation failed and the compiler issued a "catastrophic error".
Compilation of the same code but with a smaller vector (dimension 4000) took around two hours and eventually succeeded, so I assume this is not an issue with my code.
I would like to know: why should the compiler choke on such straightforward (albeit long) code? It is just a long sequence of assignments to vector components. Is there some flag or option that should be used for this kind of source code?
Any suggestion will be appreciated!
Regards.
0 Kudos
1 Solution (accepted reply from jimdempseyatthecove, below)
13 Replies
jimdempseyatthecove
Honored Contributor III
2,354 Views
Try adding the option "-Qip-"; this will disable interprocedural optimization.

Jim Dempsey
0 Kudos
dajum
Novice
2,354 Views

If you are using the -list option, remove it. That is a compiler problem we have hit with Composer. Optimization can be a problem too; we have had programs take hours to compile with Composer that take only a couple of seconds with 11.1. It would be nice to see the samples I sent in fixed for all compiler options.

0 Kudos
TimP
Honored Contributor III
2,354 Views

If the compiler attempts to block vectorize this sort of source code, it may thrash memory (particularly for the 32-bit compiler). In such a case, if -O1 is sufficient, it may complete more expeditiously. If you wish to compile at -O3, perhaps you could find specific optimization limits which are set too high.

If the vector length influences compilation time, I would ask if you have static data initialization.

0 Kudos
gurnemanz
Beginner
2,354 Views
Thank you for your suggestion! I have launched the compiler without the option you mention. It has been running for almost a day so far, so I am not seeing a meaningful speed increase, but maybe it will at least complete the task. I will update the post when I know the outcome.
0 Kudos
gurnemanz
Beginner
2,354 Views
Thanks for helping! I probably should have mentioned in the first post that I am using ifort 11.0 on GNU/Linux. I am not explicitly using the -list option, and I was not able to find any reference to it in the compiler's reference manual!?
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,355 Views
If I were to guess, your Fortran source code is written by another program. If so, I suggest you write a little program that breaks the initialization into multiple subroutines (preferably as multiple files). This should be relatively easy to do, since the initialization has no loops or IF statements. Run an experiment by hand-partitioning the problematic program.
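For instance (purely schematic; the file names, routine names, kinds, and coefficients are made up), the generated assignments could be sliced like this, with each slice in its own file and a thin wrapper calling the slices in order:

! derivative_part001.f90 -- holds the generated assignments for components 1..100
subroutine derivative_part001(v0, v1)
  implicit none
  complex(8), intent(in)    :: v0(10000)
  complex(8), intent(inout) :: v1(10000)
  v1(1) = (0.5d0, 0.0d0)*v0(1) + (1.25d0, 0.0d0)*v0(7)   ! illustrative assignment
  ! ... the rest of this slice's assignments ...
end subroutine derivative_part001

! derivative.f90 -- thin wrapper called by the RK solver
subroutine derivative(v0, v1)
  implicit none
  complex(8), intent(in)    :: v0(10000)
  complex(8), intent(inout) :: v1(10000)
  call derivative_part001(v0, v1)
  ! call derivative_part002(v0, v1)
  ! ... and so on through the remaining slices ...
end subroutine derivative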

Jim Dempsey
0 Kudos
gurnemanz
Beginner
2,354 Views
Indeed, I was initializing a large vector at compile time, and I have now changed the code so that the vector is initialized at runtime. Moreover, I have changed the optimization option from -O3 to -O1. Unfortunately, the compiler still ran for several hours (at which point I killed it), so I deduce that your suggestions do not solve the problem I have. Thank you anyway for helping!
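Schematically, the change was of this kind (the array name, size, and values here are made up for illustration, not my actual code):

module coef_mod
  implicit none
  ! before: a huge compile-time initializer, e.g.
  !   real(8), parameter :: coef(10000) = [ 1.5d0, 0.0d0, ... ]   ! ~10000 literals
  real(8) :: coef(10000)            ! after: no compile-time initializer
contains
  subroutine init_coef()
    coef = 0.0d0                    ! fill the array at run time instead
    coef(1) = 1.5d0                 ! illustrative nonzero entries
    coef(7) = -0.25d0
  end subroutine init_coef
end module coef_mod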
0 Kudos
gurnemanz
Beginner
2,354 Views
Thank you very much for your suggestion! Indeed, splitting the long subroutine into smaller subroutines (from one routine with 10,000 assignments to 100 routines with 100 assignments each) and calling them in sequence from the main file makes the compilation complete correctly in a reasonably short time (on the order of minutes). I compile each subroutine separately and then link everything together, with the option -O3, roughly as sketched below. Would you recommend any other options?
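Schematically, the build now looks like this (the file names are just placeholders):

ifort -O3 -c derivative_part001.f90
ifort -O3 -c derivative_part002.f90
# ... one object file per slice ...
ifort -O3 main.f90 derivative_part*.o -o rk_solver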
As happy as I am that I can at least obtain an executable, I still cannot understand how a professional tool like ifort can choke on a piece of code, and then work on exactly the same code split over several source files.
Regards.
0 Kudos
mecej4
Honored Contributor III
2,354 Views
I do not know the internals of the Intel compiler, but we can take a stab at the question in your second paragraph. Let us say that to perform the optimization of a piece of code with N lines the compiler takes time F(N). If we chop the code up into n pieces, the time is now n*F(N/n). Try a couple of plausible functions F:

    Monolithic        Segmented
    k*N               k*N
    k*N*lg(N)         k*N*lg(N/n)
    k*N^2             k*N^2 / n
    k*N^3             k*N^3 / n^2
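For instance, taking the cubic row with N = 10000 and n = 100 (k arbitrary):

    monolithic:  k*N^3           = k * 1e12
    segmented:   n * k*(N/n)^3   = 100 * k * 100^3 = k * 1e8

i.e., segmenting saves a factor of n^2 = 10^4 in this model.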

Thus, breaking up the code decreases the scope for optimization and prevents the compiler from breaking its heart trying to optimize a big chunk of code.

As to what a "professional tool" is supposed to do: it could be assumed that when a professional specifies -O3, "optimize the heck out of the code, hang the expense and take no prisoners" was intended. A professional user would probably not specify -O3 (and other high-expense optimization options) for unusually long routines.
0 Kudos
TimP
Honored Contributor III
2,354 Views
As was suggested earlier (except that the Windows option was quoted; for Linux, use -fno-inline-functions), disabling inlining should give the same effect as breaking the source code into individual files. There are individual interprocedural limits you could play with if you don't want to disable it entirely.
The idea of processing a large source file through an fsplit utility before compiling, then re-combining the .o files via ld -r (the whole thing implemented as a perl script), came to us from the HP-UX compiler, which did not even have a satisfactory option to cut back on overly aggressive inlining.
0 Kudos
gurnemanz
Beginner
2,354 Views
I do agree that the time required for optimization can scale wildly with the length of the code. I would also like to clarify that my observation on the quality of the ifort output is more an expression of surprise than a criticism. ifort is certainly a professional tool, and what's more, one that is graciously offered to non-professionals like me. I am sure that I am doing something improperly, because professionals would not start chopping their code into chunks, or wait two days only to get a "catastrophic error" out of the compiler, would they?
0 Kudos
gurnemanz
Beginner
2,354 Views
Thank you for answering again. Actually, I had found in the manual that the Linux equivalent of /Qip- is -no-ip. I had tried that, but with no luck. Now I have also tried the options -O3 -fno-inline-functions (on a smaller code, whose compilation takes two hours with -O3 alone), but I do not see any dramatic speed-up.
Let me also point out that the very long routine in my code is called only a few times in the main program, so I would guess that inlining should be a minor overhead relative to the routine itself (?). Moreover, the long routine does not call any other routine: it is just a long list of assignments involving +, *, and complex numbers and variables.
0 Kudos
mecej4
Honored Contributor III
2,354 Views
I found myself in a similar situation a few years ago. The code in question was about 1 MB in size (over 10,000 lines), and the compilation time increased from about 1 second with no optimization to about half an hour with full optimization.

In the Usenet group comp.lang.fortran (see http://coding.derkeiler.com/Archive/Fortran/comp.lang.fortran/2008-05/msg00683.html ), I posted my expectation that

"no compiler ought to try so hard to give an additional five percent execution speed increase at the cost of a seven-hundred-fold increase in compilation time."

and the responses left me much chastened.

A more recent example in this forum, by Eli Osherovich, involved a much shorter C source (about 40 lines) but with deeply nested loops; see the thread "Very slow compilation".

The compilation times ranged from 0.04 s to over 60 s, depending on the optimization level used. The issue was reported to the compiler development team, but I do not see any update to that thread yet.

Since then I have become used to aborting slow compiles and retrying with lower levels of optimization when faced with such problematic sources.

0 Kudos