
Ferdinand_T_ · 03-03-2016

Dear Intel community,

below is an example program that takes an unusually long time to compile with "-O3" optimization (-O2 and below are fine), tested with ifort versions 14.0.3 and 15.0.3. This might be a compiler issue.
(But I don't know what is going on. Maybe the compiler really is doing something very sophisticated and the long compile time is justified.)

[fortran]
program p

  ! Compound type with allocatable components
  type :: a_type
  end type

  type :: b_type
    ! The more fields, the longer the -O3 compile takes.
    integer, allocatable :: &
      i00,i01,i02,i03,i04,i05,i06,i07,i08,i09, &
      i10,i11,i12,i13,i14,i15,i16,i17,i18,i19, &
      i20,i21,i22,i23,i24,i25,i26,i27,i28,i29, &
      i30,i31,i32,i33,i34,i35,i36,i37,i38,i39
    class(a_type), allocatable :: &
      a00,a01,a02,a03,a04,a05,a06,a07,a08,a09, &
      a10,a11,a12,a13,a14,a15,a16,a17,a18,a19, &
      a20,a21,a22,a23,a24,a25,a26,a27,a28,a29, &
      a30,a31,a32,a33,a34,a35,a36,a37,a38,a39
  end type

  type(b_type), allocatable :: b

  ! Switch, undetermined at compile time
  logical :: switch

  ! Allocate anything (doesn't have to be b, but can be)
  integer, allocatable :: i

  allocate(i)

  ! Deallocate compound type
  deallocate(b)

  ! Specific repeated conditional loop structure
  read (*,*) switch

  if (switch) then
    do ! any do-loop
    end do
  end if

  if (switch) then
    do ! any do-loop
    end do
  end if

end program
[/fortran]

Usually, changing the order of statements does not change the compile-time delay. However, changing the code in any other way makes the delay disappear (e.g. removing one loop, removing the switch, removing the allocation, removing the allocatable attribute, changing the class keyword to type, or removing one of the fields in b_type entirely).

I post this issue only out of curiosity and for your information. For me, it is sufficient to use -O2 optimization on the affected code.

With regards
Ferdinand

jimdempseyatthecove · 03-03-2016

I suspect the compiler is caught in a quandary over whether it is safe to remove the dead code referring to b. b is declared but not allocated, and the only reference to it is deallocate(b).

Did you see this behavior with real code?

Jim Dempsey

Steven_L_Intel1 · 03-03-2016

I can reproduce an extended compile time with 16.0.2, and will look into this some more.

Steven_L_Intel1 · 03-03-2016

Escalated as issue DPD200407723. Looking at some output from a debug compiler build, it doesn't look to me as if the delay is in an actual optimization step but rather some setup for it. I will let you know what the developers tell me.

Ferdinand_T_ · 03-03-2016

@Steve Lionel
Thank you. I am always interested in what is going on behind the scenes!

@Jim Dempsey

jimdempseyatthecove wrote:

Did you see this behavior with real code?

Yes. The compilation time jumped from several minutes to hours and beyond (I terminated it, and at first thought the compiler had crashed entirely) when I added a command-line argument (the undetermined "switch") to optionally skip certain computations (the do-loops). The allocations were hidden in unrelated modules. It took me a while to boil the code down to the example given, bringing all these specific ingredients together in just the right combination. That is also why your question made me wonder how else, if not by pure accident in actual code, one could come up with such an unusual program triggering such unexpected behaviour.

Ferdinand

Ferdinand_T_ · 03-03-2016

jimdempseyatthecove wrote:

I suspect the compiler is caught in a quandary over whether it is safe to remove the dead code referring to b. b is declared but not allocated, and the only reference to it is deallocate(b).

I might add that the delay also occurs with a single allocation from a structure constructor (no deallocations at all):

[fortran]
! ...

! Allocate anything (doesn't have to be b, but can be)
allocate(b, source=b_type())

! Specific repeated conditional loop structure
! ...
[/fortran]

Jack_S_ · 04-09-2016

Hi all,

Is there any update on the unusually long compilation?

I compile my Fortran application with Version 14.0.230.144 and Version 16.0.0.109; with the latter I aborted the compilation after 30 minutes (I suspected it could go on for hours...).

Thank you,

Jack.

Steven_L_Intel1 · 04-11-2016

I have asked the developers for an update; I hadn't seen anything yet.

Jack_S_ · 04-12-2016

Hi Steve,

Thanks for the note.

BTW, I can confirm that the long compilation I experience (with real code) also occurs with Version 16.0.2.181 (Update 2): with the following set of flags, compilation time goes from ~1 minute to unbounded (I stopped the compilation after 30 minutes; it was still in the linking process).

Using Version 16.0.2.181 (update 2) the code compiles successfully only using -O2.

The compilation flags :

ifort -g -O3 -openmp -xHost -ipo -fpconstant -fp-model precise -fpe0 -traceback -ftrapuv -gen-interfaces -warn interfaces

Thanks in advance for any update on this issue,

Jack.

jimdempseyatthecove · 04-12-2016

>>it was still in the linking process

As a workaround, disable multi-file interprocedural optimization, -ipo- (for the problem file).
-O3 (I think) implies -ipo.

Jim Dempsey

Jack_S_ · 04-12-2016

Hi Jim,

Your suggested workaround works: the compilation time is comparable to previous versions.

The question, then, is how important -ipo is relative to -O3, since according to the User and Reference Guide the -fast flag implies both -O3 and -ipo (separately).

Thank you,

Jack.

jimdempseyatthecove · 04-12-2016

You should read the documentation on Inter-procedural Optimizations. It is really not as comprehensive as it should be.

IPO is (can be) performed in two passes:

Compilation with -O3 (or -ipo) generates a special form of intermediate object file (one per source when compiling multiple sources). When these intermediate files are passed to the linker, the linker consults them together and, when necessary, calls back the compiler to perform inter-file optimizations and produce yet another intermediate object file. It is not clear (documented) whether this process is recursive, but I suspect it is, and in your case it seems to be caught in an infinite recursion loop (the loop could also be iterative, to avoid stack overflow).

The compiler has an -ipo-c option which you may or may not wish to investigate using.

This is not documented, but I assume that -ipo and -ipo- affect the files following the option, so that you can do

ifort -g -O3 ... a.f90 b.f90 c.f90 -ipo- d.f90

where a, b and c are IPO'd but d is not.

(you'd probably incorporate the options into your make file, one group as objs: and the other as objs_no_ipo:)

Jim Dempsey

Lorri_M_Intel · 04-12-2016

Jim is mostly correct.

Using -O3 or -ipo does provoke the compiler to generate intermediate language in the .o files rather than (COFF) object code.

At link time, you use either "ifort" or "xild" to pull all these .o files into an executable. The list of .o files is passed, once, to the compiler again, where interprocedural optimizations happen and a real object file is created. Then control returns to "ifort" or "xild", and the linker is called to create the executable.

There are no callbacks from "ld" and no iteration, other than inside the compiler itself (which is where the infinite loop is happening).

Finally, no, the "-ipo" switch is not positional. That is, if "ifort" detects -ipo anywhere on the line, it applies it to all compilations. To compile some with -ipo and some without, you'd need separate compilation lines.

--Lorri
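Since -ipo is not positional, the split build Jim describes (his a, b, c and d files) needs separate compile lines. Below is a minimal sketch, assuming a POSIX shell, that d.f90 is the problem file, that `prog` is a hypothetical output name, and that the IPO link step accepts a mix of intermediate and ordinary objects. `build_split_ipo` is a hypothetical helper, not an Intel tool.

```shell
#!/bin/sh
# Hypothetical split IPO build. File names follow Jim's example;
# d.f90 is assumed to be the file that triggers the long compile.
# FC defaults to ifort but can be overridden (e.g. FC=echo for a dry run).
build_split_ipo() {
  compiler="${FC:-ifort}"

  # IPO group: with -ipo and -c, a.o, b.o and c.o hold intermediate
  # code instead of ordinary object code.
  "$compiler" -c -O3 -ipo a.f90 b.f90 c.f90 || return 1

  # Problem file: its own command line without -ipo, because -ipo is
  # not positional and would otherwise apply to every file on the line.
  "$compiler" -c -O3 d.f90 || return 1

  # Link step: the IPO pass runs over the intermediate files;
  # d.o is assumed to be passed through as a regular object.
  "$compiler" -O3 -ipo a.o b.o c.o d.o -o prog
}
```

Calling `FC=echo build_split_ipo` prints the three command lines instead of compiling, a cheap way to check the split before moving it into a make file's objs and objs_no_ipo groups.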

Steven_L_Intel1 · 04-12-2016

A modest correction. -ipo is the only option that puts intermediate code in the .o files. -O3 does not do this on its own. -fast is a "group option" which sets both -O3 and -ipo (and -xHost and some more.)

Jack_S_ · 04-12-2016

@Jim, Lorri and Steve -- thank you all very much for answering and for sharing your experience.

Jack.

Steven_L_Intel1 · 04-18-2016

The developers have found and fixed the problem - an order N-cubed sort algorithm in part of the optimizer. We're hoping to get the fix in for the 17.0 release later this year.

Unusually long -O3 optimized compile time on specific code