Intel® Fortran Compiler

Unusually long -O3 optimized compile time on specific code

Ferdinand_T_
New Contributor II
1,381 Views

Dear Intel community,

Below is an example program that takes an unusually long time to compile with "-O3" optimization (up to -O2 is fine), tested with ifort versions 14.0.3 and 15.0.3. This might be a compiler issue.
(But I don't know what is going on. Maybe the compiler really is doing something very sophisticated and the long compile time is justified.)

[fortran]
program p

    ! Compound type with alloc. components
    type :: a_type
    end type

    type :: b_type
        ! The more fields, the longer -O3 compilation takes.
        integer, allocatable :: &
            i00,i01,i02,i03,i04,i05,i06,i07,i08,i09, &
            i10,i11,i12,i13,i14,i15,i16,i17,i18,i19, &
            i20,i21,i22,i23,i24,i25,i26,i27,i28,i29, &
            i30,i31,i32,i33,i34,i35,i36,i37,i38,i39
        class(a_type), allocatable :: &
            a00,a01,a02,a03,a04,a05,a06,a07,a08,a09, &
            a10,a11,a12,a13,a14,a15,a16,a17,a18,a19, &
            a20,a21,a22,a23,a24,a25,a26,a27,a28,a29, &
            a30,a31,a32,a33,a34,a35,a36,a37,a38,a39
    end type

    type(b_type), allocatable :: b

    ! Switch, undetermined at compile-time
    logical :: switch

    ! Allocate anything (doesn't have to, but can, be b)
    integer, allocatable :: i
    allocate(i)

    ! Deallocate compound type
    deallocate(b)

    ! Specific repeated conditional loop structure
    read (*,*) switch
    if (switch) then
        do              ! any do-loop
        end do
    end if
    if (switch) then
        do              ! any do-loop
        end do
    end if
end program
[/fortran]

Changing the order of statements usually does not affect the compile-time delay. However, after changing the code in any other way the reported delay is gone (e.g. removing one loop, removing the switch, removing the allocation, removing the allocatable attribute, changing the class keyword to type, or removing one of the fields in b_type entirely).
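
For anyone who wants to check how the delay grows with the number of components, here is a minimal sketch of a shell helper that prints the example program with a configurable field count. The name `gen_repro` and the one-declaration-per-line layout are assumptions made for illustration, not part of the original program, and since almost any change to the code makes the delay disappear, it is not guaranteed that the generated variant still triggers it.

```shell
#!/bin/sh
# Hypothetical helper: print the example program with N integer and
# N class(a_type) allocatable components in b_type (the post uses N=40).
gen_repro() {
    n=${1:-40}
    echo 'program p'
    echo '    type :: a_type'
    echo '    end type'
    echo '    type :: b_type'
    # Integer components i00, i01, ... (one declaration per line: an assumption)
    k=0
    while [ "$k" -lt "$n" ]; do
        printf '        integer, allocatable :: i%02d\n' "$k"
        k=$((k + 1))
    done
    # Polymorphic components a00, a01, ...
    k=0
    while [ "$k" -lt "$n" ]; do
        printf '        class(a_type), allocatable :: a%02d\n' "$k"
        k=$((k + 1))
    done
    # Remainder of the program, copied from the example above
    cat <<'EOF'
    end type
    type(b_type), allocatable :: b
    logical :: switch
    integer, allocatable :: i
    allocate(i)
    deallocate(b)
    read (*,*) switch
    if (switch) then
        do
        end do
    end if
    if (switch) then
        do
        end do
    end if
end program
EOF
}
```

With ifort on the PATH, something like `gen_repro 40 > repro.f90 && time ifort -O3 -c repro.f90` could then be compared against the same command with `-O2`, or against a smaller field count.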

I post this issue only out of curiosity and for your information. For me, it is sufficient to use -O2 optimization on the affected code.

With regards
Ferdinand

 

15 Replies
jimdempseyatthecove
Honored Contributor III
1,381 Views

I suspect the compiler is caught in a quandary over whether it is safe to remove the dead code referring to b. b is declared but not allocated, and the only reference to it is deallocate(b).

Did you see this behavior with real code?

Jim Dempsey

Steven_L_Intel1
Employee
1,381 Views

I can reproduce an extended compile time with 16.0.2, and will look into this some more.

Steven_L_Intel1
Employee
1,381 Views

Escalated as issue DPD200407723. Looking at some output from a debug compiler build, it doesn't look to me as if the delay is in an actual optimization step but rather some setup for it. I will let you know what the developers tell me.

Ferdinand_T_
New Contributor II
1,381 Views

@Steve Lionel
Thank you. I am always interested in what is going on behind the scenes!

@Jim Dempsey

jimdempseyatthecove wrote:

Did you see this behavior with real code?

Yes. The compilation time jumped from several minutes to beyond hours (I terminated it and at first thought the compiler had crashed entirely) when I added a command-line argument (the undetermined "switch") to optionally skip certain computations (the do-loops). The allocations were hidden in unrelated modules. It took me a while to boil the code down to the example given, bringing all these specific ingredients together in just the right combination. Which is also why your question made me wonder: how else, if not by pure accident in actual code, could one come up with such an uncommon program triggering such unexpected behaviour?

Ferdinand
 

Ferdinand_T_
New Contributor II
1,381 Views

jimdempseyatthecove wrote:

I suspect the compiler is caught in a quandary over whether it is safe to remove the dead code referring to b. b is declared but not allocated, and the only reference to it is deallocate(b).

I might add that the delay also occurs with a single allocation (no deallocations at all) using a structure constructor:

[fortran]

    ! ...

    ! Allocate anything (doesn't have to, but can, be b)
    allocate(b, source=b_type())

    ! Specific repeated conditional loop structure
    ! ...

[/fortran]

Jack_S_
Beginner
1,381 Views

Hi all,

Is there any update on the unusually long compilation?

I have compiled my Fortran application with Version 14.0.230.144 and Version 16.0.0.109; in the latter case I aborted the compilation after 30 min! (I suspected it could go on for hours ...).

Thank you,

Jack.

Steven_L_Intel1
Employee
1,381 Views

I have asked the developers for an update - haven't seen anything yet.

Jack_S_
Beginner
1,379 Views

Hi Steve,

Thanks for the note.

BTW - I can verify that the long compilation I experience (with real code) also occurs with Version 16.0.2.181 (update 2). With the following set of flags, compilation time increases from ~1 minute to seemingly unbounded (I stopped the compilation after 30 minutes; it was still in the linking process).

With Version 16.0.2.181 (update 2), the code compiles successfully only with -O2.

The compilation flags:

ifort -g -O3 -openmp -xHost -ipo -fpconstant -fp-model precise -fpe0 -traceback -ftrapuv -gen-interfaces -warn interfaces

 

Thanks in advance for any update in this issue,

Jack.

jimdempseyatthecove
Honored Contributor III
1,379 Views

>>it was still in the linking process

As a workaround, disable multi-file interprocedural optimization with -ipo- (for the problem file).
-O3 (I think) implies -ipo

Jim Dempsey

Jack_S_
Beginner
1,379 Views

Hi Jim,

Your suggested workaround works -- the compilation is comparable to previous versions.

The question is then what the relative importance of -ipo is, since in the user & reference guide the -fast flag implies (separately) -O3 and -ipo.

Thank you,

Jack.

 

 

jimdempseyatthecove
Honored Contributor III
1,379 Views

You should read the documentation on Inter-procedural Optimizations. It is really not as comprehensive as it should be.

IPO is (can be) performed in two passes:

Compilation with -O3 (or -ipo) generates a special form of intermediary object file (several files when compiling multiple sources). When these intermediary files are passed to the linker, the linker consults them together and, when necessary, calls back the compiler to perform inter-file optimizations, producing yet another intermediary object file. It is not clear (not documented) whether this process is recursive, but I suspect it is, and in your case it seems to be caught in an infinite recursion loop (the loop could be iterative as well, to avoid stack overflow).

The compiler has an -ipo-c option which you may or may not wish to investigate using.

Not documented, but I assume that -ipo and -ipo- affect the files following the option, so that you can write

ifort -g -O3 ... a.f90 b.f90 c.f90 -ipo- d.f90

Where a, b and c are ipo'd but d is not

(you'd probably incorporate the options into your make file, one group as objs: and the other as objs_no_ipo:)

Jim Dempsey

Lorri_M_Intel
Employee
1,381 Views

Jim is mostly correct.

Using -O3 or -ipo does provoke the compiler to generate intermediate language in the .o files rather than (COFF) object code.

At link time, you use either "ifort" or "xild" to pull all these .o files into an executable. The list of .o files is passed - once - to the compiler again, where interprocedural optimizations happen, and a real object file is created. Then, control is returned to either "ifort" or "xild", and the linker is called to create the executable.

There are no callbacks from "ld" and no iteration, other than inside the compiler itself (where the infinite loop is happening).

Finally, no, the "-ipo" switch is not positional. That is, if "ifort" detects -ipo anywhere on the line, it applies it to all compilations. To compile some with -ipo and some without, you'd need separate compilation lines.
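
The separate compilation lines described above could look roughly like the sketch below. The file names a.f90 through d.f90 and the executable name app are placeholders borrowed from Jim's example, `run` and `build` are hypothetical helper names, and the option set is trimmed to -O3 and -ipo for brevity. Whether keeping one file out of IPO actually avoids the slow step depends on where the delay arises.

```shell
#!/bin/sh
# Sketch: compile most sources with -ipo and one problem file without it.
# a.f90 ... d.f90 and "app" are placeholder names, not from a real project.
FC=${FC:-ifort}

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

build() {
    # IPO-enabled sources: with -ipo the .o files hold intermediate code.
    run "$FC" -O3 -ipo -c a.f90 b.f90 c.f90 || return 1
    # Problem file on its own compilation line, without -ipo,
    # so a regular object file is produced for it.
    run "$FC" -O3 -c d.f90 || return 1
    # Link through the compiler driver so the IPO step runs for a.o, b.o, c.o.
    run "$FC" -O3 -ipo a.o b.o c.o d.o -o app || return 1
}
```

Setting DRY_RUN=1 before calling `build` only prints the three command lines, which is a quick way to check that -ipo appears only where intended.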

                  --Lorri

 

 

 

Steven_L_Intel1
Employee
1,381 Views

A modest correction. -ipo is the only option that puts intermediate code in the .o files. -O3 does not do this on its own. -fast is a "group option" which sets both -O3 and -ipo (and -xHost and some more.)

Jack_S_
Beginner
1,381 Views

@Jim, Lorri and Steve -- thank you all very much for answering and for sharing your experience.

Jack.
 

Steven_L_Intel1
Employee
1,381 Views

The developers have found and fixed the problem - an order N-cubed sort algorithm in part of the optimizer. We're hoping to get the fix in for the 17.0 release later this year.
