Dear Intel community,
Below is an example program which takes an unusually long time to compile with "-O3" optimization (up to -O2 is fine), tested with ifort versions 14.0.3 and 15.0.3. This might be a compiler issue.
(But I don't know what is going on. Maybe the compiler is really doing something very sophisticated and the long compile time is justified.)
[fortran]
program p
  ! Compound type with allocatable components
  type :: a_type
  end type
  type :: b_type
    ! The more fields, the longer -O3 compilation takes.
    integer, allocatable :: &
      i00,i01,i02,i03,i04,i05,i06,i07,i08,i09, &
      i10,i11,i12,i13,i14,i15,i16,i17,i18,i19, &
      i20,i21,i22,i23,i24,i25,i26,i27,i28,i29, &
      i30,i31,i32,i33,i34,i35,i36,i37,i38,i39
    class(a_type), allocatable :: &
      a00,a01,a02,a03,a04,a05,a06,a07,a08,a09, &
      a10,a11,a12,a13,a14,a15,a16,a17,a18,a19, &
      a20,a21,a22,a23,a24,a25,a26,a27,a28,a29, &
      a30,a31,a32,a33,a34,a35,a36,a37,a38,a39
  end type
  type(b_type), allocatable :: b
  ! Switch, undetermined at compile time
  logical :: switch
  ! Allocate anything (doesn't have to, but can, be b)
  integer, allocatable :: i
  allocate(i)
  ! Deallocate compound type
  deallocate(b)
  ! Specific repeated conditional loop structure
  read (*,*) switch
  if (switch) then
    do ! any do-loop
    end do
  end if
  if (switch) then
    do ! any do-loop
    end do
  end if
end program
[/fortran]
Usually, changing the order of statements does not change the compile-time delay. However, after changing the code in any other way, the reported delay is gone (e.g. removing one loop, removing the switch, removing the allocation, removing the allocatable attribute, changing the class keyword to type, or removing one of the fields in b_type entirely).
I post this issue only out of curiosity and for your information. For me, it is sufficient to use -O2 optimization on the affected code.
With regards
Ferdinand
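A minimal sketch for comparing the -O2 and -O3 compile times described above. The file name `repro.f90` and the helper name `elapsed` are assumptions made for illustration and do not come from the original post; the helper only measures wall-clock seconds around whatever command it is given.

```shell
#!/bin/sh
# Print the wall-clock seconds a command takes (1 s resolution).
# The command's own output is discarded so that only the number is printed.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical usage, assuming the reproducer is saved as repro.f90:
#   elapsed ifort -O2 -c repro.f90
#   elapsed ifort -O3 -c repro.f90
```

Compiling with `-c` keeps the linker out of the measurement, so any difference between the two numbers comes from the compile step itself.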
I suspect the compiler is caught in a quandary over whether it is safe to remove the dead code referring to b. b is declared but not allocated, and the only reference is deallocate(b).
Did you see this behavior with real code?
Jim Dempsey
I can reproduce an extended compile time with 16.0.2, and will look into this some more.
Escalated as issue DPD200407723. Looking at some output from a debug compiler build, it doesn't look to me as if the delay is in an actual optimization step but rather some setup for it. I will let you know what the developers tell me.
@Steve Lionel
Thank you. I am always interested in what is going on behind the scenes!
@Jim Dempsey
jimdempseyatthecove wrote:
Did you see this behavior with real code?
Yes. The compilation time jumped from several minutes to hours (I terminated it and at first thought the compiler had crashed entirely) when I added a command-line argument (the undetermined "switch") to optionally skip certain computations (the do-loops). The allocations were hidden in unrelated modules. It took me a while to boil the code down to the example given, bringing all these specific ingredients together in just the right combination. Which is also why your question made me wonder: how else, if not by pure accident in actual code, could one come up with such an uncommon program triggering such unexpected behaviour?
Ferdinand
jimdempseyatthecove wrote:
I suspect the compiler is caught in a quandary over whether it is safe to remove the dead code referring to b. b is declared but not allocated, and the only reference is deallocate(b).
I might add that the delay also occurs with a single allocation (no deallocations at all) using a structure constructor:
[fortran]
! ...
! Allocate anything (doesn't have to, but can, be b)
allocate(b, source=b_type())
! Specific repeated conditional loop structure
! ...
[/fortran]
Hi all,
Is there any update on the unusually long compilation?
I have compiled my Fortran application with Version 14.0.230.144 and Version 16.0.0.109; in the latter case I aborted the compilation after 30 min (I suspected it could go on for hours ...).
Thank you,
Jack.
I have asked the developers for an update - I haven't seen anything yet.
Hi Steve,
Thanks for the note.
BTW - I can verify that the long compilation I experience with real code also occurs with Version 16.0.2.181 (update 2): with the following set of flags, compilation goes from ~1 minute to unbounded time (I stopped it after 30 minutes; it was still in the linking process).
With Version 16.0.2.181 (update 2), the code compiles successfully only with -O2.
The compilation flags :
ifort -g -O3 -openmp -xHost -ipo -fpconstant -fp-model precise -fpe0 -traceback -ftrapuv -gen-interfaces -warn interfaces
Thanks in advance for any update in this issue,
Jack.
>>it was still in the linking process
As a workaround, disable multi-file inter-procedural optimizations -ipo- (for the problem file).
-O3 (I think) implies -ipo
Jim Dempsey
Hi Jim,
Your suggested workaround works -- the compilation is comparable to previous versions.
The question is then what is the relative importance of -ipo, as in the user & reference guide the -fast flag implies (separately) -O3 and -ipo.
Thank you,
Jack.
You should read the documentation on Inter-procedural Optimizations. It is really not as comprehensive as it should be.
IPO is (can be) performed in two passes:
Compilation with -O3 (or -ipo) generates a special form of intermediary object file (several files when compiling multiple sources). When these intermediary files are passed to the linker, the linker consults them together and, when necessary, calls back the compiler to perform inter-file optimizations, producing yet another intermediary object file. It is not clear (i.e. not documented) whether this process is recursive, but I suspect it is, and in your case it seems to be caught in an infinite recursion loop (the loop could be iterative as well, to avoid stack overflow).
The compiler has an -ipo-c option which you may or may not wish to investigate using.
Not documented, but my assumption, is that -ipo and -ipo- affect the files following the option, so that you can write
ifort -g -O3 ... a.f90 b.f90 c.f90 -ipo- d.f90
Where a, b and c are ipo'd but d is not
(you'd probably incorporate the options into your make file, one group as objs: and the other as objs_no_ipo:)
Jim Dempsey
Jim is mostly correct.
Using -O3 or -ipo does provoke the compiler to generate intermediate language in the .o files rather than (COFF) object code.
At link time, you use either "ifort" or "xild" to pull all these .o files into an executable. The list of .o files is passed - once - to the compiler again, where interprocedural optimizations happen, and a real object file is created. Then, control is returned to either "ifort" or "xild", and the linker is called to create the executable.
There are no callbacks from "ld" and no iteration, other than inside the compiler itself (where the infinite loop is happening).
Finally, no, the "-ipo" switch is not positional. That is, if "ifort" detects -ipo anywhere on the line, it applies it to all compilations. To compile some with -ipo and some without, you'd need separate compilation lines.
--Lorri
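Since -ipo is not positional, mixing IPO and non-IPO sources needs separate compilation lines. The following is a minimal sketch of that split, not a tested build recipe: the file names `a.f90` to `d.f90` are taken from the example earlier in the thread, while the output name `app`, the helper functions `run` and `build`, and the `DRY_RUN` variable are assumptions introduced here. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: compile most sources with -ipo and the problem file without it.
# FC, DRY_RUN, run and build are illustrative names, not part of ifort.
FC="${FC:-ifort}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also run them

run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build() {
    # Group 1: sources that should take part in interprocedural optimization
    run "$FC" -O3 -ipo -c a.f90 b.f90 c.f90
    # Group 2: the problem file, on its own command line and without -ipo
    run "$FC" -O3 -c d.f90
    # Link through the compiler driver so the IPO objects are processed
    run "$FC" -O3 -ipo a.o b.o c.o d.o -o app
}

build
```

In a makefile the same idea would presumably become two object groups with different flag sets, along the lines of the `objs` and `objs_no_ipo` split suggested above.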
A modest correction. -ipo is the only option that puts intermediate code in the .o files. -O3 does not do this on its own. -fast is a "group option" which sets both -O3 and -ipo (and -xHost and some more.)
@Jim, Lorri and Steve -- thank you all very much for answering and for sharing your experience.
Jack.
The developers have found and fixed the problem - an order N-cubed sort algorithm in part of the optimizer. We're hoping to get the fix in for the 17.0 release later this year.
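One way to probe how compile time scales is to generate variants of the original reproducer with different field counts and time each one. The sketch below is an assumption-laden illustration: the thread does not say what "N" counts in the cubic algorithm, the function name `gen_repro` is hypothetical, and the generator declares one field per line rather than using the original's continuation lines, so it has not been confirmed to trigger the same slowdown.

```shell
#!/bin/sh
# Emit a variant of the thread's reproducer with N integer fields and
# N class(a_type) fields in b_type (the original has 40 of each).
gen_repro() {
    n=$1
    echo "program p"
    echo "  type :: a_type"
    echo "  end type"
    echo "  type :: b_type"
    k=0
    while [ "$k" -lt "$n" ]; do
        printf '    integer, allocatable :: i%03d\n' "$k"
        k=$((k + 1))
    done
    k=0
    while [ "$k" -lt "$n" ]; do
        printf '    class(a_type), allocatable :: a%03d\n' "$k"
        k=$((k + 1))
    done
    # The rest of the program is copied from the original reproducer.
    cat <<'EOF'
  end type
  type(b_type), allocatable :: b
  logical :: switch
  integer, allocatable :: i
  allocate(i)
  deallocate(b)
  read (*,*) switch
  if (switch) then
    do
    end do
  end if
  if (switch) then
    do
    end do
  end if
end program
EOF
}

# Hypothetical usage:
#   gen_repro 20 > repro20.f90 && time ifort -O3 -c repro20.f90
#   gen_repro 40 > repro40.f90 && time ifort -O3 -c repro40.f90
```

If the field count is what drives the cubic term, doubling N should raise the affected part of the compile time by roughly a factor of eight; a much smaller growth would suggest that N refers to something else.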