I am trying to use IPO (interprocedural optimization) while building a fairly large Fortran program. It is a simulation model with a highly modular, object-oriented design spread across several modules and files. With -ipo (and therefore -fast, which implies -ipo), the build now fails with the error "Access violation or stack overflow. Please contact Intel Support for assistance."
The -ipoN and -ipo-separate options recommended under "IPO for Large Programs" do not help either.
I always get errors like this:

    ifort -sox -parallel -O3 -static -heap-arrays -fp-model fast=2 -ipo-separate -ipo-c -xHost -finline-functions -qopt-report=3 -qopt-report-file=zzz_ipo.txt -o MODEL.exe HEDG2_DRV.f90 HEDG2_04.o BASE_UTILS.o BASE_STRINGS.o BASE_CSV_IO.o BASE_LOGGER.o BASE_RANDOM.o
     ** The compiler has encountered an unexpected problem.
     ** Segmentation violation signal raised. **
    Access violation or stack overflow. Please contact Intel Support for assistance.
    fortcom: Severe: **Internal compiler error: internal abort** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
    ifort: error #10106: Fatal error in /home/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64/fortcom, terminated by segmentation violation
    ifort: error #10014: problem during multi-file optimization compilation (code 1)
    make: *** [MODEL.exe] Error 1

Are there hard limits on IPO that this code has reached? I am also wondering, from real-world experience, what performance gain IPO can deliver for large programs (I understand this is a difficult question and the answer will vary from code to code). Does it make sense to dig deeper into the option combinations at all?
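Have you tried running

    ulimit -s unlimited
    ulimit -m unlimited

before invoking ifort? (Raising a limit above its current hard limit requires root privileges. Because ulimit is a shell builtin, "sudo ulimit" will not work directly; the hard limit is typically raised by an administrator, for example in /etc/security/limits.conf.)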
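A minimal sketch of the variant that needs no root access, assuming bash on Linux. It raises the soft stack limit as far as an unprivileged user can, which is up to the current hard limit. Depending on the system, that hard limit may or may not be "unlimited".

```bash
#!/bin/bash
# Sketch: raise the soft stack-size limit before compiling.
# Limits set here apply to this shell and every process it starts,
# including ifort and its fortcom back end.

echo "soft stack limit before: $(ulimit -S -s)"   # in KiB, or "unlimited"
echo "hard stack limit:        $(ulimit -H -s)"

# Without root, the soft limit can be raised up to, but not beyond, the hard limit.
ulimit -S -s "$(ulimit -H -s)"

echo "soft stack limit after:  $(ulimit -S -s)"
```

After this, run the compile (for example make MODEL.exe) from the same shell. If you put the ulimit call in a Makefile recipe instead, it must be on the same recipe line as the ifort command, because make normally runs each recipe line in a separate shell.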
IPO will attempt to inline as much as possible. Inlining eliminates call/return overhead, but it also bloats the code. In some cases this is counter-productive. For example, a tight loop that fit in the L1 instruction cache may grow enough after inlining that it no longer fits.
IOW you may get what you ask for, but not necessarily what you wanted.
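One way to check this for a particular code is to build it twice from the same sources, once with -ipo and once without, and then compare the size of the generated machine code and the run time. The sketch below assumes bash and the size command from GNU binutils. It also assumes two hypothetical executables named MODEL_noipo.exe and MODEL_ipo.exe in the current directory. Text-segment size is only a rough proxy for instruction-cache pressure, and a single timed run is noisy, so repeat the runs before drawing conclusions.

```bash
#!/bin/bash
# Sketch: compare code size and wall-clock time of two builds of the same program.
# Assumptions: bash, GNU binutils "size" on PATH, and two hypothetical
# executables built from identical sources, with and without -ipo.

# Print the size in bytes of an executable's text (machine code) segment.
# "size" prints a header line, then: text data bss dec hex filename
text_bytes() {
    size "$1" | awk 'NR == 2 { print $1 }'
}

# Run a command once and print its elapsed wall-clock time in seconds.
# "time" is a bash keyword; TIMEFORMAT=%R makes it report real time only.
elapsed() {
    local TIMEFORMAT=%R
    # The program's own output is discarded; the time report goes to the
    # group's stderr, which is redirected into this function's stdout.
    { time "$1" > /dev/null 2>&1; } 2>&1
}

for exe in ./MODEL_noipo.exe ./MODEL_ipo.exe; do   # hypothetical file names
    if [ ! -x "$exe" ]; then
        echo "skipping $exe (not found)"
        continue
    fi
    echo "$exe: text=$(text_bytes "$exe") bytes, time=$(elapsed "$exe") s"
done
```

Suppose the -ipo build turns out noticeably larger but not faster. In that case, the extra inlining is probably not paying for itself for that code.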