Linux gfortran/mkl fits where ifort/mkl does not?

braver · ‎04-07-2008

I have a system which I compile on Linux as well as Mac OSX (the Mac OSX problems were described in a previous post). It actually compiles and works on Linux under gfortran with MKL and OpenMP. Furthermore, gfortran+mkl reaches actual linear speedup in the number of cores.

However, when I try to compile the same code with ifort, I get the following:

/tmp/ipo_ifortecAxRi.f:(.text+0x249e): relocation truncated to fit: R_X86_64_PC32 against `second_$TARRAY.8.11'

-- and so forth. Note that Intel's own MKL links and runs with the same code under gfortran! I'm using the fce (x86_64) compiler, and my BLAS settings are

LIBPATH=-L/opt/intel/mkl/10.0.1.014/lib/em64t
BLAS=-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core

The ifort options are

FFLAGS = -w95 -cm -assume buffered_io -vec_report0 -O3 -ipo

The same R_X86_64_PC32 failure occurs both with and without -openmp (passed to ifort at compile and link time when specified).

Again, I specified ulimit -x unlimited for all x's shown by ulimit -a. Is there an option I'm missing to tell ifort we're on a huge Xeon server with x86_64 and nocona architecture?

TimP · ‎04-07-2008

This appears to be a 2GB limit on static data or code size. Given that your gfortran result appears to be satisfactory, my first guess is that your ifort options are excessively aggressive. Unfortunately, you didn't mention your compiler versions or gfortran options.
gfortran doesn't yet support any option comparable to -ipo, so you may want to shut that off.
If you didn't set gfortran -ffast-math, the comparable option would be ifort -fp-model precise, although such a discrepancy is not likely to produce your quoted error.
gfortran generally is not nearly as aggressive in auto-vectorization as ifort, assuming you even invoked it. In ifort, auto-vectorization has been on by default since version 10. Does -O1 (no vectorization) or -fno-inline-functions work for you?
The default architecture setting of ifort is OK. You could set -xP, in case you get an advantage from SSE3, but that doesn't appear to be your problem.
If you are still on the edge of a memory model problem, you might try -mcmodel=medium.

Steven_L_Intel1 · ‎04-07-2008

ulimit isn't going to help here. This has something to do with the linker. Do you have large static arrays? I know there is a problem on MacOS with something called the Global Offset Table that deals with addressing of static data in large (>2GB) applications. This will be resolved in a future release, but the advice for now is to either use ALLOCATABLE arrays (my preference) or put each large array in its own COMMON.
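
A minimal sketch of the two workarounds, for illustration only. The module, array, and COMMON block names and the sizes below are hypothetical and are not taken from the original code; it assumes free-form source (for example a .f90 file).

```fortran
! Sketch only: all names and sizes here are made up for illustration.
module big_data
  implicit none
  ! Option 1: ALLOCATABLE arrays are placed on the heap at run time,
  ! so they do not add to the program's static data.
  double precision, allocatable :: a(:,:), b(:,:)
end module big_data

program workaround_demo
  use big_data
  implicit none
  integer, parameter :: n = 1000
  integer :: ierr
  ! Option 2: each large static array gets its own COMMON block
  ! instead of sharing one block with the others.
  double precision :: c(n,n), d(n,n)
  common /blk_c/ c
  common /blk_d/ d

  allocate(a(n,n), b(n,n), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  a = 1.0d0
  b = 2.0d0
  c = a + b
  d = c - a
  print *, c(1,1), d(n,n)

  deallocate(a, b)
end program workaround_demo
```

The sizes are kept small so the sketch builds anywhere; in a real case the arrays would be the multi-gigabyte ones that currently sit in a single COMMON block.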

braver · ‎04-07-2008

Tim and Steve -- thank you for your hints! The app worked after I

-- split huge arrays from a single common block into several per Steve's idea

-- added some options per Tim's:

OPTIMIZE = -O3 -xP -mcmodel=medium -i-dynamic -fPIC
# Compiler flags
#OPENMP=-openmp
FFLAGS = -w95 -cm -assume buffered_io -vec_report0 $(OPTIMIZE) $(OPENMP)

It also worked when I added -mp to OPTIMIZE, but -fp-model precise, as suggested above, failed with a complaint that -fp doesn't take arguments.

Since I'm rather new to ifort, and tweaked an existing option line, can you please elaborate on what the effects of these options are for solving the memory fit problem versus efficiency? -O1 worked also. I'm on ifort 10.1.012, MKL 10.0.1.014.

With OpenMP, the speedup was 8 times for my 8 cores (4 dual Xeons), and the actual run time was 4.5-5 minutes versus gfortran 6.5-7 minutes. Both do about 9.5-10 minutes without OpenMP -- the huge matrices are read and written around the parallel block.

For your reference, my gfortran options which worked right off the bat are

CPUOPT = -march=nocona
OPT = -O3 -funroll-all-loops -ffast-math $(CPUOPT)
# Compiler flags
#DEBUG = -g
OPENMP = -fopenmp
FFLAGS = $(DEBUG) $(OPENMP) -fno-second-underscore -W -Wall $(OPT)
LINKFLAGS = -liomp5 -lpthread

And my BLAS is
LIBPATH=-L/opt/intel/mkl/10.0.1.014/lib/em64t
BLAS=-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_lapack -lmkl_core

-----
BTW, when I've searched this forum for the dreaded R_X86_64_PC32, there was no help in those threads, so to my knowledge this is the first thread where the problem is resolved. Thanks!

Steven_L_Intel1 · ‎04-07-2008

-mcmodel=medium is what you wanted - I missed that you were seeing this problem on Linux and thought it was on Mac. -mcmodel=medium tells the compiler that data can extend past 2GB but that code does not. There may also be some issues with large single common blocks, but I'm not all that familiar with them.

My guess is that gfortran defaults to -mcmodel=medium - ifort doesn't as it reduces performance a bit, but it is needed if you are to use the 64-bit address space.
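
For anyone who wants to reproduce the symptom, here is a small sketch. The program name, COMMON block name, and array size are assumptions chosen only so that the static data exceeds 2GB; none of them come from the original code. On a typical x86_64 Linux toolchain, building this with the small memory model would be expected to fail at link time with a "relocation truncated to fit" message, while adding -mcmodel=medium (plus -i-dynamic for ifort, as in the options above) should let it link.

```fortran
! Sketch only: a hypothetical reproducer, not the original application.
program big_static
  implicit none
  ! 4.0e8 double precision elements = 3.2e9 bytes of static data,
  ! which is beyond the 2GB reach of a 32-bit PC-relative relocation.
  integer, parameter :: n = 400000000
  double precision :: a(n)
  common /big/ a

  ! Touch only the first and last elements so that very little of
  ! the array is actually paged in at run time.
  a(1) = 1.0d0
  a(n) = 2.0d0
  print *, a(1) + a(n)
end program big_static
```

Moving the array to an ALLOCATABLE, as suggested earlier in the thread, is the other way to get the same program to link without changing the memory model.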