Corrupted binary generated by ifort

Anirudh1

Hi,

I'm trying to compile the FT (Fourier transform) kernel of the NAS Parallel Benchmarks (http://www.nas.nasa.gov/publications/npb.html) with the Intel Fortran Compiler. Initially I got a "relocation truncated to fit" error. I then recompiled with the -shared-intel -mcmodel=medium options. The build succeeded, but the executable gets killed when I try to run it.

<relocation truncated to fit error>

$ make "ft" class="E"

===========================================
= NAS PARALLEL BENCHMARKS 3.3 =
= Serial Versions =
= F77/C =
===========================================

cd FT; make class="E"
make[1]: Entering directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/FT'
make[2]: Entering directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/sys'
cc -o setparams setparams.c
make[2]: Leaving directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/sys'
../sys/setparams ft E
make.def modified. Rebuilding npbparams.h just in case
rm -f npbparams.h
../sys/setparams ft E
ifort -c -O appft.f
ifort -c -O auxfnct.f
ifort -c -O fft3d.f
ifort -c -O mainft.f
ifort -c -O verify.f
cd ../common; ifort -c -O randi8.f
cd ../common; ifort -c -O print_results.f
cd ../common; ifort -c -O timers.f
cd ../common; cc -c -O -o wtime.o ../common/wtime.c
ifort -O -o ../bin/ft.E.x appft.o auxfnct.o fft3d.o mainft.o verify.o ../common/randi8.o ../common/print_results.o ../common/timers.o ../common/wtime.o
../common/timers.o(.text+0x7): In function `timer_clear_':
: relocation truncated to fit: R_X86_64_32S tt_
../common/timers.o(.text+0x30): In function `timer_start_':
: relocation truncated to fit: R_X86_64_32S tt_
../common/timers.o(.text+0x68): In function `timer_read_':
: relocation truncated to fit: R_X86_64_32S tt_
../common/timers.o(.text+0x92): In function `timer_stop_':
: relocation truncated to fit: R_X86_64_32S tt_
../common/timers.o(.text+0x9b): In function `timer_stop_':
: relocation truncated to fit: R_X86_64_32S tt_
../common/timers.o(.text+0xa4): In function `timer_stop_':
: relocation truncated to fit: R_X86_64_32S tt_
/opt/intel/Compiler/11.1/072/lib/intel64/libifcore.a(for_diags_intel.o)(.text+0x9cf): In function `for__io_return':
: relocation truncated to fit: R_X86_64_PC32 message_catalog
/opt/intel/Compiler/11.1/072/lib/intel64/libifcore.a(for_diags_intel.o)(.text+0xa91): In function `for__io_return':
: relocation truncated to fit: R_X86_64_PC32 message_catalog
/opt/intel/Compiler/11.1/072/lib/intel64/libifcore.a(for_diags_intel.o)(.text+0xa9a): In function `for__io_return':
: relocation truncated to fit: R_X86_64_PC32 message_catalog
/opt/intel/Compiler/11.1/072/lib/intel64/libifcore.a(for_diags_intel.o)(.text+0xaeb): In function `for__io_return':
: relocation truncated to fit: R_X86_64_PC32 message_catalog
/opt/intel/Compiler/11.1/072/lib/intel64/libifcore.a(for_diags_intel.o)(.text+0xc0f): In function `for__issue_diagnostic':
: additional relocation overflows omitted from the output
make[1]: *** [../bin/ft.E.x] Error 1
make[1]: Leaving directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/FT'
make: *** [ft] Error 2

</relocation truncated to fit error>

<Successful compilation>

$ make "ft" class="E"
===========================================
= NAS PARALLEL BENCHMARKS 3.3 =
= Serial Versions =
= F77/C =
===========================================

cd FT; make class="E"
make[1]: Entering directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/FT'
make[2]: Entering directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/sys'
cc -o setparams setparams.c
make[2]: Leaving directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/sys'
../sys/setparams ft E
make.def modified. Rebuilding npbparams.h just in case
rm -f npbparams.h
../sys/setparams ft E
ifort -c -O -shared-intel -mcmodel=medium appft.f
ifort -c -O -shared-intel -mcmodel=medium auxfnct.f
ifort -c -O -shared-intel -mcmodel=medium fft3d.f
ifort -c -O -shared-intel -mcmodel=medium mainft.f
ifort -c -O -shared-intel -mcmodel=medium verify.f
cd ../common; ifort -c -O -shared-intel -mcmodel=medium randi8.f
cd ../common; ifort -c -O -shared-intel -mcmodel=medium print_results.f
cd ../common; ifort -c -O -shared-intel -mcmodel=medium timers.f
cd ../common; cc -c -O -o wtime.o ../common/wtime.c
ifort -O -shared-intel -mcmodel=medium -o ../bin/ft.E.x appft.o auxfnct.o fft3d.o mainft.o verify.o ../common/randi8.o ../common/print_results.o ../common/timers.o ../common/wtime.o
make[1]: Leaving directory `/home/anirudh/NPB3.3.1/NPB3.3-SER/FT'

</Successful compilation>

$ ./bin/ft.E.x
Killed

Other information:

$ uname -a
Linux 2.6.9-34.0.2.ELsmp #1 SMP Fri Jul 7 18:22:55 CDT 2006 x86_64 x86_64 x86_64 GNU/Linux

$ ifort --version
ifort (IFORT) 11.1 20100414
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

$ file ./bin/ft.E.x
./bin/ft.E.x: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped

$ ldd ./bin/ft.E.x
not a dynamic executable

$ size ./bin/ft.E.x
    text    data             bss             dec          hex  filename
   28940    2220    687367103168    687367134328   a00a461c78  ./bin/ft.E.x

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 16383
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Now, what could the issue be? Any pointers would be helpful.

Thanks, 
Anirudh 

TimP

I suspect "class E" is too large to run on a single EP server, even with augmented limits.  Even for class "C" or "D" one would normally use OpenMP or MPI or both.

Casey

That is my thought as well. When I see "Killed", it usually means I've exhausted my system memory (physical and swap) and the kernel is killing the offending process to reclaim resources. Watch 'top' or another resource monitor while your program runs; if "Killed" coincides with memory exhaustion, you have found your culprit. As Tim suggests, you either need a machine with more memory or need to run distributed across enough nodes to satisfy the memory requirements. Another option is to add enough swap to your system for this case, but I would not recommend it: the ensuing disk paging will slow your system to a crawl and defeat the purpose of running benchmarks in the first place.
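As a rough sketch of the check described above, the snippet below compares the static data size reported by `size` in the original post against physical RAM plus swap read from `/proc/meminfo`. The function names `parse_meminfo` and `fits_in_memory` are made up for this illustration, and the comparison is only a first-order estimate: it assumes the program eventually touches most of its bss, which the thread does not confirm.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes.

    Lines with a "kB" unit are scaled by 1024; unitless counters are kept as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def fits_in_memory(required_bytes, meminfo):
    """Return True if required_bytes fits in physical RAM plus swap."""
    available = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return required_bytes <= available


# bss size taken from the `size ./bin/ft.E.x` output in the original post
BSS_BYTES = 687367103168

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        print("/proc/meminfo not available; this check is Linux-only")
    else:
        total = info.get("MemTotal", 0) + info.get("SwapTotal", 0)
        print(f"RAM + swap: {total / 2**30:.1f} GiB")
        print(f"needed:     {BSS_BYTES / 2**30:.1f} GiB")
        print("fits" if fits_in_memory(BSS_BYTES, info) else "does not fit")
```

If this reports "does not fit", a "Killed" at run time is the expected outcome rather than a sign of a corrupted binary.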

jimdempseyatthecove

I haven't looked at the code to see whether REAL(4) or REAL(8) is being used. The FT class E build has a grid size of 4096 x 2048 x 2048. If it is REAL(8), then the grid alone consumes ~137.4 GB (128 GiB). Depending on the algorithm, you may require twice this.
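The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 16-byte row is an assumption added for comparison (the thread does not say which element type FT actually uses), and `grid_bytes` is a name invented here. The bss figure comes from the `size` output in the original post.

```python
# Class E grid dimensions quoted in the post above
NX, NY, NZ = 4096, 2048, 2048

# bss size from the `size ./bin/ft.E.x` output in the original question
BSS_BYTES = 687367103168


def grid_bytes(bytes_per_element):
    """Memory needed for one full grid with the given element size."""
    return NX * NY * NZ * bytes_per_element


if __name__ == "__main__":
    # The 16-byte case is hypothetical, listed for comparison only
    for label, size in [("4-byte (REAL(4))", 4),
                        ("8-byte (REAL(8))", 8),
                        ("16-byte (assumed)", 16)]:
        nbytes = grid_bytes(size)
        print(f"{label:18s} {nbytes / 1e9:7.1f} GB  {nbytes / 2**30:6.0f} GiB")

    # How many REAL(8)-sized grids would account for the reported bss
    print(f"bss / REAL(8) grid = {BSS_BYTES / grid_bytes(8):.2f}")
```

The last line shows how many REAL(8)-sized grids the statically allocated data corresponds to, which gives a feel for how far beyond a single grid the executable's footprint goes.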

Although you are requesting "unlimited", your system administrator (which could be you) may have set a cap that still applies to "unlimited".
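One way to see the limits a process actually inherits, rather than what the shell reports, is to query them from inside a process. The sketch below uses Python's standard `resource` module; `describe_limit` is a helper invented for this example. It only covers per-process rlimits, so any cap imposed elsewhere (for instance by a batch scheduler) would not show up here.

```python
import resource


def describe_limit(name, limit_id):
    """Return a one-line description of the soft and hard value of an rlimit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes"

    return f"{name}: soft={fmt(soft)}, hard={fmt(hard)}"


# The rlimits most relevant to a program with a very large static data area
LIMITS = [
    ("address space (ulimit -v)", resource.RLIMIT_AS),
    ("data segment (ulimit -d)", resource.RLIMIT_DATA),
    ("stack (ulimit -s)", resource.RLIMIT_STACK),
]

if __name__ == "__main__":
    for name, limit_id in LIMITS:
        print(describe_limit(name, limit_id))
```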

Jim Dempsey

Anirudh1

Thanks. I was able to build the MPI version for 256 processors. Builds for 32, 64 and 128 procs produced binaries that behaved like the serial one. Some of the other kernels in the benchmark, such as CG (Conjugate Gradient) and MG (Multi-Grid), have similar issues. For class E problem sizes, these kernels required 64 procs.

Thanks again for the suggestions

Anirudh
