[SOLVED] Relocation error - please, help!

dmitry424
Hi everybody,

I compile my code with the following bash script:
LIB_PATH=/opt/intel/Compiler/11.1/073/mkl/lib/em64t/
INCLUDE_PATH=/opt/intel/Compiler/11.1/073/mkl/include/
ifort -w -c $INCLUDE_PATH"mkl_dfti.f90" -o mkl_dfti.o
ifort -static mkl_dfti.o ./my_code.f90 -L$LIB_PATH -Wl,--start-group $LIB_PATH"libmkl_intel_lp64.a" $LIB_PATH"libmkl_intel_thread.a" $LIB_PATH"libmkl_core.a" -Wl,--end-group -liomp5 -o ./my_code.sh
and get this output:
/tmp/ifortSHuua2.o: In function `MAIN__':
./my_code.f90:(.text+0x86): relocation truncated to fit: R_X86_64_PC32 against `function_calculations$STRIDES_IN.0.3'
./my_code.f90:(.text+0x8e): relocation truncated to fit: R_X86_64_PC32 against `function_calculations$STRIDES_OUT.0.3'
./my_code.f90:(.text+0x108): relocation truncated to fit: R_X86_64_PC32 against `function_calculations$LENGTHS.0.3'
./my_code.f90:(.text+0x10f): relocation truncated to fit: R_X86_64_PC32 against `function_calculations$LENGTHS.0.3'
./my_code.f90:(.text+0x116): relocation truncated to fit: R_X86_64_PC32 against `function_calculations$LENGTHS.0.3'
./my_code.f90:(.text+0xd32): relocation truncated to fit: R_X86_64_32S against `function_calculations$var$104.0.3'
./my_code.f90:(.text+0xd49): relocation truncated to fit: R_X86_64_32S against `function_calculations$var$104.0.3'
./my_code.f90:(.text+0xd6a): relocation truncated to fit: R_X86_64_32S against `function_calculations$var$108.0.3'
./my_code.f90:(.text+0xd7d): relocation truncated to fit: R_X86_64_32S against `function_calculations$var$108.0.3'
./my_code.f90:(.text+0xdba): relocation truncated to fit: R_X86_64_32S against `function_calculations$var$100.0.3'
./my_code.f90:(.text+0xdec): additional relocation overflows omitted from the output
If I add -openmp, the build succeeds, but the executable later fails with a segmentation fault. Adding "-mcmodel=large -shared-intel" or "-mcmodel=medium -shared-intel" does not change anything. When I change -static to -i_dynamic, I get:
./my_code.sh: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
and
export LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/073/mkl/lib/em64t:; ./my_code.sh
doesn't help.
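A "cannot open shared object file" error can be narrowed down by checking which directories on the search path actually contain the library. The following is a minimal sketch; the helper name `find_lib_dirs` is hypothetical and not part of any Intel tool, and it makes no assumption about where libiomp5.so is installed.

```bash
#!/bin/bash
# find_lib_dirs LIBNAME PATHLIST
# Print every directory in the colon-separated PATHLIST that contains LIBNAME.
# (Hypothetical helper, written for illustration only.)
find_lib_dirs() {
    local lib=$1 paths=$2 dir
    local IFS=:                      # split PATHLIST on colons
    for dir in $paths; do
        [ -n "$dir" ] && [ -e "$dir/$lib" ] && printf '%s\n' "$dir"
    done
    return 0
}

# Check the current search path; no output means the library is not on it.
find_lib_dirs libiomp5.so "${LD_LIBRARY_PATH:-}"

# The dynamic loader's own view is available through ldd, which marks
# unresolved libraries as "not found":
#   ldd ./my_code.sh
```

If the function prints nothing, the directory holding the library still has to be added to LD_LIBRARY_PATH.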
In my code I use large 3D arrays and the MKL library (Intel fast Fourier transform). I also use EQUIVALENCE statements between 3D and 1D arrays for these large arrays, so, if I understand correctly, allocatable variables will not work. I have a similar code without EQUIVALENCE statements; compiled with the same bash script, it runs with ~20 GB of RAM without any problems.
Could you please tell me whether it is possible to solve the problem without rewriting my code (in particular, without removing EQUIVALENCE)?
Thank you very much in advance!
dmitry424
P.S. Below is how my code initially looked (with 3D arrays). It is fast and readable, but doesn't work with npt>=256 (I need the equivalent 1D arrays Bx_1D, By_1D, Bz_1D to work with the 3D FFT).
program prog
  Use MKL_DFTI
  integer, parameter :: npt = 128
  FUNCTION_RESULT = function_calculations(...);
contains
  function function_calculations(...)
    IMPLICIT NONE
    real(DP) :: Bx(npt,npt,npt), By(npt,npt,npt), Bz(npt,npt,npt)
    real(DP) :: Bx_1D(npt**3), By_1D(npt**3), Bz_1D(npt**3)
    equivalence (Bx_1D, Bx);
    equivalence (By_1D, By);
    equivalence (Bz_1D, Bz);
    CODE
  end function
end program
I rewrote the above code to treat the 3D arrays as 1D arrays. It doesn't use EQUIVALENCE; it is slower with npt=128 and significantly less readable, but it works with npt=256, 512, etc.
program prog
  Use MKL_DFTI
  integer, parameter :: npt=128
  FUNCTION_RESULT = function_calculations(npt, ...);
contains
  function function_calculations(npt, ...)
    IMPLICIT NONE
    integer :: npt ...
    real(DP) :: Bx(npt**3), By(npt**3), Bz(npt**3)
    CODE
  end function
end program
Interestingly, here I define "npt" twice: as a global parameter, and then again inside "function_calculations". I can remove the "npt" definition in "function_calculations" and call the function in the same style as in the 3D code, "FUNCTION_RESULT = function_calculations(...)", but this leads to the same relocation error I described in my initial post. That is, if I remove the "npt" definition inside "function_calculations", I cannot compile this code with npt>=256.
Could you please tell me whether it is possible to compile the initial 3D code correctly, without relocation errors or segmentation faults?
mecej4
It is difficult, if not impossible, to help when you provide only fragments of source code and leave out compiler and linker invocation command lines. For example, we know nothing about what is in my_code.f90 and my_code.sh.

When running shell scripts, it is helpful while debugging to add the -x option.
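As a small illustration of the -x suggestion: a script can be traced either by running it as `bash -x ./build.sh` (the script name `build.sh` is hypothetical) or by calling `set -x` inside it. The sketch below assumes the paths from the first post and uses `echo` in place of the real ifort call so that it runs on any machine.

```bash
#!/bin/bash
# Show what the shell actually executes after variable expansion.
trace_build() {
    (
        set -x   # trace only inside this subshell; trace lines go to stderr
        LIB_PATH=/opt/intel/Compiler/11.1/073/mkl/lib/em64t/
        # echo stands in for the real compiler invocation in this sketch
        echo ifort -static ./my_code.f90 -L"$LIB_PATH" -o ./my_code.sh
    )
}

trace_build
```

Each traced command appears on stderr prefixed with "+", with every variable already expanded, which makes it easy to post the exact command line that was run.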

dmitry424
mecej4,
Thanks for your reply. The entire compilation script is in my initial post in this thread. "my_code.sh" is the output file of the ifort compiler (i.e., my aim is to build the executable "my_code.sh" and then run it as "./my_code.sh").
Concerning my code, OK, let's start with this simple example:
program prog
  integer, parameter :: DP = kind(0.0D0)
  real(DP), parameter :: pi = 3.141592653589793238_DP
  integer, parameter :: npt = 512
  real(DP) :: x(npt), tspan, dx
  integer :: jx, jy, jz
  real(DP) :: Bx(npt,npt,npt), By(npt,npt,npt)
  real(DP) :: Yx(npt,npt,npt), Yy(npt,npt,npt), Yz(npt,npt,npt)
  real(DP) :: Vx(npt,npt,npt), Vy(npt,npt,npt), Vz(npt,npt,npt)
  real(DP) :: Vx_1D(npt**3), Vy_1D(npt**3), Vz_1D(npt**3)
  equivalence (Vx_1D, Vx);
  equivalence (Vy_1D, Vy);
  equivalence (Vz_1D, Vz);
  tspan = 2*pi;
  dx = tspan/npt;
  do j = 1,npt
    x(j) = (tspan/npt)*( j - 1 - floor(0.5_DP*npt) );
  end do
  do jz = 1,npt
    do jy = 1,npt
      do jx = 1,npt
        Bx(jx, jy, jz) = sin(x(jz));
        By(jx, jy, jz) = sin(x(jx));
      end do
    end do
  end do
  call Derivatives(Bx,dx,Vx,Vy,Vz);
  call Derivatives(By,dx,Yx,Yy,Yz);
  print *, "All done!"
contains
  subroutine Derivatives(X,dx,Xx,Xy,Xz)
    IMPLICIT NONE
    real(DP) :: X(npt,npt,npt), Xx(npt,npt,npt), Xy(npt,npt,npt), Xz(npt,npt,npt), dx
    Xx(2:npt-1,1:npt,1:npt) = (X(3:npt,1:npt,1:npt)-X(1:npt-2,1:npt,1:npt))/(2*dx);
    Xx(1,1:npt,1:npt) = (X(2,1:npt,1:npt)-X(npt,1:npt,1:npt))/(2*dx);
    Xx(npt,1:npt,1:npt) = (X(1,1:npt,1:npt)-X(npt-1,1:npt,1:npt))/(2*dx);
    Xy(1:npt,2:npt-1,1:npt) = (X(1:npt,3:npt,1:npt)-X(1:npt,1:npt-2,1:npt))/(2*dx);
    Xy(1:npt,1,1:npt) = (X(1:npt,2,1:npt)-X(1:npt,npt,1:npt))/(2*dx);
    Xy(1:npt,npt,1:npt) = (X(1:npt,1,1:npt)-X(1:npt,npt-1,1:npt))/(2*dx);
    Xz(1:npt,1:npt,2:npt-1) = (X(1:npt,1:npt,3:npt)-X(1:npt,1:npt,1:npt-2))/(2*dx);
    Xz(1:npt,1:npt,1) = (X(1:npt,1:npt,2)-X(1:npt,1:npt,npt))/(2*dx);
    Xz(1:npt,1:npt,npt) = (X(1:npt,1:npt,1)-X(1:npt,1:npt,npt-1))/(2*dx);
  end subroutine
end program
Compilation with
LIB_PATH=/opt/intel/Compiler/11.1/073/mkl/lib/em64t/
INCLUDE_PATH=/opt/intel/Compiler/11.1/073/mkl/include/
ifort -i8 -w -c $INCLUDE_PATH"mkl_dfti.f90" -o mkl_dfti.o
ifort -i8 -static mkl_dfti.o ./my_code.f90 -L$LIB_PATH -Wl,--start-group $LIB_PATH"libmkl_intel_ilp64.a" $LIB_PATH"libmkl_intel_thread.a" $LIB_PATH"libmkl_core.a" -Wl,--end-group -o ./my_code.sh
gives
/tmp/ifort7fwQWw.o: In function `MAIN__':
./my_code.f90:(.text+0x15c): relocation truncated to fit: R_X86_64_32 against `.bss'
./my_code.f90:(.text+0x17b): relocation truncated to fit: R_X86_64_32 against `.bss'
./my_code.f90:(.text+0x180): relocation truncated to fit: R_X86_64_32 against `.bss'
./my_code.f90:(.text+0x186): relocation truncated to fit: R_X86_64_32 against `.bss'
/opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o): In function `for__signal_handler':
for_init.c:(.text+0xec): relocation truncated to fit: R_X86_64_PC32 against `for__protect_handler_ops'
for_init.c:(.text+0x117): relocation truncated to fit: R_X86_64_PC32 against `for__protect_handler_ops'
for_init.c:(.text+0x131): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x14b): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_fpe_mask' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x3a7): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x3cd): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x3fc): additional relocation overflows omitted from the output
When I add "-mcmodel=medium -shared-intel" to the last "ifort" line, I get
/opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o): In function `for__signal_handler':
for_init.c:(.text+0xec): relocation truncated to fit: R_X86_64_PC32 against `for__protect_handler_ops'
for_init.c:(.text+0x117): relocation truncated to fit: R_X86_64_PC32 against `for__protect_handler_ops'
for_init.c:(.text+0x131): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x14b): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_fpe_mask' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x3a7): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x3cd): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x3fc): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x402): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_undcnt' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x423): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x45c): relocation truncated to fit: R_X86_64_PC32 against symbol `for__l_excpt_info' defined in .bss section in /opt/intel/Compiler/11.1/073/lib/intel64/libifcore.a(for_init.o)
for_init.c:(.text+0x47d): additional relocation overflows omitted from the output
When I additionally add "-openmp", compilation goes smoothly, but the compiled "my_code.sh" fails with a segmentation fault. The "-i8" option changes nothing.
If I use only one call of subroutine "Derivatives" (i.e., if I remove the line "call Derivatives(By,dx,Yx,Yy,Yz);" from my code), then with the options "-mcmodel=medium -shared-intel -openmp" both compilation and execution of "my_code.sh" go smoothly even up to npt=2048. That is itself a problem: one double-precision 3D array of size 2048^3 takes 64 GiB (about 69 GB) of RAM, while I have only 8 GB. So something goes wrong here as well.
Link Line Advisor gives me
$MKLROOT/libmkl_solver_ilp64.a -Wl,--start-group $MKLROOT/libmkl_intel_ilp64.a $MKLROOT/libmkl_intel_thread.a $MKLROOT/libmkl_core.a -Wl,--end-group -openmp -lpthread
and this doesn't help.
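The array sizes in this post can be checked with shell arithmetic. In the default small code model on x86-64, code and static data are addressed through signed 32-bit offsets, so they have to fit within 2 GiB; "relocation truncated to fit" is what the linker reports when a target falls outside that range. The sketch below (the helper names `array_bytes` and `fits_in_2gib` are hypothetical) adds up the eight distinct 3D arrays of the test program, counting each EQUIVALENCE pair once because the two names share storage, and also sizes a single 2048^3 array.

```bash
#!/bin/bash
# array_bytes NPT [COUNT] [ELEM_SIZE]
# Bytes occupied by COUNT arrays of NPT^3 elements, ELEM_SIZE bytes each
# (defaults: one array of 8-byte double-precision reals).
array_bytes() {
    local npt=$1 count=${2:-1} elem_size=${3:-8}
    echo $(( npt * npt * npt * count * elem_size ))
}

# Succeeds when BYTES is below 2 GiB, the reach of a signed 32-bit offset.
fits_in_2gib() {
    [ "$1" -lt $(( 1 << 31 )) ]
}

for npt in 128 256 512; do
    # Bx, By, Yx, Yy, Yz, Vx, Vy, Vz: 8 arrays; the *_1D names alias Vx, Vy, Vz
    bytes=$(array_bytes "$npt" 8)
    if fits_in_2gib "$bytes"; then
        verdict="below 2 GiB"
    else
        verdict="2 GiB or more"
    fi
    echo "npt=$npt: $(( bytes >> 20 )) MiB of static data, $verdict"
done

echo "one 2048^3 array: $(( $(array_bytes 2048) >> 30 )) GiB"
```

For npt=512 the arrays alone come to 8192 MiB, well past the 2 GiB limit, which is consistent with the R_X86_64_32 and R_X86_64_PC32 failures shown above. This sketch ignores the small 1D array x and any compiler temporaries, so the real footprint is somewhat larger.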
mecej4
For the code in #3, with npt=256, compiling with the command

$ ifort -mcmodel medium -shared-intel my_code.f90

creates an executable that runs to completion. For npt=512, the stack size is slightly over the 8GB of RAM on my machine, so the code does not run. However, raising the virtual memory limit with

$ ulimit -v 16474720

allows the program to run.
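In bash, `ulimit -v` takes its argument in units of 1024 bytes, so the value 16474720 used above corresponds to about 15.7 GiB. A minimal sketch, with hypothetical helper names, that converts GiB to that unit and applies the limit only inside a subshell so the interactive shell is left unchanged:

```bash
#!/bin/bash
# gib_to_kib N: convert N GiB to the 1024-byte units expected by "ulimit -v".
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# run_with_vm_limit GIB COMMAND [ARGS...]
# Run COMMAND with the virtual memory limit set to GIB gibibytes.
# The limit is set in a subshell, so it does not persist afterwards.
run_with_vm_limit() {
    local gib=$1
    shift
    ( ulimit -v "$(gib_to_kib "$gib")" && "$@" )
}

echo "current soft limit: $(ulimit -Sv)"

# Example (a.out is the default executable name produced by ifort):
#   run_with_vm_limit 16 ./a.out
```

An unprivileged user can raise the soft limit only as far as the hard limit (`ulimit -Hv`), so the `ulimit` call itself may fail on systems where the administrator has set a lower cap.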

For the test program that you gave in #3, I don't see why you need to link in the MKL.

Martyn Corden's comments in this thread may be useful.
dmitry424
Thank you, mecej4!
I need MKL for the fast Fourier transform in my main code; of course, I don't need it for the above test example.
You pointed me in the right direction: I solved my problem (for the main code) by changing from static to dynamic linking. It seems static linking is not possible in my case (though I don't understand why).
Step 1: I set the library path environment variable. Since the MKL libraries are in /opt/intel/Compiler/11.1/073/mkl/lib/em64t, while the ifort libraries are in /opt/intel/Compiler/11.1/073/lib/intel64, and I need them all, I had to use
export LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/073/mkl/lib/em64t:/opt/intel/Compiler/11.1/073/lib/intel64
Step 2: I asked the Intel Math Kernel Library Link Line Advisor what my options are for dynamic linking. It gave me
-L$MKLROOT $MKLROOT/libmkl_solver_ilp64.a -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread
I don't know why, but the "-openmp" option constantly leads to a segmentation fault, so I had to replace it with "-liomp5". I now successfully compile my code with the following bash script:
MKLROOT=/opt/intel/Compiler/11.1/073/mkl/lib/em64t/
INCLUDE_PATH=/opt/intel/Compiler/11.1/073/mkl/include/
ifort -w -c $INCLUDE_PATH"mkl_dfti.f90" -o mkl_dfti.o
ifort -xHOST -O2 -mcmodel=medium -shared-intel mkl_dfti.o ./my_code.f90 -L$MKLROOT $MKLROOT"libmkl_solver_ilp64.a" -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -o ./my_code.sh
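One caveat about the export in Step 1: it overwrites whatever LD_LIBRARY_PATH already contained. A slightly more defensive variant, sketched here with a hypothetical helper `prepend_lib_dir`, prepends each directory only if it is not already present and keeps any existing entries.

```bash
#!/bin/bash
# prepend_lib_dir DIR
# Put DIR at the front of LD_LIBRARY_PATH unless it is already listed.
# (Hypothetical helper, written for illustration only.)
prepend_lib_dir() {
    local dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;   # already on the path, nothing to do
        *) LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
    export LD_LIBRARY_PATH
}

# The two directories named in Step 1
prepend_lib_dir /opt/intel/Compiler/11.1/073/lib/intel64
prepend_lib_dir /opt/intel/Compiler/11.1/073/mkl/lib/em64t

echo "$LD_LIBRARY_PATH"
```

The `${VAR:+...}` expansion avoids leaving a stray leading or trailing colon, which the dynamic loader would interpret as the current directory.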
Again, thank you very much!