
ron7000 · ‎03-28-2011

Hi, trying to compile software, having problems.
My code compiles and runs fine on an itanium system running SLES 10.3 with intel fortran 11.1.075 and MKL 10.2.7.041 using the link line as told by mkl advisor:
-Wl,--start-group $(MKL)/lib/64/libmkl_solver_lp64.a $(MKL)/lib/64/libmkl_intel_lp64.a $(MKL)/lib/64/libmkl_intel_thread.a $(MKL)/lib/64/libmkl_core.a -Wl,--end-group -openmp -lguide -lpthread

Problem is on an x86_64 system running SLES 11.1 with intel fortran 11.1.073 and MKL 10.2.6.038.

Compiled my supporting libraries as *.a files no problem.
When compiling third piece of software, about 50 files to get the executable that I run, all the .F's compile but when linking I get errors below.
This is with "-mcmodel large -shared-intel".
If I don't use this I get similar errors, but not solely related to the MKL library; instead I get:
"relocation truncated to fit: R_X86_64_PC32 against symbol `units_com_' defined in COMMON section in mymainprogram.o".
And sometimes the errors don't have PC32, it's just R_X86_64_32.
I have also tried using -fpic without the mcmodel large and that didn't help.

Is it an MKL version error, does the 10.2.7.041 fix something in the 10.2.6.038 MKL ???
What do these errors mean?
What can I correct to get it to compile/link ?

my link line on the x86_64 system that i'm using, from the mkl advisor, is:
$(MKL)/lib/em64t/libmkl_solver_ilp64.a -Wl,--start-group $(MKL)/lib/em64t/libmkl_intel_ilp64.a $(MKL)/lib/em64t/libmkl_intel_thread.a $(MKL)/lib/em64t/libmkl_core.a -Wl,--end-group -openmp -lguide -lpthread

the link errors with -mcmodel large:

/opt/intel/mkl/10.2.6.038/lib/em64t/libmkl_intel_thread.a(zgemm_drv.o): In function `mkl_blas_zgemm':
../../../../blas/thread/32e/level3/zgemm.c:(.text+0x7d3): relocation truncated to fit: R_X86_64_PC32 against `___kmpv_zeromkl_blas_zgemm_1'
../../../../blas/thread/32e/level3/zgemm.c:(.text+0xf83): relocation truncated to fit: R_X86_64_PC32 against `___kmpv_zeromkl_blas_zgemm_0'
/opt/intel/mkl/10.2.6.038/lib/em64t/libmkl_intel_thread.a(xerbla.o): In function `mkl_serv_setxer':
../../../../serv/kernel/xerbla.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `mkl_xerbla_address'
/opt/intel/mkl/10.2.6.038/lib/em64t/libmkl_intel_thread.a(xerbla.o): In function `mkl_serv_xerbla':
../../../../serv/kernel/xerbla.c:(.text+0x54): relocation truncated to fit: R_X86_64_PC32 against `mkl_xerbla_address'
/opt/intel/mkl/10.2.6.038/lib/em64t/libmkl_intel_thread.a(mkl_threading.o): In function `mkl_serv_mkl_get_max_threads':
../../../../serv/kernel/mkl_threading.c:(.text+0x40): relocation truncated to fit: R_X86_64_PC32 against `__get_N_Cores_called'
/opt/intel/mkl/10.2.6.038/lib/em64t/libmkl_intel_thread.a(mkl_threading.o): In function `MKL_get_N_Cores':
../../../../serv/kernel/mkl_threading.c:(.text+0x847): relocation truncated to fit: R_X86_64_PC32 against `__get_N_Cores_called'
../../../../serv/kernel/mkl_threading.c:(.text+0x8d1): relocation truncated to fit: R_X86_64_PC32 against `mklaff_len'
../../../../serv/kernel/mkl_threading.c:(.text+0x8ec): relocation truncated to fit: R_X86_64_PC32 against `mklaff_len'
../../../../serv/kernel/mkl_threading.c:(.text+0x9a9): relocation truncated to fit: R_X86_64_PC32 against `mklaff_len'
../../../../serv/kernel/mkl_threading.c:(.text+0xa06): relocation truncated to fit: R_X86_64_PC32 against `mklaff_len'
../../../../serv/kernel/mkl_threading.c:(.text+0xcf2): additional relocation overflows omitted from the output

Ron_Green · ‎03-29-2011

Even with -mcmodel large there is a limit to the amount of data you can have allocated in static arrays on the x86_64 architecture. This data segment is limited to 2GB. I would guess you have a number of COMMON blocks or arrays declared statically in your main program. The data in COMMON plus any other statically declared arrays PLUS your code must fit in 2GB.

To fix this, make your data allocatable, get it out of COMMON and into modules OR reduce the size of your arrays.

ron
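
A minimal sketch of the "get it out of COMMON and into modules, make it allocatable" advice above. The module name, array name, and size are hypothetical and only illustrate the pattern; they are not taken from the original poster's code.

```fortran
! Hypothetical replacement for a large COMMON block; all names are illustrative.
module big_data
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Was (for example): real*8 work(N) in COMMON /myc/
  real(dp), allocatable :: work(:)
end module big_data

program demo
  use big_data
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8), parameter :: n = 1000000_i8   ! illustrative size; raise as needed
  integer(i8) :: i
  integer :: ierr

  ! The array is allocated on the heap at run time, so it does not
  ! occupy space in the static data area of the executable.
  allocate(work(n), stat=ierr)
  if (ierr /= 0) stop 'allocation of work failed'

  do i = 1, n
     work(i) = real(n - i, dp)
  end do
  print *, work(n)

  deallocate(work)
end program demo
```

Every routine that previously declared the COMMON block would instead "use" the module, so the array stays shared while its storage is obtained at run time.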

TimP · ‎03-29-2011

Use of dynamic libraries, including MKL, may ease the static data limit. I don't remember if there is specific advice about whether dynamic or static libraries should be used with -mcmodel large. -mcmodel medium would be preferable, if there's no specific need for large.

ron7000 · ‎03-29-2011

thanks.

question: we are in the process of getting new servers from SGI which will have the nehalem cpu architecture, running SLES11.
Will this 2GB static limit still exist on this new cpu architecture?
Is this limit a cpu thing, or a compiler thing?

TimP · ‎03-29-2011

The mcmodel characteristics are a feature of the x86_64 linux operating system, including SLES11.

jkwi · ‎03-30-2011

This is very helpful. I've never seen it explained in this context. I was under the impression that if you had > 2GB data, you could just use -mcmodel=medium -shared-intel and you were good.
So, the memory model only helps if your data is dynamically allocated?

The sum of all statically allocated data (including COMMON) + code cannot be >2GB for x86_64.

Could you point to a reference about the modules, allocating data >2GB within a module is ok?
Heap vs stack?

Thanks

jkwi · ‎03-30-2011

from ifort 11 help:

-mcmodel=
    use a specific memory model to generate code and store data
    small  - Restricts code and data to the first 2GB of address space (DEFAULT)
    medium - Restricts code to the first 2GB; it places no memory restriction on data
    large  - Places no memory restriction on code or data

Doesn't say anything about static/dynamic?

TimP · ‎03-30-2011

I would guess that use of the dynamic libraries helps keep some run-time support data structures out of the static data area. ALLOCATABLE arrays bypass the static data region limit, regardless of -mcmodel setting.
I haven't seen an authoritative document about whether the recent ifort versions are using dynamic heap allocation for module data referred to in the main program, so that they don't count against the static data limit. If so, this may be a change in the last 2 years. I suppose heap usage when setting up main program (before any possibility of reaching a threaded region) doesn't raise any thread safety issues.
Stack usage in subroutines seems to be the only generally practical strategy for thread safety (e.g. when declared RECURSIVE). Again, I suppose expert clarification would be desirable.
I don't see how you can hope to use any significant fraction of 2GB for stack, particularly in threaded applications. The 64-bit Intel OpenMP default thread stack size is 4MB; you can adjust it by KMP_STACKSIZE environment variable or library function call. We've run into situations where we had to cut back on the use of threadprivate arrays in order to increase the number of threads which could run efficiently.
I was using a browser search to see whether SLES11 has a stated size for "unlimited" overall stack, but didn't find an answer.
Heap usage by the ifort /heap-arrays option is fine for non-threaded programs, but this question certainly raises issues, as you are likely to want parallel execution when a program is large enough to encounter issues with data size limits. As has been mentioned before on this forum, there is an optional numerical parameter which would automatically use stack allocation up to the stated size, in the unusual situation where the compiler knows the required allocation would never exceed that size.

drMikeT · ‎12-18-2012

I understand this seems to be related to the Linux linker, but has this behavior changed in newer compiler sets (12.1 or 2013)? Mike

Steven_L_Intel1 · ‎12-18-2012

This is not a compiler issue - it is fundamental to the design of the Linux executable format.

okkebas · ‎12-18-2012

Is the comment from Ronald W Green (Intel) Tue, 03/29/2011 - 07:34 correct? According to other Intel articles (e.g. http://software.intel.com/en-us/articles/avoiding-relocation-errors-when-building-applications-with-large-global-or-static-data-on-intel64/) the "mcmodel=medium" flag is specifically used in cases where static data exceeds 2GB. If it's a Linux issue, what is the use of -mcmodel=medium?

Steven_L_Intel1 · ‎12-18-2012

The -mcmodel option changes how instructions reference code and/or data. While the compiler has to do something to implement the option, it is not an Intel compiler specific thing.

okkebas · ‎12-18-2012

I guess it's partly an intel compiler specific thing since it doesn't happen with gfortran. Ifort uses relative addressing by default (resulting in possibly reduced code size and a bit faster execution). Regarding the linux issue; are there still linux (64bit) distributions that limit the data segment to 2GB?

Steven_L_Intel1 · ‎12-19-2012

I am not aware of such distributions, but, as I say, Linux is "a twisty maze of little distros, all different", so it would not astonish me if there was one out there. But see also http://en.wikipedia.org/wiki/X32_ABI

okkebas · ‎01-07-2013

I tried to compile the following snippet:

    integer*8,parameter :: N=1000000000
    real*8, dimension(N) :: MY_BIG_ARRAY
    COMMON /myc/ MY_BIG_ARRAY

(using both ifort 12.1 and 11.1). I tried in two different ways: only a main program, and a main plus called subroutine (both contain the COMMON block). When I don't use "-mcmodel=medium -shared-intel" it complains about the relocation errors as expected. However, when I only use the "-shared-intel" flag it works fine. I didn't expect that since the size of the COMMON block array MY_BIG_ARRAY is > 2GB. Why does it compile when I don't add the -mcmodel compiler flag?

jimdempseyatthecove · ‎01-09-2013

The 2GB limitations (as Steve indicates) are linker issues (and object file header issues). The object file format consists of data packets, each made up of a header, a blob of data, and an optional trailer. The header contains information such as the segment name ("text", "data", "bss", ... "yourName", ...), sizeOfBlob, and an optional offset into a former use of the same named segment. sizeOfBlob (and the optional offset) are limited to 32 bits. The linker you use may or may not have a 32-bit limitation on the segment size (the accumulation of non-overlapping sizeOfBlob packets targeting a given segment name, or the optional offset + sizeOfBlob of any packet, whichever is greater).

This should be easy enough to fix, except that the compiler writers and linker writers are not on the same team. Where a compiler vendor is also the linker vendor (and runtime loader vendor), they may have a solution.

A potential workaround is that some compilers permit you to specify a segment name for the subsequent emission of data (the "yourName" listed above). Think of this as a variation of COMMON (though I think these fall into the "data" segment). This means you may be able to partition your data into .lt. 2GB packets (not a solution for a .gt. 2GB array). The better alternative is to make your .gt. 2GB arrays allocatable (when "static"), or allocatable/heap arrays when local to a subroutine/function. This does require the addition of initialization code to perform the allocation.

Jim Dempsey

okkebas · ‎01-09-2013

I thought the 2GB limits are indexing issues, e.g. normally ifort uses IP-relative indexing (32 bit) to save space (and potentially faster code). This is fine since most static data is smaller than that anyway. However when data is too large absolute addressing is required (64 bit). This is what I understand from the explanation of mcmodel in the ifort man page. Without -mcmodel=medium ifort will use the relative addressing and I expected that should cause problems when iterating over all the elements in MY_BIG_ARRAY. The -shared-intel is what links in the correct libraries.

okkebas · ‎01-09-2013

Can you think of a little sample program that will compile when adding -mcmodel=medium but will not compile when omitting the -mcmodel=medium flag?

Maybe somewhat unrelated: I thought it might depend on where the array is stored (bss or data), so I tried both. When I compile the above snippet, MY_BIG_ARRAY is stored in the bss section according to the "size" command (which makes sense since it's not initialized). It compiles fine without the mcmodel flag. When I add a data statement it is stored in the data section (which also makes sense). However, the compiler then complains with the following message:

    test.f90(6): error #5524: Variable MY_BIG_ARRAY is larger than 2147483647 bytes and cannot be initialized

no matter whether I use mcmodel=medium. What I thought was a straightforward flag, "mcmodel", is getting more and more confusing to me. For reference, this is the complete sample code I used:

    program test
    integer*8,parameter :: N=1000000000
    integer*8 :: i
    real*8, dimension(N) :: MY_BIG_ARRAY
    COMMON /myc/ MY_BIG_ARRAY
    !data MY_BIG_ARRAY/N*1.0/
    do i=1,N
    MY_BIG_ARRAY(i) = N-i
    enddo
    print *,MY_BIG_ARRAY(N)
    stop
    end

jimdempseyatthecove · ‎01-09-2013

FWIW, here is a link to the Microsoft Portable Executable and Common Object File Format Specification: http://msdn.microsoft.com/en-us/library/windows/hardware/gg463119.aspx

This is a suggestion (observation) for Steve and Intel. IVF (on Windows) links to the C Runtime Library. C++ apps also link to the C Runtime Library (or a derived variant), but also contain code to run the ctors of statically declared objects that have ctors. IVF, with little change, could, for very large static array declarations, potentially generate an array descriptor to be placed in the 2GB-limited static memory, together with the equivalent of a C++-like ctor that performs the allocation _prior_ to entry to the main PROGRAM. The user could be given an option to specify the cutoff and/or a !DEC$ ATTRIBUTE to indicate the programmer's preference. Effectively, the static array (descriptor) is transparently converted into a static pointer to an equivalent array (to be allocated by the IVF-to-C runtime system initialization).

Jim Dempsey

Steven_L_Intel1 · ‎01-09-2013

This is, however, the Linux/Mac forum, so links to Windows information are probably of less interest. As far as I know, the issue in this thread is about instruction addressing modes and not object code format. Jim, your suggestion already exists, though we're loath to tell anyone about it. Read up on "Dynamic COMMON". But really, we'd much rather people use allocatable arrays than a vendor-specific hack.

okkebas · ‎01-09-2013

Right, allocatable arrays should always be preferred over static arrays in COMMON blocks, and they will not cause problems like the ones mentioned above. However, I'm still not sure about the IP-relative addressing in ifort. My little sample program should have failed when not including the -mcmodel=medium flag, but that's not the case and I don't understand why. It makes me wonder what the use of the mcmodel=medium flag is. Can anybody provide some small sample code that shows the use of mcmodel=medium (i.e. compiles fine when including the -mcmodel=medium flag but doesn't compile when omitting it)? It would really help me understand better.

Regarding the other error message (from the code in my previous message):

    test.f90(6): error #5524: Variable MY_BIG_ARRAY is larger than 2147483647 bytes and cannot be initialized

Is that somehow related to the 2GB limit? The article at http://software.intel.com/en-us/articles/fdiag5524 also covers this error but doesn't provide any explanation.
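
One candidate for the kind of sample asked for here, offered as an untested sketch rather than a confirmed answer from this thread. It assumes that with two large COMMON blocks at least one of them must start more than 2GB away from the code, so a 32-bit relocation against that block's symbol should overflow under the default small model. With a single block, the block's start address may still be within reach, which might be why the earlier sample linked. The program and block names are hypothetical.

```fortran
! Hypothetical test case: names and sizes are illustrative, not from the thread.
! Assumption: linking with the default memory model reports
! "relocation truncated to fit", while adding -mcmodel=medium -shared-intel links.
program two_blocks
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: dp = kind(1.0d0)
  integer(i8), parameter :: n = 400000000_i8   ! 3.2 GB per array of 8-byte reals
  integer(i8) :: i
  real(dp) :: a(n), b(n)
  common /blk_a/ a
  common /blk_b/ b

  ! Touch both arrays so that each block's symbol is referenced from the code.
  do i = 1, n
     a(i) = real(i, dp)
     b(i) = real(n - i, dp)
  end do
  print *, a(n), b(n)
end program two_blocks
```

Running the resulting executable needs roughly 6.4 GB of memory; for checking the link step alone it is enough to build it with and without the flag and compare the linker output.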

help, relocation truncated to fit R_X86_64_PC32 against symbol