We have a Fortran code that uses Pardiso. Everything was fine until we had to increase the size of the main matrices being solved. The old way of compiling and linking the code then started failing with 'truncation' errors similar to this:
relocation truncated to fit: R_X86_64_PC32 against symbol `message_catalog' defined in COMMON section
We were told that we needed to use dynamic linking and the '-shared-intel' option to fix the memory problem above.
Since then, performance has taken a huge hit. Before, all 8 cores on our machine were used, but now we see only 2 to 6 cores in use.
So my main question: is this expected and unfixable, or are we doing something wrong?
We are using Intel Compiler 11.0, which seems to include version 10.1 of MKL.
This is how we statically compiled our code:
ifort foo.f -O -tune host -L/path/to/mkl/em64t /path/to/em64t/libmkl_solver.a /path/to/mkl/em64t/libmkl_lapack.a /path/to/fce/lib/libguide.a -lmkl_em64t -lpthread
The new way (causing reduced performance) is:
ifort foo.f -mcmodel=large -shared-intel -O -tune host -lmkl_core -lguide -lmkl_intel_lp64 -lmkl -lmkl_solver -lmkl_lapack -lguide -lmkl_em64t -lpthread
where LD_LIBRARY_PATH includes /path/to/Compiler/11.0/081/mkl/lib/em64t
I should add that I've tried the 'Link Advisor' too, but the resulting executable segfaults!
Any hint/input regarding this issue is greatly appreciated,
thanks in advance
ifort -mcmodel=large -shared-intel -O2 -tune host -lmkl -lmkl_em64t -liomp5 -lmkl_intel_thread foo.f
Well, Composer XE 2011 contains MKL version 10.3 (see the link here for getting this version), and setting -mkl=parallel will allow you to link with the standard threaded Intel MKL; see the link to the Link Line Advisor at the top of the forum.
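After relinking, it may also be worth checking which MKL and OpenMP runtime libraries the executable actually picks up at run time, for instance to see whether both libguide and libiomp5 end up loaded. A rough sketch, assuming a Linux system with ldd available; the helper name show_runtime_libs is made up for illustration:

```shell
# Hypothetical helper: list the MKL / OpenMP runtime libraries that the
# dynamic loader would resolve for a given executable.
show_runtime_libs() {
    # ldd prints every shared library the executable depends on;
    # keep only the MKL and Intel OpenMP runtime entries.
    ldd "$1" 2>/dev/null | grep -E 'libmkl|libguide|libiomp5' \
        || echo "no MKL/OpenMP runtime libraries found for $1"
}
```

You would call it as show_runtime_libs ./a.out, where ./a.out stands for whatever your executable is called.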
If you see real degradation with version 10.3 too, how large is it?
What is the size of the input?
It would help if you gave us a test case so that we can check the problem on our side.
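Along with the test case, it would help to know the threading settings in effect when you run. A minimal sketch, assuming a bash-like shell; OMP_NUM_THREADS, MKL_NUM_THREADS and MKL_DYNAMIC are the usual OpenMP/MKL environment variables, and the value 8 is only chosen to match the 8 cores you mention:

```shell
# Report whatever thread settings are currently in effect (may be unset).
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS:-unset}"
echo "MKL_DYNAMIC=${MKL_DYNAMIC:-unset}"

# Request 8 threads explicitly and ask MKL not to lower the count on its own.
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=8
export MKL_DYNAMIC=FALSE
```

Running the executable again in the same shell after these exports, and comparing the timing and core usage with the unset case, should show whether the thread count is part of the slowdown.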