Intel® oneAPI Math Kernel Library

Mysterious behavior of mkl_mc.so and mkl_def.so

dbacchus
IFC: latest
MKL: 10.1.1.022 (because of a bug in MKL 10.2.2.025)
LINKING:
[cpp]LIBS=-L/$(FEASTROOT)/lib/x64/ -lfeast $(MKLPATH)/libmkl_solver_lp64.a -Wl,--start-group $(MKLPATH)/libmkl_intel_lp64.a $(MKLPATH)/libmkl_intel_thread.a $(MKLPATH)/libmkl_core.a -Wl,--end-group -lpthread[/cpp]

I ran into a bug so mysterious that I was only able to identify its cause and resolution by mere chance (or luck).
I was trying to run my code on a cluster, but within a single node only, i.e. with OpenMP only.

I soon discovered that the code, which runs fine on any regular Linux workstation, produces an incorrect result on the cluster on the third iteration. The first two iterations gave results identical to the (correct) workstation results.

I found that the cause lay in dynamic MKL libraries that were somehow being used, even though the code was compiled with -static and linked with static libraries (see above).
A logical explanation would be that the error arises from interference between outdated dynamic libraries installed on the cluster and the newer MKL the code was linked with.
I then tried linking the code against the dynamic libraries (I compile on MY machine, of course, not on the cluster):
[cpp]LIBS=-L/$(FEASTROOT)/lib/x64/ -lfeast $(MKLPATH)/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -lpthread[/cpp]

Since the executable now required dynamic libraries, I had to copy them to the cluster too (and, of course, set LD_LIBRARY_PATH).
I copied the following .so files one by one until the code was able to launch:
1. libmkl_intel_lp64.so
2. libmkl_intel_thread.so
3. libmkl_core.so
4. libmkl_lapack.so
5. libiomp5.so

So far so good. Then, before the first iteration was over, the code stopped again and complained:

MKL FATAL ERROR: Cannot load neither libmkl_mc.so nor libmkl_def.so

which was not unexpected.
Now the mysterious part. I first copied libmkl_mc.so to the cluster. The code ran without errors, but produced an incorrect result on the third iteration (as before).
Then, for no particular reason, I deleted libmkl_mc.so and copied libmkl_def.so instead. Now the code runs fine and produces correct results!!

What is going on here? What are these dynamic libraries, mkl_mc and mkl_def? Why are they "interchangeable", and why does only one of them (mkl_def) work correctly?!

P.S. I have since read that mkl_mc is the kernel for Core Duo and mkl_def is the default kernel. The cluster has Xeon X5355 CPUs. Should I use mkl_p4n, and if so, how?

P.P.S. Now that I understand the mkl_mc vs. mkl_def issue, my main question is:
Why does the statically linked code (eventually) fail, while the dynamically linked one works on the cluster?
Why does it happen only on the cluster (on any workstation, the dynamic and static versions behave identically)?
Vladimir_Lunev

Hello dbacchus,

You asked: "Why the code with static linking (eventually) fails, and the one with dynamic linking works on the cluster?"

MKL always uses the most appropriate (most efficient) code for your processor. In the static case you can't influence MKL's behaviour: MKL uses the "mc" code on your system.
In the dynamic case MKL first tries to use the optimized code (libmkl_mc.so). If MKL fails to load libmkl_mc.so, it uses libmkl_def.so instead. That is your second experiment: mc.so is unavailable but def.so is present.

So the answer to your question is: the "mc" code is used with static linking, but the "def" code is used with dynamic linking (because mc.so is absent). These code paths generally differ, so you can see different results in your application.

To investigate the possible problem and forward it to the right person, I need more information.
Which MKL functionality (functions) do you use? Could you provide a small test case that demonstrates the issue? Any additional information is welcome.

Note that you can't use mkl_p4n.so here, since MKL requires either mkl_mc.so or mkl_def.so.

Thanks,
-Vladimir

dbacchus
Hi Vladimir!
Thanks so much, this actually explains it!

Did I understand you correctly that there is no way to force statically linked code to use the mkl_def kernel instead of mkl_mc? Good thing I can still use dynamic linking, then!

The problem with a test case is that my code is rather complex: a nonlinear system of second-order differential equations (open-system Schrödinger + Poisson) is solved by computing a large (N ~ 10,000-100,000) Hermitian eigenvalue problem (only the lowest <2% of the eigenstates are needed) and by repeated "small" (Nc ~ 100) complex matrix inversions. In addition, Newton and line-search techniques are used, in which a sparse linear system (of size N) is solved to find the Jacobian.
But the real problem is that for my test system the error occurs only in the third iteration, i.e. the code on the cluster produces correct results for the first two iterations. And, to make things more interesting, for smaller, easier-to-handle toy systems there is no error at all...
List of MKL functions being used directly:
ZGESV
ZGEMM/DGEMM
DSBMV
In addition, I'm using either FEAST (which uses PARDISO) or ARPACK (DSAUPD/DSEUPD, which use many BLAS and LAPACK routines) to find the eigenfunctions. With both FEAST and ARPACK I get the same problem during the third iteration.
So, as you can see, providing a good test case would be extremely difficult without submitting the entire code, which could in principle be done. And of course, I can provide binaries any time.

Perhaps this may shed some light: when using mkl_mc, the code runs slower and MKL_MEM_STAT reports much larger memory use (~700 MiB) than with mkl_def (~10 MiB), for the same number of CPUs (16).
Some more background: the statically linked code runs fine and produces correct results on Win7 x64, Linux x64, and Linux x86 systems. Generally (if the kernels are compatible) I can use the same Linux binary on different workstations and the code works fine. This cluster is the first case where I have had to use dynamic linking...

P.S. When linking dynamically on the cluster, I get the following warning message (the code works fine though):
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This may cause performance degradation and correctness issues. Set environment variable KMP_DUPLICATE_LIB_OK=TRUE to ignore this problem and force the program to continue anyway. Please note that the use of KMP_DUPLICATE_LIB_OK is unsupported and using it may cause undefined behavior.
TimP
Quoting - dbacchus

P.S. When linking dynamically on the cluster, I get the following warning message (the code works fine though):
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This may cause performance degradation and correctness issues. Set environment variable KMP_DUPLICATE_LIB_OK=TRUE to ignore this problem and force the program to continue anyway. Please note that the use of KMP_DUPLICATE_LIB_OK is unsupported and using it may cause undefined behavior.
This seems to indicate that you linked the OpenMP library statically somewhere, as well as dynamically at the final stage. It might happen that threadprivate data don't persist correctly across consecutive threaded regions, but your program might not rely on that.