MKL function load error: cpu specific dynamic library is not loaded.

Peter_B_9 · 08-07-2013

I'm just getting started with MIC and I'm hitting what must be a simple configuration issue.

I want to run a program using OpenMP and MKL on the MIC directly (i.e. no hybrid offload magic). OpenMP works fine, but when I call a simple MKL function (e.g. vdLn), I get the following error: "MKL function load error: cpu specific dynamic library is not loaded."

I've been able to catch this in gdb. The stack trace when the error is printed is:

#0 0x00007ffff08902d0 in write () from /lib64/libc.so.6
#1 0x00007ffff0840a13 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00007ffff0840917 in new_do_write () from /lib64/libc.so.6
#3 0x00007ffff084127e in _IO_new_file_xsputn () from /lib64/libc.so.6
#4 0x00007ffff081ae1d in buffered_vfprintf () from /lib64/libc.so.6
#5 0x00007ffff0815ece in vfprintf () from /lib64/libc.so.6
#6 0x00007ffff0820099 in fprintf () from /lib64/libc.so.6
#7 0x00007ffff15bee4a in mkl_serv_print () from ../lib/libmkl_core.so
#8 0x00007ffff7171683 in LoadFunctions () from ../lib/libmkl_intel_thread.so
#9 0x00007ffff717347c in mkl_vml_serv_threader_d_1i_1o () from ../lib/libmkl_intel_thread.so
#10 0x00007ffff7a3b82d in vdLn () from ../lib/libmkl_intel_lp64.so
#11 0x0000000000402065 in main ()

I'm using the following simple command line to build the test program:

/opt/intel/composer_xe_2013.5.192/bin/intel64_mic/icc -mmic -openmp -std=c99 -mkl sample.c

I've determined that mkl_serv_cpu_detect() is returning 0. I suspect that this is related to the problem. In fact, the version of mkl_serv_cpu_detect() which I'm picking up from libmkl_core.so is only two instructions long: xor %eax,%eax; retq. Is that expected? Do I perhaps have the wrong version of libmkl_core.so installed?

Frances_R_Intel · 08-08-2013

When you compile, are you setting your environment using:

source /opt/intel/composer_xe_2013.5.192/bin/compilervars.sh intel64

Peter_B_9 · 08-08-2013

Yes. I ran source /opt/intel/composer_xe_2013.5.192/bin/compilervars.sh intel64 before I compiled.

I tried source /opt/intel/composer_xe_2013.5.192/bin/compilervars.sh mic, but that reported "ERROR: Unknown switch 'mic'. Accepted values: ia32, intel64"

TimP · 08-08-2013

In a normal installation, the MKL and OpenMP libraries don't get copied over to the MIC. If you have sudo privileges, an easy fix is to scp those .so files from the host compiler installation to /lib64/ on the MIC. Putting them in your current directory on the MIC (with LD_LIBRARY_PATH pointing at it) should also work.

Peter_B_9 · 08-08-2013

Yup. I did copy the libraries over once I discovered that they weren't installed by default. That got my program to load, but it still fails with the message I pasted above. I guess it's possible that I copied over the wrong version of the libraries, but stepping through the code in gdb shows me what appear to be VPU instructions, so I'm pretty confident that these are indeed the MIC MKL libraries.

Loc_N_Intel · 08-08-2013

Can you try to copy the libraries again to /lib64 on MIC? Thank you.

Peter_B_9 · 08-08-2013

Hi Loc,

I copied all of the libraries over again and put them in /lib64 so that I'm no longer modifying my LD_LIBRARY_PATH. I can still reproduce the problem, although I confused myself for a while when I discovered that vdLn works for small vectors but not large ones. Not surprisingly, MKL has different code paths for the two cases.

I misspoke before when I claimed that mkl_serv_cpu_detect() was only two instructions. In fact, it is six instructions long. It reads a static int field named vml_cpu_type, which is initialized to -1. mkl_serv_cpu_detect() checks for that value, changes it to 0, and returns it. I couldn't find any other code that writes to vml_cpu_type. Where is vml_cpu_type expected to be initialized?

Here is the complete source of my test case:

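#include <mkl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>   /* calloc, free */

int main(int argc, char ** argv)
{
    /* 10M elements: large enough that MKL takes its threaded VML path */
    const size_t c = 1024*1024*10;
    double * v = calloc(c, sizeof(*v));

    vdLn(c, v, v);   /* in-place natural log; ln(0) = -inf, values don't matter here */
    printf("done\n");
    free(v);
    return 0;
}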

I compile this with /opt/intel/composer_xe_2013.5.192/bin/intel64_mic/icc -mmic -std=c99 -mkl sample.c

Does this work for you?

Peter_B_9 · 08-08-2013

A bit more info:

vdLn() calls mkl_vml_serv_threader_d_1i_1o(), setting the fifth argument (%r8) to zero (xor %r8d, %r8d).

It looks like the fifth argument is expected to be a handle to the MKL shared library. 0 isn't a completely unreasonable value for this, since it corresponds to RTLD_DEFAULT. However, mkl_vml_serv_threader_d_1i_1o passes this value on to LoadFunctions(). LoadFunctions() explicitly tests for 0 at the top of the function, prints an error message and exits.

If LoadFunctions() were to tolerate 0 (RTLD_DEFAULT), I think everything would just work. Certainly in my simple test case I can make it work by using gdb to skip over that test in LoadFunctions().

Loc_N_Intel · 08-08-2013

Hi Peter,

I was able to reproduce the problem. Let me investigate it and get back to you. Thanks.

Loc_N_Intel · 08-08-2013

Hi Peter. The problem is that the original vector is not initialized. After I initialized the vector "v", it works perfectly on both the host and the MIC. Please see my modified version below:

#include <mkl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>   /* calloc, free */

int main(int argc, char ** argv)
{
    //const size_t c = 1024*1024*10;
    const size_t c = 10; // for testing only, I reduce the size
    double * v = calloc(c, sizeof(*v));
    double *v1 = calloc(c, sizeof(*v1));

    size_t i;
    for (i = 0; i < c; i++)
        v[i] = i + 1;          /* input: 1, 2, ..., c */

    vdLn(c, v, v1);            /* v1[i] = ln(v[i]) */

    for (i = 0; i < c; i++)
        printf("%4.2f ", v[i]);

    printf("\n");
    for (i = 0; i < c; i++)
        printf("%4.2f ", v1[i]);

    printf("\ndone\n");
    free(v);
    free(v1);
    return 0;
}

Peter_B_9 · 08-09-2013

Hi Loc, I do not believe that initialization is the problem. Your example works because you've reduced the size of the vector below the threshold of parallelization. MKL detects that the vector is very small and elects not to parallelize the work. The bug is only exposed when large vectors are involved. Try increasing the size back up to 10M.

Peter_B_9 · 08-09-2013

I've been able to work around the problem by disabling the zero check in LoadFunctions.

If anyone else is hitting this, you can apply the same patch (AT YOUR OWN RISK) with this command:

printf '\xeb' | dd conv=notrunc of=/opt/intel/composer_xe_2013.5.192/mkl/lib/mic/libmkl_intel_thread.so bs=1 seek=$((0xd84661))

This changes the JNZ instruction to an unconditional JMP instruction.

Loc_N_Intel · 08-09-2013

Hi Peter,

You are right. The problem reappears when I increase the vector size to 2540; sizes below 2540 work.

I talked with a member of the MKL team this morning about this issue. He reproduced it and believes it is an MKL bug; it will be escalated to the engineering team soon. We will address it as soon as we can.

Regards.

Loc_N_Intel · 08-28-2013

I learned that this problem will be fixed in MKL 11.1.1, which is expected to be available in the November-December timeframe. A temporary workaround is to use the sequential version of the VML functions on the MIC, that is, set OMP_NUM_THREADS=1 when calling a VML function.