MKL crashes in mkl_vml_kernel_sSub

Mats_S_ · 01-07-2014

Hi,

I'm porting a Windows C/C++ application that uses MKL to the Macintosh, and am running into a crash almost immediately.

The call I'm doing is this:

vsSub(&len, src1, src2, dest);

where len is declared as int, and src1, src2 and dest are declared as float*.

My application crashes with EXC_BAD_ACCESS in mkl_vml_sSub. The exact same code runs correctly on Windows.

I'm linking against the single-library version (-lmkl_rt) and am using Xcode 4.6 with LLVM GCC 4.2. I have also tried linking the static libraries and the dynamic libraries, in both the threaded and non-threaded versions, and I have tried explicitly switching off threading via environment variables.

I compile with the "Use Open MP" flag set to "yes" in all parts of my application, and my application consists of a number of statically linked libraries plus some 3rd party libraries and .dylibs, including Qt.

My guess is that I've missed something in the configuration or linking of the MKL library, but does anyone know what that may be?

Ying_H_Intel · 01-08-2014

Hi Mats,

Could you please attach a small test code and let us know more details about your OS, MKL version, etc.?

I just tried the vsSub example (vssub.c) from the MKL examples under the MKL install directory on our lab machine running 10.8.4. Everything looks fine.

> bash-3.2$ cat source/vssub.c

> gcc -w -m64 -I/opt/intel/composer_xe_2013_sp1.1.103/mkl/include source/vssub.c -L/opt/intel/composer_xe_2013_sp1.1.103/mkl/lib -lmkl_rt

> export DYLD_LIBRARY_PATH="/opt/intel/composer_xe_2013_sp1.1.103/mkl/lib"::"/opt/intel/composer_xe_2013_sp1.1.103/mkl/../compiler/lib"

macmini01:vmlc yhu5$ ./a.out
vsSub test/example program

Arguments vsSub
===============================================================================
-10000.00000000000000 -10000.00000000000000 0.00000000000000e+00
-7777.77783203125000 -7777.77783203125000 0.00000000000000e+00
-5555.55566406250000 -5555.55566406250000 0.00000000000000e+00
-3333.33325195312500 -3333.33325195312500 0.00000000000000e+00
-1111.11108398437500 -1111.11108398437500 0.00000000000000e+00
1111.11108398437500 1111.11108398437500 0.00000000000000e+00
3333.33325195312500 3333.33325195312500 0.00000000000000e+00
5555.55566406250000 5555.55566406250000 0.00000000000000e+00
7777.77783203125000 7777.77783203125000 0.00000000000000e+00
10000.00000000000000 10000.00000000000000 0.00000000000000e+00

Relative accuracy is 0.0000000000000000

#include <stdio.h>
#include "mkl_vml.h"

#include "_rms.h"

int main()
{
    float fA[10],fB[10];
    float fBha0[10],fBha1[10],fBha2[10];
    float fBla1[10],fBla2[10];
    float fBep1[10],fBep2[10];
    float CurRMS,MaxRMS=0.0;

    MKL_INT i=0,vec_len=10;

    fA[0]=-10000.0000;
    fA[1]=-7777.7778;

    vsSub(vec_len,fA,fA,fBha0);

    vmsSub(vec_len,fA,fA,fBep1,VML_EP);

    vmlSetMode(VML_EP);
    vsSub(vec_len,fA,fA,fBep2);

}

macmini01:vmlc yhu5$ more /System/Library/CoreServices/SystemVersion.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>ProductBuildVersion</key>
<string>12E55</string>
<key>ProductCopyright</key>
<string>1983-2013 Apple Inc.</string>
<key>ProductName</key>
<string>Mac OS X</string>
<key>ProductUserVisibleVersion</key>
<string>10.8.4</string>
<key>ProductVersion</key>
<string>10.8.4</string>
</dict>
</plist>

macmini01:vmlc yhu5$ system_profiler|grep Processor
Processor Name: Intel Core i7
Processor Speed: 2 GHz
Number of Processors: 1

Mats_S_ · 01-08-2014

Accidental double-post removed.

Mats_S_ · 01-08-2014

OS version: Mac OS X 10.8.5

MKL version: Composer XE 2013 SP 1.103

Compiler: Xcode with LLVM GCC 4.2

Processor: Intel Core i7, 2GHz, 1 processor (MacBook Air)

The code that executes is the end of a normalization kernel that subtracts the mean mu (which is in a vector) from the columns of a matrix and places the result in another matrix:

for(int j = 0; j < A.width(); ++j)
    sub_(A.height(), &A(0,j), &mu(0,0), &B(0,j));

The sub_() function interfaces to vssub:

static inline void sub_(int len, const float* src1, const float* src2, float* dest)
{
    ::vssub(&len, src1, src2, dest);
}

The matrix class is our own, and it uses MKL_alloc()/MKL_free() for memory management. Again, the code has run fine on Windows for several years, but on the Mac it crashes on one of the vmovups instructions here:

0x1008326d6: jl     0x100832acc               ; mkl_vml_kernel_sSub_E9HAynn + 1164
0x1008326dc: movl   %ebx, %edx
0x1008326de: andl   $-32, %edx
0x1008326e1: movslq %edx, %rax
0x1008326e4: xorl   %ecx, %ecx
0x1008326e6: vmovups (%r12,%rcx,4), %xmm0
0x1008326ec: vmovups 16(%r12,%rcx,4), %xmm2
0x1008326f3: vmovups 32(%r12,%rcx,4), %xmm4
0x1008326fa: vmovups 48(%r12,%rcx,4), %xmm6
0x100832701: vmovups 64(%r12,%rcx,4), %xmm8
0x100832708: vmovups 80(%r12,%rcx,4), %xmm10
0x10083270f: vmovups 96(%r12,%rcx,4), %xmm12
0x100832716: vmovups 112(%r12,%rcx,4), %xmm14

Which of these instructions it crashes on seems to depend on the size of the matrix I'm working on.

A typical crash:

Error message: "Thread 1: EXC_BAD_ACCESS (code = 1, address=0x10dfaa000)"

The crash is on the vmovups 64(%r12,%rcx,4) instruction.

r12 contains 0x10dea9040, which is consistent with the start of the data for matrix A (containing 512 x 512 floats).

As a side note, 0x10dfaa000 is not near any of the data owned by the involved matrices A, mu or B.

I have tried running the vsSub example, and it runs fine and produces the expected result.

Gennady_F_Intel · 01-08-2014

If the vsSub example "runs fine and produces the expected result", then please give us a complete test case that can be compiled and executed on our side so that we can check the problem.

Mats_S_ · 01-08-2014

Hi,

I found the problem. It appears that the gcc compiler "helpfully" directs me to a version of vsSub that has the signature (int, float*, float*, float*), even when I call it with an int* as the first argument. Removing the "&" in our own inlined wrapper function fixed it:

static inline void sub_(int len, const float* src1, const float* src2, float* dest)
{
    ::vssub(len, src1, src2, dest);
}

The same problem showed itself with the other vml functions later on in the code.

I found this by stepping the assembly code inside the vml kernel and realizing it was comparing a loop variable to a ginormous number, reminiscent of what could be a pointer.