I'm trying to use mkl_dcscmm from inside a Matlab mex file (on 64-bit Linux, have tried Matlab versions 2009b and 2012a), but having trouble avoiding segfaults. I've tried static and dynamic linking to MKL, sequential vs gomp vs iomp, LP64 vs ILP64, tried changing BLAS_VERSION so Matlab would use the more-complete MKL library I'm trying to use instead of its internal one, nothing seems to work. I've attached the source for my mex file, and the mexopts.sh (change the .dat extension to .sh) with the extra compiler and linker flags I used. I was compiling the mex file and running a test from inside Matlab as follows:
mex -v -largeArrayDims -g mkl_sfmult.cpp
testA = sprand(10,5,0.5);
testB = rand(5,8);
testC = mkl_sfmult(testA, testB)
When I start Matlab in gdb, it appears the segmentation fault occurs in mkl_spblas_lp64_mc_dcsr1tg__f__mmout_par (), in mkl_spblas_lp64_dcsr1tg__f__mmout_par (), in mkl_sfmult.mexa64, otherwise just question marks in the stack trace.
I've tried making a standalone executable where I hard-code the data for a few test matrices, and I get the right result with no segfaults outside of Matlab. I'd really prefer to have a Matlab interface if possible, without needing to save the matrix data to disk or generating new source code for each problem instance. Maybe I'll try using loadlibrary instead? I've been searching through the literature and available libraries and MKL is the only multi-threaded implementation of the sparse matrix times dense matrix kernel that I can find.
Any ideas, recommendations, more info you'd need to figure out what's going on here? Thanks.
Thanks. Indeed it was a dumb indexing mistake, too much switching between 0-based and 1-based languages.
Where I had used A_Jc[k+1] for the number of nonzeros, it should've been A_Jc
The simple mex file works with that change, though sequential MKL appears to be slower than Matlab's own implementation of this operation. Multithreaded MKL is a little faster than serial Matlab, if the matrices are big enough.
If you use OpenMP outside the MKL function calls, you would need to set OMP_NESTED in order to activate mkl threaded. You probably need to fiddle with number of threads etc in an effort to give each MKL thread its own core in hope of making this advantageous.
Wouldn't MATLAB itself use a threaded library?
Wow, thanks for the quick reply. I had forgotten to remove the debug flag, and a few other things. Whether the mex file will use multiple cores is being inconsistent - it seems to depend on the value of MKL_NUM_THREADS when I compile the mex file, which doesn't make much sense to me. Matlab might be doing something quirky with the environment variables.
As best I can tell, Matlab doesn't do multithreaded sparse operations yet, at least not sparse*dense matmul. It uses its own version of MKL for dense BLAS, but not sparse.