mkl_dcsrmm multiplies a CSR sparse matrix by a dense matrix, storing the result in a dense matrix.
From what I can tell, this function expects the B and C dense matrices to be in row-major order, as opposed to the column-major order used in most other BLAS routines. I've attached a test case which demonstrates that, given the same B and C inputs in the same format, mkl_dcsrmm produces completely different results than dgemm. dgemm calculates C = alpha*A*B + beta*C (as expected), but mkl_dcsrmm appears to calculate C' = alpha*A*B' + beta*C' instead.
This seems to make it very difficult to mix sparse functions with normal BLAS function. Must I transpose my matrices before and after using the sparse functions?
I tried using mkl_dcscmm instead, but it seems to do the same thing. In fact, I think that mkl_dcsrmm and mkl_dcscmm are identical except that they invert the meaning of the transa parameter.
Do all of the sparse matrix-matrix multiply functions expect the dense parts to be row-major? If so, what is the best practise for using these beside traditional column-major BLAS functions?
(I've found that this question has been asked before, but not answered.)
I'm going to answer my own question.
There appears to some pretty important undocumented behavior in mkl_dcsrmm. (At least I can't find it documented anywhere!)
If the sparse matrix uses zero-based indexing (as indicated by the matdescra array) then mkl_dcsrmm assumes that the B and C matrices are row major. If the sparse matrix uses one-based indexing it assumes that the B and C matrices are column major.
I can easily change my sparse matrices to be one-based so that mkl_dcsrmm will work with column-major inputs.
The documentation at http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-78C55D9B-86FF-4A9F-B5D5-D2F61B9314FC.htm#GUID-78C55D9B-86FF-4A9F-B5D5-D2F61B9314FC really ought to explain this important detail. While you're fixing the doc, there are couple of places where it looks like "m-" has been replaced by an em dash (e.g. pntre(—1) should be pntre(m-1)). Also the description for 'c' is incorrect. It says "On entry, the leading..." but should say "On entry with transa= 'N' or 'n', the leading..."