I am porting code (C, so row major) which makes use of ?gemm, ?syrk and ?syr2k calls from dense matrices to sparse matrices, and would to know if there is a simpler way of calculating the various internal matrix products than the following:
?syrk: use ?csrmultd. I assume since this method only allows 1 based indexing the resulting dense matrix is column major, but I would like confirmation.
?syr2k: use two ?gemm calls here instead.
?gemm: This case gets fairly complicated, and it would be extremely nice if someone can tell me if there are methods / options I am overlooking which would simplify this. A and D are the dense result and multiplicand matrices, S['] is a sparse matrix which may be transposed.
A = S['] * D + A
- Use mkl_dcsrmm directly (using zero based indexing)
A = S['] * D' + A
- Transpose D -> Dt
=> A = S['] * Dt + A
- use mkl_dcsrmm (using zero based indexing)
- Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (D' -> Dt)
- Calculate temp. matrix Tt = S['] * Dt via mkl_dcsrmm
- T' (row major) = Tt (col. major)
- calculate A = T' + A;
A = D * S['] + A
- Transpose equation:
-> A' = (D * S['])' + A' = S[!'] * D' + A'
- Convert S to one based indexing, forces mkl_dcsrmm to implicitly use col major C arrays (A' -> At, D' -> Dt)
=> At = S[!'] * Dt + At
-> Use mkl_dcsrmm
- Transpose A' => At, D' => Dt
=> At = S[!'] * Dt + At
- Use mkl_dcsrmm (using zero based indexing)
- Transpose At
A = D' * S['] + A
-> Transpose equation:
-> A' = (D' * S['])' + A' = S[!'] * D + A'
- Transpose A' => At
=> At = S[!'] * D + At
-> Use mkl_dcsrmm (using zero based indexing)
-> Transpose At => A'
-> Calculate temp. matrix T = S['] * D via mkl_dcsrmm
-> Calculate A = T' + A
In theory I could always store my sparse matrix as one based, and if I need to treat it as zero based add dummy rows / columns to the dense matrices to catch the additional row/column created when multiplying, which means converting between the indexing won't take any time.
One final question: Could someone confirm that it is possible to use mkl_?omatadd to calculate A = A + B without using a temp matrix? The documentation doesn't state whether the memory is allowed to overlap between input and output if no transposition is being done.
Yes, ?csrmultd like all other 1-based Sparse BLAS functionality assumes that resulting dense matrix is column major. You can also use ?csrmultcsr to produce output in 1-based CSR format.
For replacing ?gemm functionality I see no other methods that could simplify this. BTW, do you consider changing of underlying algorithm instead of direct rewriting it from dense to sparse?
Question about mkl_?omatadd: behavior is uncertain in the case of inplace usage.
There isn't much I can do with the basic algorithm - what I do want to do is reorder the operations to accumulate as many operations as possible in each temp matrix before transposing them, as well as adding an internal "isTranposed" flag to the matrices so I can delay explicit transposing until the data is fed into the next steps.
mkl_?omatadd: That is unfortunate, but its simple enough to replace those calls with manually calculating A = A + B' instead of creating a temporary copy of A. Is there any chance of a future MKL version including a function to calculate A = A + op(B)
Is it possible to calculate A = A + B without using a temp matrix? I think not, for the following reasons. Although common mathematical convention tells us that the A-s on the two sides of the assignment are the same, that is true only in the dense matrix representation. In a sparse matrix representation, the number of nonzero entries of the new A (on the left) may be less than, equal to, or greater than the number of nonzero entries of the old A (on the right). In fact, not even the size of the to-be-returned matrix is known before the addition is completed. Because of this, the user does not have the information necessary to allocate A in advance (unless, of course, when assigning a full array of size m X n). Consider this example, in which the matrices are represented in the COO notation (row, column, value):
A = (1,2, 1.0) (2,1, 0.5) (2,2, -1.0) B = (2,2, 1.0) A+B = (1,2, 1.0) (2,1, 0.5)
In fact, the COO representation does not, by itself, even tell us the size of the matrix that it represents.
According to the MKL Reference Manual, mkl_?omatadd requires 3 non-overlapping matrices.
Though mkl_?omatadd may process overlapping matrices correctly in your use-case, this behavior may change in a future MKL release without prior notice.
If absence of this feature prevents you from using MKL, please submit a feature request at premier.intel.com.