I would like to compute Y^T = AX^T, where A is sparse and Y and X are dense matrices (each resulting from the concatenation of multiple vectors in row-major order). I've seen that you released an interface to FEAST in the last update. When ijob = 30, this is the product one needs to compute when using row-major order, and I was wondering whether an implementation of such a procedure is available in the MKL.
If I understood your question correctly, you are asking about multiplying a sparse matrix by a dense matrix that is stored in row-major (C-style) order, right? MKL already supports this. For example, the ?CSRMM interfaces assume that the dense matrices are stored in row-major order (C style) when the CSR matrix uses 0-based indexing, and in column-major order (Fortran style) when it uses 1-based indexing.
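To make the two layouts concrete, here is a minimal sketch in plain Python (no MKL involved; the helper name `unflatten` is made up for this illustration). It shows that a buffer holding several vectors one after the other can be read either as a row-major matrix whose rows are the vectors, or as a column-major matrix whose columns are the vectors, and that the two readings are transposes of each other.

```python
def unflatten(buf, rows, cols, order):
    """Read a flat buffer as a rows x cols nested list.

    order="C": row-major (C style), element (i, j) is buf[i * cols + j].
    order="F": column-major (Fortran style), element (i, j) is buf[i + j * rows].
    """
    if order == "C":
        return [[buf[i * cols + j] for j in range(cols)] for i in range(rows)]
    if order == "F":
        return [[buf[i + j * rows] for j in range(cols)] for i in range(rows)]
    raise ValueError("order must be 'C' or 'F'")


# Two vectors of length 3 stored one after the other in memory.
buf = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]

as_rows = unflatten(buf, 2, 3, "C")  # 2 x 3: each vector is a row
as_cols = unflatten(buf, 3, 2, "F")  # 3 x 2: each vector is a column

print(as_rows)  # [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(as_cols)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
```

In the notation of the original question, the same memory is X when read row-major and X^T when read column-major.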
Thanks for your answer. I didn't think that the index base of the CSR matrix would affect the storage order of the dense matrices (I thought it depended only on the compiler, C vs. Fortran). I can now get the correct results, but what this basically means is that if I call dcsrmm with the exact same parameters, changing only matdescra from 'F' to 'C' (and changing indx and pntrb accordingly by decrementing all values by 1), I won't get the same results, right? I saw in the last update that "performance of 0-based DCSRMM improved significantly". Are there any scaling benchmarks available that compare 0-based and 1-based DCSRMM? Which one would you rather use if you had the choice?
Thanks a lot for your help!
Yes, you are right: changing only matdescra from 'F' to 'C' and decrementing the indices by 1 will not produce the same result. To get the correct result, the dense matrices must also be transposed.
For the general non-transposed case, I'd prefer to use 0-based DCSRMM rather than the 1-based one.
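The effect described above can be reproduced with a small reference loop. This is only a sketch in plain Python under the convention stated earlier in the thread (0-based CSR goes with row-major dense matrices, 1-based CSR with column-major ones); `csr_mm` and `transpose_flat` are hypothetical helpers written for this illustration, not MKL routines and not MKL's implementation.

```python
def csr_mm(n_rows, n_cols, val, col, ptr, b, k, order, base):
    """Reference C = A * B for a CSR matrix A (n_rows x n_cols).

    B is n_cols x k and C is n_rows x k, both flat buffers in `order`
    ("C" = row-major, "F" = column-major). `base` is 0 or 1 and is
    subtracted from the CSR indices.
    """
    def at(i, j, rows):
        return i * k + j if order == "C" else i + j * rows

    c = [0.0] * (n_rows * k)
    for i in range(n_rows):
        for p in range(ptr[i] - base, ptr[i + 1] - base):
            j = col[p] - base
            for t in range(k):
                c[at(i, t, n_rows)] += val[p] * b[at(j, t, n_cols)]
    return c


def transpose_flat(buf, rows, cols):
    """Transpose a row-major rows x cols buffer into a row-major cols x rows one."""
    return [buf[i * cols + j] for j in range(cols) for i in range(rows)]


# A = [[1, 2], [0, 3]] in CSR form, with 1-based and 0-based indices.
val = [1.0, 2.0, 3.0]
col1, ptr1 = [1, 2, 2], [1, 3, 4]
col0, ptr0 = [c - 1 for c in col1], [p - 1 for p in ptr1]

b = [1.0, 2.0, 3.0, 4.0]  # the same dense buffer in both calls

one_based = csr_mm(2, 2, val, col1, ptr1, b, 2, "F", 1)
zero_based = csr_mm(2, 2, val, col0, ptr0, b, 2, "C", 0)
print(one_based)   # [5.0, 6.0, 11.0, 12.0]
print(zero_based)  # [7.0, 10.0, 9.0, 12.0]

# Transposing the input and the output recovers the 1-based result.
fixed = transpose_flat(
    csr_mm(2, 2, val, col0, ptr0, transpose_flat(b, 2, 2), 2, "C", 0), 2, 2)
print(fixed)       # [5.0, 6.0, 11.0, 12.0]
```

Only decrementing the indices changes how the dense buffer is interpreted, so the two calls compute different products; the extra transposes on input and output are what bring the results back in line.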
I'm sorry to bother you again about this, but do you think there is a way to avoid transposing the dense matrices, for example by calling DCSCMM instead of DCSRMM and working with the transpose?
Thanks for any help that could lead to a way to compute B = A * X, with a 0-based general CSR matrix and each column of X stored contiguously in memory, without having to transpose the dense matrices.
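For reference, the operation being asked for can be written down as one sparse matrix-vector product per column. The sketch below is plain Python with made-up helper names (`csr_matvec`, `csr_mm_contiguous_vectors`); it only pins down the expected result for a 0-based CSR matrix with contiguous columns, and says nothing about which MKL routine, if any, does this in a single call or how it would perform.

```python
def csr_matvec(n_rows, val, col, ptr, x, x_off, y, y_off):
    """y[y_off : y_off + n_rows] = A * x[x_off : ...] for a 0-based CSR matrix A."""
    for i in range(n_rows):
        acc = 0.0
        for p in range(ptr[i], ptr[i + 1]):
            acc += val[p] * x[x_off + col[p]]
        y[y_off + i] = acc


def csr_mm_contiguous_vectors(n_rows, n_cols, val, col, ptr, x, k):
    """Apply A to k vectors of length n_cols stored back to back in x.

    The k result vectors (length n_rows each) are stored back to back too,
    so neither the input nor the output is transposed.
    """
    y = [0.0] * (n_rows * k)
    for t in range(k):
        csr_matvec(n_rows, val, col, ptr, x, t * n_cols, y, t * n_rows)
    return y


# A = [[1, 2], [0, 3]] as a 0-based CSR matrix.
val, col, ptr = [1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3]

# Two vectors, [1, 2] and [3, 4], each contiguous in memory.
x = [1.0, 2.0, 3.0, 4.0]

y = csr_mm_contiguous_vectors(2, 2, val, col, ptr, x, 2)
print(y)  # [5.0, 6.0, 11.0, 12.0], i.e. A*[1, 2] followed by A*[3, 4]
```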