Using mkl_sparse_spmm with pardiso and QR decomposition of sparse, tall matrices

Ioannis_K_ · ‎09-04-2025

Hello,

I have a Fortran code establishing a sparse matrix [A] representation of a tall, double-precision, real matrix (with full column rank). The number of rows m is much-much larger than the number of columns n.

The matrix is stored in CSR3 format, and described through the variables nrows (number of rows), ncol (number of columns), Array1 (the actual values of the matrix), Irowptr (row pointers) and Icolptr (column pointers).

I want to ask guidance and advice for pursuing the following 2 operations:

I want to create a sparse representation of the matrix [C] = [A]^T [A], through mkl_sparse_spmm, then call pardiso to solve the equation [C]{x} = {y} for {x} (obviously, the vector {y} is given). How can I call mkl_sparse_spmm, then use the handle of matrix [C] with pardiso?
I also want to obtain the reduced [Q][R] decomposition of the tall matrix [A], such that [Q] is m by n. Can this be done? If yes, is there a way to obtain a sparse representation of [Q]?

Many thanks in advance for any help and guidance.

Spencer_P_Intel · ‎09-05-2025

Hi Ioannis_K_,

Great questions!

1. So Inspector-Executor Sparse BLAS routines have a routine called mkl_sparse_?_export_csr which takes a matrix handle, like for instance C from mkl_sparse_spmm() from A^T*A and gives access to internals of C in CSR format. You can then copy into your own arrays for use elsewhere:

//
// setup A as CSR3 matrix with double precision
//

sparse_matrix_t C = NULL;
// creates C = A^T*A in CSR3 format
mkl_sparse_spmm(SPARSE_OPERATION_TRANSPOSE, A, A, &C); 

//
// now extract C data from sparse handle and make a copy for use in pardiso
//
MKL_INT c_nrows;
MKL_INT c_ncols;
MKL_INT *c_rows_start = NULL, *c_rows_end = NULL, *c_col_indx = NULL;
double *c_values = NULL;
sparse_index_base_t indexing;

mkl_sparse_d_export_csr(C, &indexing, &c_nrows, &c_ncols, &c_rows_start, &c_rows_end, &c_col_indx, &c_values); 
// export_csr gives access in CSR4 format, but the spmm routine generates 
// CSR3 format internally, so we can extract and convert it to CSR3.  
// For now, we have access to internals of C matrix handle, but not ownership 
// of those arrays (they are owned by the library and will be deleted when C 
// is released).  To get full ownership, you must allocate your own arrays 
// and copy the data over as follows:

MKL_INT c_ind = indexing == SPARSE_INDEX_BASE_ZERO ? 0 : 1;
MKL_INT c_nnz = c_rows_start[c_nrows] - c_ind;
MKL_INT *ic = (MKL_INT *)mkl_malloc(sizeof(MKL_INT)*(c_nrows+1), 64);
MKL_INT *jc = (MKL_INT *)mkl_malloc(sizeof(MKL_INT)*(c_nnz), 64);
double *c = (double *)mkl_malloc(sizeof(double)*(c_nnz), 64);

// note: feel free to parallelize this copy into your own arrays for pardiso
for (MKL_INT i = 0; i < c_nrows; i++) ic[i] = c_rows_start[i] - c_ind;
ic[c_nrows] = c_rows_end[c_nrows-1] - c_ind;
for (MKL_INT i = 0; i < c_nnz; i++) jc[i] = c_col_indx[i] - c_ind;
for (MKL_INT i = 0; i < c_nnz; i++) c[i] = c_values[i];

// note: you can now put the CSR3 ic/jc/c/c_nrows/c_ncols/0-based indexing matrix into pardiso system

// note: don't forget to mkl_free(ic); mkl_free(jc); mkl_free(c); once you are all done

Honestly, some day it would be awesome to have all these different functionalities be able to use the same matrix handles without needing to extract and provide to different objects or handles, but that is outside our current plans.

For question 2 about sparse_qr, the

Short answer: we don't have support for providing access to Q or R factors.

Long answer: mkl_sparse_qr has the following routines:

mkl_sparse_x_qr_solve -- solves for x in A(=Q*R) with Q*R*x = b
mkl_sparse_x_qr_qmult -- applies x = inv(Q) * b using the internal data structure for Q from the sparse multifrontal algorithm (it is not stored in a sparse matrix format internally)
mkl_sparse_x_qr_rsolve -- applies triangular R to solve x = inv(R) * b using internal data structures for R from the sparse multifrontal algorithm (again, it is not stored in a sparse matrix format directly)

We unfortunately do not have a routine in sparse_qr that converts and exports the internal format into a sparse orthogonal Q or sparse triangular R sparse matrix. Note that a similar thing was asked of us for pardiso in the past and we did add partial support for retrieving the factors there, but sparse_qr does not have any such support yet.

Is there a reason you want to extract Q and R as sparse matrices or do you just need to be able to apply them individually like is available in the qmult and rsolve routines ?

Best Regards,

Spencer

Ioannis_K_ · ‎09-05-2025

Hi Spencer, thank you for the very helpful information. I will try the methods you suggest for finding [C]. Would it be possible to provide the code in Fortran? Or, are there any Fortran source code examples in the oneAPI installation using the functions of your code? If not, no worries, I will try to make the convertion to fortran.

Regarding the extraction of Q: the only part that I want to extract is Q, R is secondary.

The reason I require this is that I have a computation that decomposes a required, m-dimensional vector as {z} = [A]{r} + {s}, where [A] is the (m by n) matrix I mentioned in my original message, {r} is n-dimensional and {s} is m-dimensional.

I want to try and see if {s} has a "projection" onto the space described by the columns of [A], and - if yes- remove that projection from {s}.

If we have established the [Q] matrix from the QR decomposition of [A], such that [Q] is m by n, then the operation that I have to conduct to {s} would be:

{s} <-- {s} - [Q][Q]^T {s}.

I assume the information you have provided so far should allow me to find the [Q] matrix, through mkl_sparse_x_qr_solve(), and then also conduct the computations {u} = [Q]^T {s} and {w} = [Q]{u} through mkl_sparse_x_qr_qmult(). Is this correct? If yes, it will be super-helpful if you provide the relevant code for this operation.

One follow-up question regarding the matrix [Q]: If I use mkl_sparse_x_qr_solve(), I imagine that the [Q] matrix remains in the memory unless I release it, correct? I am asking because I need to do the operation {s} <-- {s} - [Q][Q]^T {s} multiple times (for the same [Q], but a different {s} each time).

Again, many thanks for all your help.

Yannis

Spencer_P_Intel · ‎09-05-2025

Apologies, I naively assumed you were using C, but Fortran is definitely also supported and one-based indexing is more natural there anyway.

In the oneMKL examples folder <mkl_release _folder>/share/doc/mkl/examples/f/sparse_blas/source/sparse_d_spmm_export_csr.f90 you will see the use of fortran APIs to export the arrays out of C matrix handle (you may have to unpack examples_core_f.tgz to see the f/ folder). You will need to write the allocate and copy yourself, though you may find other examples there that help you with that in Fortran.

Recalling that sparse qr decomposes A = Q * R where Q is orthogonal and R is triangular. As we know from linear algebra, the inverse of an orthogonal matrix is it's transpose, so inv(Q) = Q^T and inv(Q)^T = Q. You are looking essentially for Q * Q^T * s which is inv(Q)^T * inv(Q) * s. The mkl_sparse_x_qr_qmult might have been better called the mkl_sparse_x_qr_invqmult, as that is the operation it is supporting. Unfortunately the "transpose" piece in the API, applies to A (as in solving op(A) * x = b with QR factorization) and not to the Q factor. As we know, in general, the QR factors for A have no relation to the QR factors for A^T, so unfortunately there is no interface available to get the inv(Q)^T * t = Q * t operation or to get access to Q as a sparse matrix.

I see what you are going for to remove the projection from "s" but unfortunately I don't actually see a way to accomplish it with our existing implementation... ( I mean if push comes to shove, it is technically possible to extract access to Q piece by piece using qmult(A, nontrans, qt_i, e_i) where e_i is zero vector with 1 in ith place and you get Q^T = {sparsify(qt_i) } ... but this is almost absurd for any reasonable size of A... just putting it out there ...)

And yes, for a given A = Q * R decomposition, the call to mkl_sparse_x_qr() or the broken up qr_reorder() then qr_factorize() in qr process creates the Q and R factors internal to the matrix handle, and it can be repeatedly used to qr_solve Q*R*x = b for different b and x, but same Q/R factors of A matrix (or qr_qmult() or qr_rsolve)... those factors remain usable until the A matrix handle is released.

Best Regards,

Spencer

Ioannis_K_ · ‎09-05-2025

Hi Spencer,

Thank you for the information.

As I mentioned in my original message, the matrix [A] is tall (m > n). When I was mentioning QR decomposition, I was interested in the "reduced" QR decomposition; that is, the tall (m by n) matrix that would give the set [Q] of orthonormal column vectors, which are the basis for the actual column space of [A]. I am mentioning this, because the [Q] I am referring to is non-square, hence it does not have an inverse.

If I can somehow compute the aforementioned "tall" [Q] matrix, and then have a function which can compute:

The product of [Q] times a vector
The product of [Q]^T times a vector

then this is all I need for my computation.

At any rate, I believe that the computations related to the [C] matrix that I had asked about in my original message might be doing what I require.

What I mean is that, if we want to find the projection of a vector {r} onto the column space of a tall matrix [A], then this projection is the vector {p}=[A]{x}, where {x} is the solution of the linear system: [A]^T[A]{x} = [A]^T{r}, or, equivalently: [C]{x} = [A]^T{r}.

I believe that if we then make the correction {r} <-- {r} - {p}, then the projection of the updated {r} on the column space of [A] will be zero, which should also mean that its projection onto the column space of the tall [Q] must be zero.

Thank you,

Yannis

Spencer_P_Intel · ‎09-05-2025

As some follow-up thoughts:

If you are wanting to use C = A^T*A in pardiso as symmetric matrix, you might as well try using mkl_sparse_syrk(SPARSE_OPERATION_TRANSPOSE,A,&C) to do the operation directly and which only outputs the upper triangular part of C (saves some space and maybe time) because pardiso documents (in ja description) that it only needs the upper triangular part of the symmetric matrix.
I should also recommend that pardiso expects everything to be sorted by increasing index per row, so you might want to also call mkl_sparse_order(C) before exporting the arrays to ensure that is true.

Spencer

Spencer_P_Intel · ‎09-05-2025

One more follow up -- pardiso generally supports 0-based indexing if you mark iparm[34]=1 , but by default it is expecting one-based indexing (iparm[34]=0* is one-based indexing). So maybe in your copy above out of export_csr data, it would be better to convert it to one-based indexing instead of zero-based indexing for pardiso.