Intel® oneAPI Math Kernel Library

Computing the Schur-complement with MKL_PARDISO

Kempke__Nils-Christi

Hello,

we are currently trying to integrate mkl_pardiso into our software and have some questions regarding mkl_pardiso and the computation of the Schur complement.
Given a real symmetric matrix, we want to compute the Schur complement of a certain size and afterwards use the partial factorization from that computation to solve part of the original linear equation system. We are using sparse matrices and therefore iparm[35] = -1/-2.

We started out by modifying the supplied pardiso_schur_c.c example and used the pseudocode found here as a guide. You can find our code as an attachment; the compile command is included as a comment at the top of the C file.

The first issue was that, when setting

    int iparm35 = -1;
    
    iparm[1-1] = 1;         /* No solver default */
    iparm[2-1] = 2;         /* Fill-in reordering from METIS */
    iparm[5-1] = 0;         /* No user fill-in reducing permutation */
    iparm[10-1] = 8;        /* Perturb the pivot elements with 1E-8 */
    iparm[11-1] = 0;        /* Scaling is switched off */
    iparm[13-1] = 0;        /* Maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy */
    iparm[14-1] = 0;        /* Output: Number of perturbed pivots */
    iparm[18-1] = -1;       /* Output: Number of nonzeros in the factor LU */
    iparm[19-1] = -1;       /* Output: Mflops for LU factorization */
    iparm[24 - 1] = 10;     /* Two-level (parallel) factorization control */
    iparm[31 - 1] = 0;      /* Partial solve feature switched off */
    iparm[36 - 1] = iparm35;        /* Use Schur complement */

we ran into a segmentation fault thrown by the pardiso_export function. We do not know why this happens, but setting iparm[23] = 1 instead of 10 resolved the issue. The documentation on iparm[23] only says that it cannot be 0 when iparm[35] is set to either -1 or -2. Did we do something wrong here?
Another thing we found is that with iparm[35] = -2 and the above settings (i.e. with iparm[23] = 10), we do not get the number of non-zeros returned in iparm[35]; instead it still contains -2 (even though error == 0).

After 'resolving' the above issue by setting iparm[23] = 1, we were able to compute the Schur complement and have it returned in sparse format. That all worked correctly, although it surprised us that pardiso, despite getting all matrices in 1-based indexing, returned the Schur complement with 0-based indexing - is there a parameter to control this?
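
For reference, here is a boiled-down sketch of the call sequence we are using (with a small placeholder 5x5 matrix instead of our real data; the pardiso_export call that pulls the Schur complement out in CSR format is only indicated as a comment, since we took its exact arguments directly from the pardiso_schur_c.c example):

    #include <stdio.h>
    #include "mkl_pardiso.h"
    #include "mkl_types.h"

    int main (void)
    {
        /* Small 5x5 real symmetric matrix, upper triangle in 1-based CSR
           (placeholder values only, not our actual data). */
        MKL_INT n = 5;
        MKL_INT ia[6]  = { 1, 4, 6, 8, 10, 11 };
        MKL_INT ja[10] = { 1, 2, 4,  2, 5,  3, 4,  4, 5,  5 };
        double  a[10]  = { 4.0, -1.0, -1.0,  4.0, -1.0,  4.0, -1.0,  4.0, -1.0,  4.0 };

        /* perm marks the rows/columns that form the Schur block. */
        MKL_INT perm[5] = { 0, 0, 0, 1, 1 };          /* 2x2 Schur complement */

        void   *pt[64] = { 0 };
        MKL_INT iparm[64] = { 0 };
        MKL_INT maxfct = 1, mnum = 1, mtype = -2;     /* real symmetric indefinite */
        MKL_INT nrhs = 1, msglvl = 0, error = 0, phase;
        MKL_INT idum = 0;
        double  ddum = 0.0;

        iparm[0]  = 1;          /* no solver defaults */
        iparm[1]  = 2;          /* METIS fill-in reordering */
        iparm[9]  = 8;          /* pivot perturbation 1E-8 */
        iparm[10] = 0;          /* scaling off */
        iparm[12] = 0;          /* matching off */
        iparm[17] = -1;         /* report nnz in the factors */
        iparm[23] = 1;          /* two-level factorization (the setting that works for us) */
        iparm[35] = -1;         /* sparse Schur complement */

        /* Reordering and symbolic factorization; afterwards iparm[35] holds
           the number of non-zeros of the Schur complement (at least with -1;
           with -2 we see -2 again, as described above). */
        phase = 11;
        PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
                 &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        printf ("phase 11: error = %lld, nnz(Schur) = %lld\n",
                (long long) error, (long long) iparm[35]);

        /* Here we allocate CSR arrays of that size and register them via
           pardiso_export before the factorization, exactly as done in the
           shipped pardiso_schur_c.c example (call omitted in this sketch). */

        phase = 22;             /* numerical factorization, computes the Schur complement */
        PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
                 &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        printf ("phase 22: error = %lld\n", (long long) error);

        phase = -1;             /* release internal memory */
        PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja, &idum,
                 &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        return 0;
    }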
Still, we were not quite sure how to use the parameter perm. If we set perm to something like

perm = {1, 0, 0, 0, 1} 

what exactly does "perm specifies elements for a Schur complement" in the documentation mean? Will pardiso perform a Schur-complement computation equivalent to one where we specify perm as

perm = {0, 0, 0, 1, 1} 

but with rows 1 and 4 swapped? If yes, will pardiso always sort the rows in a stable way?

A last, somewhat more general question:

Given a linear system

[A11 A12] [x1]   [b1]
[A21 A22] [x2] = [b2]

we want to do two things:

a) compute the Schur-complement S = A22 - A21 A11^-1 A12

b) solve the linear system A11 x1 = b1

We assumed that, similarly to pardiso from the pardiso-project, we could do this by computing the Schur complement with iparm[35] = -2 (so that the factorization is kept for the solve phase) and afterwards use that partial factorization with phase = 33 to compute x1.
From our tests we found that pardiso instead solves the complete system for x1 and x2. Is that the expected behaviour?
If so, is there a way to efficiently solve only A11 x1 = b1?

 

With best regards,

Nils

Kirill_V_Intel
Employee

Hello Nils,

I'll try to address your questions.

1) For the Schur complement it is recommended that you set iparm[23] = 1. Using iparm[23] = 10 is not recommended at all, except for certain very specific cases in which neither iparm[23] = 0 nor iparm[23] = 1 works. We'll look at our code and check what exactly went wrong with iparm[23] = 10. Most likely a note will be added to the documentation that only iparm[23] = 1 should be used for computing the Schur complement.

2) Indexing for the Schur complement is currently always 0-based, but we'll consider making it depend on the indexing of the input matrix, which would be more logical.

3) perm defines the submatrix for which the Schur complement will be computed. If you set perm = {1, 0, 0, 0, 1}, the Schur complement will be computed for the submatrix formed by the first and last rows and columns of the input matrix. Basically, the matrix is reordered so that row 1 becomes row 4 and row 5 stays row 5. Thus, the row indices are grouped in increasing order in the resulting Schur complement. This reordering should be stable; if it is not, let us know.
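
To make this concrete with your 5x5 example (just an illustration of how perm is interpreted, nothing more):

    MKL_INT perm[5] = { 1, 0, 0, 0, 1 };
    /* Rows/columns 1 and 5 are moved behind rows/columns 2, 3, 4, keeping
       their relative order, so the resulting 2x2 Schur complement corresponds
       to the original rows/columns 1 and 5 (in that order). */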

4) Do you need both the Schur complement and the solution of the submatrix system?

For solving A11 x1 = b1 you could use forward substitution (phase 331), then diagonal substitution (phase 332), then set the second part of the vector to zero and call backward substitution (phase 333). We'd expect that for typical Schur complements of small size the overhead of solving the larger system is negligible. Is it really a bottleneck in your case? If so, could you give us a few more details about your high-level algorithm?
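
A rough, untested sketch of what I mean, using the usual pardiso call arguments (pt, iparm, a, ia, ja, perm, ...) after the factorization with iparm[35] = -2; here n_schur is the size of the Schur block, which sits in the last n_schur components after your perm, and b, y, x are user-allocated vectors of length n. Please double-check the exact input/output vector handling of phases 331/332/333 against the documentation:

    /* Partial solve of A11 x1 = b1, reusing the factorization kept in pt.
       Only the first n - n_schur entries of b (i.e. b1) are meaningful. */
    phase = 331;    /* forward substitution */
    PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
             &nrhs, iparm, &msglvl, b, y, &error);

    phase = 332;    /* diagonal substitution */
    PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
             &nrhs, iparm, &msglvl, y, x, &error);

    /* Zero the part belonging to the Schur block before going backward. */
    for (MKL_INT i = n - n_schur; i < n; i++) x[i] = 0.0;

    phase = 333;    /* backward substitution */
    PARDISO (pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja, perm,
             &nrhs, iparm, &msglvl, x, y, &error);
    /* The first n - n_schur entries of y now hold the approximation of x1. */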

Best,
Kirill

 

Kempke__Nils-Christi

Hello Kirill,

First of all, thanks for the quick response :).

Regarding 1):
The reason we wanted to use iparm[23] = 10 was that the documentation on iparm[23] = 1 says:
"Note
Disable iparm[10] (scaling) and iparm[12] = 1 (matching) when using the two-level factorization algorithm. Otherwise Intel MKL PARDISO uses the classic factorization algorithm."
and we still wanted to use both matching and scaling. But from your answer I assume that in most cases one should use either iparm[23] = 0 or iparm[23] = 1 anyway?
4) Yes, we want to do both: first compute the Schur complement and afterwards solve A11 x1 = b1 multiple times for different right-hand sides (we are using it within a preconditioned Krylov subspace method, BiCGSTAB).

We were just wondering if there was another way to do it (we were using pardiso from the pardiso-project earlier, and there a solve after computing a Schur complement is implemented as solving A11 x1 = b1). We'll implement it as described, check whether it is a bottleneck, and then come back to you. Thanks.

Ah, one additional point that came up: Our matrix
[A11 A12]
[A21 A22]
has many empty rows in A12 (and thus empty columns in A21, since it is symmetric). Would it, in your experience, be beneficial to remove those first and later expand the Schur complement again in order to obtain the complete Schur complement?

Best,
Nils

Kirill_V_Intel
Employee

Hello Nils,

Sorry for the delayed response.

1) I highly recommend not using iparm[23] = 10; it is a very special code path. iparm[23] = 1 should be used for the two-level factorization. However, as of now, when matching and scaling are ON, you cannot use the two-level factorization (though we are considering adding support for this case in the future). Even if you set iparm[23] = 1, the code will currently dispatch internally to the classic factorization. So, if you really want the two-level factorization, I'd suggest trying to switch off matching and scaling and seeing how it goes.
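
So, as a sketch, the iparm combination to actually hit the two-level code path would be something like this (with the caveat that matching and scaling are then off):

    iparm[10] = 0;    /* iparm(11): scaling switched off */
    iparm[12] = 0;    /* iparm(13): matching switched off */
    iparm[23] = 1;    /* iparm(24): two-level factorization */
    iparm[35] = -2;   /* iparm(36): sparse Schur complement, keep factorization for the solve */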

2) I cannot give a precise answer to your question about removing the rows in A12 and restoring them afterwards, but I would consider such an option only after trying the straightforward approach and being dissatisfied with how it works without this additional trick. I don't think the trick will make things better.

Best,
Kirill
