Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
7287 Discussions

PARDISO: Suspected bug in partial / Schur-complement solve

rbunger
New Contributor II
564 Views

Hi,

I have a complex symmetric system (mtype = 6). I want to compute a partial solution, specifically just a small upper part of the solution vector. I have tested two different matrices:

  1. For the first one (ID = 746704) both the partial solve mode (iparm[30] =3) and the sparse Schur-complement mode (iparm[35] = -2) deliver the same correct result.
  2. The second matrix is special: It contains two individual sub-matrices so that the corresponding systems could be solved individually, like a block-diagonal matrix (the matrix is a FEM matrix where two domains are not connected (they are connected only via a hybridization with an integral equation)). The computed reduced solutions are wrong.

Each matrix is about 25 MB in size (compressed), so I can't upload them.

Below, I just print the first six coefficients of the solution vector. "partial_solve = 0": Schur-complement mode; "partial_solve = 1": Partial solve mode. 

Thanks & regards,

Rainer

 

---

 

ID = 746704
partial_solve = 0
N_schur = 12341

Reference solution:
x(0) = (0,26.0314)
x(1) = (0,-25.8455)
x(2) = (0,41.7267)
x(3) = (0,-67.5421)
x(4) = (0,16.0706)
x(5) = (0,-16.0299)

Reduced solution:
x(0) = (0,26.0314)
x(1) = (0,-25.8455)
x(2) = (0,41.7267)
x(3) = (0,-67.5421)
x(4) = (0,16.0706)
x(5) = (0,-16.0299)

Full solution:
x(0) = (0,26.0314)
x(1) = (0,-25.8455)
x(2) = (0,41.7267)
x(3) = (0,-67.5421)
x(4) = (0,16.0706)
x(5) = (0,-16.0299)

---

ID = 746704
partial_solve = 1

Reference solution:
x(0) = (0,26.0314)
x(1) = (0,-25.8455)
x(2) = (0,41.7267)
x(3) = (0,-67.5421)
x(4) = (0,16.0706)
x(5) = (0,-16.0299)

Reduced solution:
x(0) = (0,26.0314)
x(1) = (0,-25.8455)
x(2) = (0,41.7267)
x(3) = (0,-67.5421)
x(4) = (0,16.0706)
x(5) = (0,-16.0299)

======

ID = 202443
partial_solve = 0
N_schur = 6508

Reference solution:
x(0) = (0,-2.94291)
x(1) = (0,4.24624)
x(2) = (0,-7.5356)
x(3) = (0,8.64307)
x(4) = (0,-2.44929)
x(5) = (0,-3.66346)

Reduced solution:
x(0) = (0,0.124542)
x(1) = (0,0.13367)
x(2) = (0,0.133519)
x(3) = (0,0.151462)
x(4) = (0,0.152197)
x(5) = (0,0.148365)

Full solution:
x(0) = (0,0.365116)
x(1) = (0,0.589676)
x(2) = (0,-1.5622)
x(3) = (0,2.30311)
x(4) = (0,-0.417001)
x(5) = (0,0.473597)

---

ID = 202443
partial_solve = 1

Reference solution:
x(0) = (0,-2.94291)
x(1) = (0,4.24624)
x(2) = (0,-7.5356)
x(3) = (0,8.64307)
x(4) = (0,-2.44929)
x(5) = (0,-3.66346)

Reduced solution:
x(0) = (0,0.124542)
x(1) = (0,0.13367)
x(2) = (0,0.133519)
x(3) = (0,0.151462)
x(4) = (0,0.152197)
x(5) = (0,0.148365)

0 Kudos
1 Solution
c_sim
Employee
16 Views

Hi Rainer,

 

Thank you for posting this bug report and interesting question.

The bug that causes second matrix to have numerical error is fixed and will be part of oneMKL 2026.2.0.

 

Regarding performance: Yes, Schur complement and partial solve are basically doing the same--is just a convenient abstraction. However, unlike dense solvers, the performance impact of Schur feature is slightly more complicated.  The slowdown you observe occurs because when these features are enabled, the Schur complement block must be moved to the end of the elimination order. This constraint means that fill-reducing ordering is applied only to the non-Schur part in isolation, without considering the interactions between Schur and non-Schur regions. This has two main performance impacts: 1) limited parallelism in Schur part and 2) high fill-in for Schur part, especially in cases like yours where the Schur part represents a surface layer that is tightly connected to the non-Schur part. This causes higher-memory requirement and data traffic resulting in performance speed down.

 

For the matrix in first case (ID = 746704) the Schur part is fully-dense and in second case it is roughly 50% dense (likely due to its two-submatrix structure). This high fill-in is also visible in the verbose output when msglvl=1 is enabled--compare the "Number of non-zeros in L" with a full-solve approach, where the entire matrix undergoes fill-reducing reordering and this densification doesn't occur.

 

In summary: For your use case, the approach pays off only if Schur part is very small such that N_schur*N_schur << nnz_L (where nnz_L is the number of non-zeros when using a full solve). And in this case probably storing Schur complement as a dense matrix and using LAPACK to solve it might be beneficial.

 

Hope it helps. Feel free to ask follow-up questions.

Thank you once again for the post.

 

Kind Regards,

Chris

View solution in original post

10 Replies
rbunger
New Contributor II
561 Views

I wanted to attach my test code but I got the following error: "The attachment's pardiso_schur.cpp content type (text/x-c++src) does not match its file extension and has been removed."

0 Kudos
rbunger
New Contributor II
555 Views

oneAPI 2026.0.

0 Kudos
Aleksandra_K
Moderator
480 Views

Hi, Rainer

Thank you for reaching out. Could you please try uploading the code again using a .c or .cpp file? If needed, try compressing it or upload it as a text file.


Regards,

Alex


rbunger
New Contributor II
461 Views
Hi Alex, many thanks for your interest. I will upload the code next week, but not before Tuesday. It is more or less a combination of existing demo codes. Since it works for one of the matrices, it should be ok. I think PARDISO analyses the matrix for special features, independent block solve for example, and these special cases are possibly not properly combined with partial / Schur complement solve...

Thanks, Rainer
0 Kudos
rbunger
New Contributor II
408 Views

Hi,

 

I have attached the code as an archive. This is the version I have used to generate the above output. You can't run the code because I can't upload a necessary sparse matrix class I use to generate the reference. I will try to strip down the code and upload it today or tomorrow.

 

Thanks

 

Rainer

0 Kudos
rbunger
New Contributor II
408 Views

Hi,

 

here is a code version which just depends on the Eigen library...

 

Thanks for your efforts,

 

Rainer

 

 

0 Kudos
rbunger
New Contributor II
408 Views

Hi again,

 

here is a code version without any external requirement (apart from the STL).

 

Rainer

 

 

 

0 Kudos
rbunger
New Contributor II
192 Views

Above, I have tested the "partial solve" feature and the Schur-complement feature. Apart from the problem with the wrong result for one of the two matrices, I have compared the PARDISO outputs concerning used resources: They were identical, which is why I think that the partial solve feature is essentially the Schur-complement with a sparse Schur matrix.

I have tested the partial solve feature for another bigger case which delivers the correct result (mtype=6):

  • Partial solve factorization step is orders of magnitude slower than the standard factorization (can't give a factor because I measure in seconds and the standard factorization is < 1 sec.)
  • The factorization for the partial solve needs about 21 times more RAM
  • I call the "solve" step frequently inside an outer iterative solver (another Schur-complement formulation). With partial solve, the outer iterative solver is about 39 times slower. Here my aim was to get a speedup because the outer solver needs just a "surface solution" from PARDISO instead of the full 3D solution

Summary: The partial solve / Schur-complement feature is much slower and much more memory intensive than the std. factorization and solve.

I'm not an expert but it is understandable since a number of (sparse) matrix-matrix products are involved and an efficient implementation seems to be very complicated. This is also explained in a PARDISO manual by O. Schenk.

Regards,

Rainer

0 Kudos
c_sim
Employee
17 Views

Hi Rainer,

 

Thank you for posting this bug report and interesting question.

The bug that causes second matrix to have numerical error is fixed and will be part of oneMKL 2026.2.0.

 

Regarding performance: Yes, Schur complement and partial solve are basically doing the same--is just a convenient abstraction. However, unlike dense solvers, the performance impact of Schur feature is slightly more complicated.  The slowdown you observe occurs because when these features are enabled, the Schur complement block must be moved to the end of the elimination order. This constraint means that fill-reducing ordering is applied only to the non-Schur part in isolation, without considering the interactions between Schur and non-Schur regions. This has two main performance impacts: 1) limited parallelism in Schur part and 2) high fill-in for Schur part, especially in cases like yours where the Schur part represents a surface layer that is tightly connected to the non-Schur part. This causes higher-memory requirement and data traffic resulting in performance speed down.

 

For the matrix in first case (ID = 746704) the Schur part is fully-dense and in second case it is roughly 50% dense (likely due to its two-submatrix structure). This high fill-in is also visible in the verbose output when msglvl=1 is enabled--compare the "Number of non-zeros in L" with a full-solve approach, where the entire matrix undergoes fill-reducing reordering and this densification doesn't occur.

 

In summary: For your use case, the approach pays off only if Schur part is very small such that N_schur*N_schur << nnz_L (where nnz_L is the number of non-zeros when using a full solve). And in this case probably storing Schur complement as a dense matrix and using LAPACK to solve it might be beneficial.

 

Hope it helps. Feel free to ask follow-up questions.

Thank you once again for the post.

 

Kind Regards,

Chris

rbunger
New Contributor II
12 Views

Hi Chris,

 

many thanks for the detailed information & the proposed bugfix. Great service, and all that for free, I really appreciate that!

 

Thanks

 

Rainer

0 Kudos
Reply