Re: Pardiso: How to partially solve a real symmetric linear system with memory efficiency?

fangtcao · ‎02-16-2021

Dear MKL developers,

I use Intel MKL's Pardiso solver on a medical image processing project based on FEM (Finite Element Method). My problem is to solve a linear equation 'Ax=b' where 'A' is a real symmetric sparse matrix, and 'b' is a vector. The dimension of A is about 700K in my problem. And I am only interested in certain elements in 'x' which indices are known in advance.

MKL's example program 'pardiso_sym_reduced_solve_c.c' appears quite relevant for this problem. According to the program's comments, the computation cost can be reduced at the solver step by focusing on certain selected elements in b. In my case, I am interested in about 350K elements of x (namely, 1/2 of the elements). I tried to set 'perm,' but it only worked when the number of selected elements in 'x' is less than 25K. If I select more than 25K elements, I got memory errors (on my computer with 16G RAM):

PardisoError: The Pardiso solver failed with error code -2. See Pardiso documentation for details.

Question: Is there an efficient, less memory-consuming way in Pardiso to solve AX=b on only half of the elements in 'x'? In general, how does the consumed memory (in solving Ax=b) scale up with the number of selected elements in x with the 'perm' option enabled?

Many thanks.

Fang

Kirill_V_Intel · ‎02-16-2021

Hello @fangtcao,

As you are solving a system of equations with a direct method, unless it has a special structure, you have to compute all the factors I think.

So, the partial solve feature is targeting reduced memory consumption on the solving phase (and allows exploiting the potentially known sparsity structure of the righthand sides), but not on the factorization and reordering steps.

Hence I assume you actually don't have enough memory for the factorization. But we better check that. For this, please separate the calls to the phases (instead of a single call with phase = 13 do three calls with phase = 11, 22 and 33) and check error-info after each call.

If you get -2 after the call with phase = 22, then I suspect not much can be done. You can potentially use Out-Of-Core mode (iparm[60]=2 or 1) but it will use disc storage and be slower. Or, if the structure is particularly good, try to solve the Schur complement system for the subset of unknowns which you want to find. But in this case a lot of sparsity is likely to get lost.

But if you are getting this error status after factorization completes successfully, at the solving phase (phase = 33), then we can talk more details.

Best,
Kirill

fangtcao · ‎02-17-2021

Hello Kirill,

Thank you for your swift response.

You are right. I am indeed interested in the time performance on the solving phase 33, not on the reordering phase 11 or factorization phase 22. Thus, I use 'perm' to select elements to save computation for the last step, phase 33.

However, the problem is that I always get the error -2 after phase 11. I tried to set iparm[60]=2 and 1 as you mentioned, but still get error -2 with phase 11. As mentioned earlier, the left side stiffness matrix A is a real symmetric sparse matrix of 700K x 700K; the number of selected elements in 'x' is about 350K. Any ideas on why it is in the phase 11 that memory runs out ?

Fang

Kirill_V_Intel · ‎02-19-2021

Hi @fangtcao,

It would be relatively easy to find the reason if you share your data with us (a small reproducer with PARDISO settings and your matrix data + its reader). In general, it is not easy (at least for me) to estimate memory for the reordering step.

A correction to my previous reply: you should have iparm[60] = 0 to be able to use the partial solve feature as the docs say. So, please use the in-core mode.

Also, VBSR feature (see iparm[38]) helps to reduce the memory consumption for the reordering phase. But it has restrictions on when it works and the exact benefit depends on the sparsity pattern.

Best,
Kirill

fangtcao · ‎02-22-2021

Hi Kirill,

Thank you for the help. We will prepare the data and the code and share with you shortly.

Regards,

Fang

fangtcao · ‎03-01-2021

Dear Kirill,

I have uploaded the data and a small reproducer with PARDISO settings to GitHub:

https://github.com/fangcao/intel_mkl_pardiso_error

Please let me know if you have questions. Thank you very much.

Regards,

Fang

Kirill_V_Intel · ‎03-06-2021

Hi @fangtcao,

Thanks for preparing this reproducer! I observed the same behavior that you're seeing.

With iparm[30] = 0 the code works fine (with the rest iparm's as in your repro), but with iparm[30] = 1 the memory consumption jumps high. I didn't got an error -2 but the solver used much more than 16 Gb of RAM.

We'll make an update once we find out more details but I suspect it is implied by the partial solve algorithm. In your reproducer you're trying to do a partial solve for ~300k unknowns when the system size is ~ 600k. Likely, you're restricting too much solver capabilities by telling it to keep an eye on 50% of the unknowns (since this implies a limitation on how much freedom the reordering algorithm has, less freedom for your workload (implies) -> worse reordering -> more fill-in in the factors -> more memory required).

As a quick workaround (and the recommended way to proceed), I suggest setting iparm[30] = 0 and do full (not partial) solves. 50% of rhs sparsity is likely too dense for the partial solve feature.

Best,
Kirill

Khang_N_Intel · ‎05-24-2021

We are assuming that the solution provided helped and would no longer be monitoring this issue. Please raise a new thread if you have further issues.