Solved: High Memory Demand of Pardiso in OOC mode

Hassan-Ebrahimi · 04-22-2024

Hi

I am trying to solve a large unsymmetric linear system using the MKL Pardiso solver.

From the stats, which are posted below, it can be verified that MKL is running in OOC mode.

The matrix has 871814023 nonzeros, which take about 6.5 GB of memory. 58.8 GB of memory is available at the start of the calculation; nonetheless, Pardiso aborts with the following error:

*** Error in PARDISO ( insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 21815293 bytes failed
total memory wanted here: 59854440 kbyte

Why does Pardiso need 59854440 kbyte (about 57 GB) of memory in OOC mode, whereas the matrix itself takes less than 10 GB?
Aren't the L+U factors supposed to be written to temporary files in OOC mode?

=== PARDISO: solving a real nonsymmetric system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 21.841203 s
Time spent in reordering of the initial matrix (reorder) : 78.421270 s
Time spent in symbolic factorization (symbfct) : 17.473810 s
Time spent in data preparations for factorization (parlist) : 0.143885 s
Time spent in allocation of internal data structures (malloc) : 0.402535 s
Time spent in matching/scaling : 23.728350 s
Time spent in additional calculations : 35.696885 s
Total time spent : 177.707939 s

Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP

< Linear system Ax = b >
number of equations: 6078755
number of non-zeros in A: 871814023
number of non-zeros in A (%): 0.002359

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 399517
size of largest supernode: 10854
number of non-zeros in L: 5815155158
number of non-zeros in U: 5265988113
number of non-zeros in L+U: 11081143271
iparm(17) = 93834220
Reordering completed ...
ooc_max_core_size got by Env=58800
The file .\pardiso_ooc.cfg was not opened
=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===
*** Error in PARDISO ( insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 21815293 bytes failed
total memory wanted here: 59854440 kbyte

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( factorization phase )
================

Times:
======
Time spent in additional calculations : 380.438875 s
Total time spent : 380.438875 s

Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP

< Linear system Ax = b >
number of equations: 6078755
number of non-zeros in A: 871814023
number of non-zeros in A (%): 0.002359

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 399517
size of largest supernode: 10854
number of non-zeros in L: 5815155158
number of non-zeros in U: 5265988113
number of non-zeros in L+U: 11081143271
gflop for the numerical factorization: 34402.131252

The following ERROR was detected: -2

mecej4 · 04-23-2024

It is a well-known fact that the inverse of a sparse matrix and the L and U factors of a sparse matrix are often more "filled-in", i.e., dense, than the original matrix. For instance, in this instructional example, you can see a sparse symmetric matrix with 794 non-zero entries which has an inverse with 7148 entries and a Cholesky factor with 1819 entries.
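To make the fill-in effect concrete, here is a minimal sketch in plain Python. It is only an illustration, not what Pardiso does internally: it runs a symbolic Gaussian elimination without pivoting on an "arrow" matrix (diagonal plus one dense row and column) and counts the structural non-zeros in L+U. The helper names `symbolic_fill` and `arrow` are made up for this example.

```python
def symbolic_fill(n, pattern):
    """Count structural non-zeros in L+U after symbolic elimination (no pivoting).

    pattern is a set of (row, col) positions, diagonal included.
    """
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        rows[i].add(j)
    for k in range(n):
        # Entries to the right of the pivot in row k spread into every
        # later row that has a non-zero in column k.
        upper = {j for j in rows[k] if j > k}
        for i in range(k + 1, n):
            if k in rows[i]:
                rows[i] |= upper
    return sum(len(r) for r in rows)


def arrow(n, dense_index):
    """Diagonal matrix plus one dense row and column at dense_index."""
    pat = {(i, i) for i in range(n)}
    for i in range(n):
        pat.add((dense_index, i))
        pat.add((i, dense_index))
    return pat


n = 100
nnz_a = len(arrow(n, 0))                       # 3n - 2 entries in A
fill_bad = symbolic_fill(n, arrow(n, 0))       # dense row/column eliminated first
fill_good = symbolic_fill(n, arrow(n, n - 1))  # dense row/column eliminated last

print(nnz_a, fill_bad, fill_good)
```

With the dense row and column eliminated first, the factors become completely dense; with the same row and column ordered last, there is no fill-in at all. Reordering steps such as METIS try to find an ordering closer to the second case, but for general matrices a substantial amount of fill-in usually remains.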

Many iterative methods of the Krylov class use "Incomplete Factorizations" as preconditioners, in which the fill-in is controlled. The price is that you have to iterate in order to obtain the solution of the true linear equation system, because the incomplete factors only give an approximation of it.

An out-of-core method attempts to avoid holding the entire result matrix in memory, but it does need to store an in-core working subset. Pardiso has told you that the number of non-zeros in the factors is about 11 billion, which at 8 bytes per double-precision value would take about 89 Gbytes to store. The out-of-core algorithm has reduced the in-core requirement to about 60 Gbytes.
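The arithmetic behind these figures can be checked with a few lines of Python. This is a rough sketch using only the counts from the posted log. It assumes 8 bytes per double-precision value, ignores the integer index arrays, and assumes that "kbyte" in the Pardiso message means 1024 bytes.

```python
BYTES_PER_DOUBLE = 8

nnz_a = 871_814_023        # "number of non-zeros in A" from the log
nnz_lu = 11_081_143_271    # "number of non-zeros in L+U" from the log
wanted_kbyte = 59_854_440  # "total memory wanted here" from the log

# Storage for the numerical values only, in decimal gigabytes.
a_values_gb = nnz_a * BYTES_PER_DOUBLE / 1e9
lu_values_gb = nnz_lu * BYTES_PER_DOUBLE / 1e9

# Assumption: 1 kbyte = 1024 bytes in the Pardiso message.
wanted_gb = wanted_kbyte * 1024 / 1e9

# How many times more non-zeros the factors have than A.
fill_ratio = nnz_lu / nnz_a

print(f"values of A:   {a_values_gb:.1f} GB")
print(f"values of L+U: {lu_values_gb:.1f} GB")
print(f"wanted:        {wanted_gb:.1f} GB")
print(f"fill ratio:    {fill_ratio:.1f}x")
```

So the factors hold roughly 12 to 13 times as many non-zeros as the original matrix, and the amount Pardiso asked for is about two thirds of what the factor values alone would need in-core.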
