Intel® oneAPI Math Kernel Library

## Pardiso and Cluster Pardiso Questions

Hello everyone!

I'm using PARDISO to solve the Navier-Stokes and temperature equations in one program.

I run phase 11 only once, at program start, to tell the solver what my matrices look like. At every time layer I need to refill both matrices, factorize them, and find the solution. To do so I set the maxfct parameter to 2 and switch mnum between 1 and 2 for the different equations. I also use two different pt arrays for the different equations. I should mention that I use phase 0 after each time layer in order to free memory.
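
The call pattern described above can be sketched as a schedule of (time layer, mnum, phase) triples. This is only an illustration in Python, not PARDISO code: `call_schedule` is a hypothetical helper, and the exact order of the phase 0 calls within a time layer is an assumption.

```python
from typing import Iterator, Tuple

ANALYSIS = 11          # phase 11: reordering and symbolic factorization
FACTOR_AND_SOLVE = 23  # phase 23: numerical factorization, then solve
RELEASE_LU = 0         # phase 0: release the L and U memory of matrix mnum


def call_schedule(n_steps: int, n_matrices: int = 2) -> Iterator[Tuple[int, int, int]]:
    """Yield (time_layer, mnum, phase); time_layer 0 is the one-off setup."""
    # Analysis is done once per matrix, before the time loop.
    for mnum in range(1, n_matrices + 1):
        yield (0, mnum, ANALYSIS)
    for step in range(1, n_steps + 1):
        # Refill the matrix values (not shown), then factorize and solve.
        for mnum in range(1, n_matrices + 1):
            yield (step, mnum, FACTOR_AND_SOLVE)
        # Assumed order: free the factors of both matrices at the end of the layer.
        for mnum in range(1, n_matrices + 1):
            yield (step, mnum, RELEASE_LU)


for step, mnum, phase in call_schedule(n_steps=2):
    print(f"time layer {step}: mnum={mnum}, phase={phase}")
```

The point of the pattern is that phase 11 appears only in time layer 0, so the expensive reordering is never repeated inside the time loop.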

I want to accelerate the computations by using several cluster nodes. I found the cluster version of PARDISO, and it was a surprise to me that the maxfct and mnum parameters are ignored. I also didn't find phase 0.

My first question is: do I really need those parameters to solve my problem on a cluster? I don't want to use phase 11 at every time layer because it would be too slow.

Secondly, I'm having a problem using PARDISO on a single cluster node when the number of equations is ~13 million. I receive error -2. Below is the output I get after phase 11 for both matrices and phase 23 for the Navier-Stokes equation.

```
=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 1.680254 s
Time spent in reordering of the initial matrix (reorder)         : 15.632488 s
Time spent in symbolic factorization (symbfct)                   : 128.883790 s
Time spent in data preparations for factorization (parlist)      : 2.423303 s
Time spent in allocation of internal data structures (malloc)    : 168.633973 s
Time spent in additional calculations                            : 13.837179 s
Total time spent                                                 : 331.090987 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
number of equations:           13198336
number of non-zeros in A:      107134147
number of non-zeros in A (%): 0.000062

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs:  0
number of supernodes:                    6792104
size of largest supernode:               56189
number of non-zeros in L:                52781978834
number of non-zeros in U:                52521656342
number of non-zeros in L+U:              105303635176

=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.309069 s
Time spent in reordering of the initial matrix (reorder)         : 2.541601 s
Time spent in symbolic factorization (symbfct)                   : 2.516607 s
Time spent in data preparations for factorization (parlist)      : 0.269381 s
Time spent in allocation of internal data structures (malloc)    : 1.306915 s
Time spent in additional calculations                            : 1.014548 s
Total time spent                                                 : 7.958121 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
number of equations:           3368499
number of non-zeros in A:      22736533
number of non-zeros in A (%): 0.000200

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs:  0
number of supernodes:                    2244305
size of largest supernode:               13571
number of non-zeros in L:                2906625075
number of non-zeros in U:                2863523632
number of non-zeros in L+U:              5770148707
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
*** Error in PARDISO  (     insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 412359210 bytes failed
total memory wanted here: 435961139 kbyte

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( starting phase is factorization, ending phase is solution )
================

Times:
======
Time spent in additional calculations                            : 0.000074 s
Total time spent                                                 : 0.000074 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
number of equations:           13198336
number of non-zeros in A:      107134147
number of non-zeros in A (%): 0.000062

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs:  0
number of supernodes:                    6792104
size of largest supernode:               56189
number of non-zeros in L:                52781978834
number of non-zeros in U:                52521656342
number of non-zeros in L+U:              105303635176
gflop   for the numerical factorization: 3841175.475158

PARDISO ERROR =           -2
```

Do I really need this much memory to solve the system, or is it some kind of bug? If it is not a bug, can this problem be solved by using Cluster PARDISO?

My iparm setup is below.

```
IPARM(1) = 1 ! NO SOLVER DEFAULT
IPARM(2) = 3 ! FILL-IN REORDERING FROM METIS
IPARM(3) = 0 ! NUMBERS OF PROCESSORS
IPARM(4) = 0 ! NO ITERATIVE-DIRECT ALGORITHM
IPARM(5) = 0 ! NO USER FILL-IN REDUCING PERMUTATION
IPARM(6) = 0 ! =0 SOLUTION ON THE FIRST N COMPONENTS OF X
IPARM(7) = 0 ! NOT IN USE
IPARM(8) = 5 ! NUMBERS OF ITERATIVE REFINEMENT STEPS
IPARM(9) = 0 ! NOT IN USE
IPARM(10) = 16 ! PERTURB THE PIVOT ELEMENTS WITH 1E-16
IPARM(11) = 1 ! USE NONSYMMETRIC PERMUTATION AND SCALING MPS
IPARM(12) = 0 ! NOT IN USE
IPARM(13) = 1 ! MAXIMUM WEIGHTED MATCHING ALGORITHM IS SWITCHED-ON (DEFAULT FOR NON-SYMMETRIC)
IPARM(14) = 0 ! OUTPUT: NUMBER OF PERTURBED PIVOTS
IPARM(15) = 0 ! NOT IN USE
IPARM(16) = 0 ! NOT IN USE
IPARM(17) = 0 ! NOT IN USE
IPARM(18) = -1 ! OUTPUT: NUMBER OF NONZEROS IN THE FACTOR LU
IPARM(19) = -1 ! OUTPUT: MFLOPS FOR LU FACTORIZATION
IPARM(20) = 0 ! OUTPUT: NUMBERS OF CG ITERATIONS
IPARM(24) = 0
IPARM(34) = 0
IPARM(27) = 0
IPARM(35) = 1 ! ZERO BASE INDEXING
IPARM(39) = 0
```

I'm using Parallel Studio XE 2017.4.196 on a single cluster node. Each node has 2 x Intel Xeon E5-2690 v4 and 256 GB of RAM and runs CentOS 7.3.

3 Replies

>> Do I really need this much memory to solve the system, or is it some kind of bug? If it is not a bug, can this problem be solved by using Cluster PARDISO?

Yes, you really do need that much memory; this is not a bug. The solver reports that the number of non-zeros after factorization is nnz ~ 57*10^9, so the memory needed is about sizeof(double) * nnz ~ 420 GB.
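
As a rough check of that estimate, here is a minimal Python sketch of the sizeof(double) * nnz arithmetic. It counts only the factor values and ignores index arrays and working storage, so it is a lower bound rather than what PARDISO actually allocates; `factor_storage_bytes` is a hypothetical helper name.

```python
BYTES_PER_DOUBLE = 8  # size of one double-precision value


def factor_storage_bytes(nnz: int) -> int:
    """Bytes needed to hold nnz double-precision factor entries (values only)."""
    return nnz * BYTES_PER_DOUBLE


nnz = 57 * 10**9  # figure quoted in the reply above
n_bytes = factor_storage_bytes(nnz)

print(f"{n_bytes / 1e9:.0f} GB  (1 GB  = 10^9 bytes)")
print(f"{n_bytes / 2**30:.0f} GiB (1 GiB = 2^30 bytes)")
# The node in the question has 256 GB of RAM.
print("fits in 256 GB of RAM:", n_bytes <= 256 * 10**9)
```

Whichever unit convention is used, the result is well above the 256 GB available on one node.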

You may solve this case by using iparm(60) = 1 or 2 (hybrid or OOC modes). Please refer to the Developer Reference manual for details.

Thank you for the response. Will there be enough memory if I use 5 nodes? 10? How does Cluster PARDISO allocate memory?

Hello,

I have a problem that causes the same error message, but with iparm(60) = 1, so OOC mode is enabled. Here is the log:

```
ooc_max_core_size got from config file=200000
=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===
*** Error in PARDISO  (     insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 136172344 bytes failed
total memory wanted here: 195700928 kbyte

=== PARDISO: solving a symmetric indefinite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 37.209805 s
Time spent in reordering of the initial matrix (reorder)         : 0.174363 s
Time spent in symbolic factorization (symbfct)                   : 244.759732 s
Time spent in data preparations for factorization (parlist)      : 7.739215 s
Time spent in allocation of internal data structures (malloc)    : 1371.208639 s
Time spent in additional calculations                            : 230.071368 s
Total time spent                                                 : 1891.163122 s
==============================================================
----------- Out of core time (in percent (%)) --------------
Factorization step (100 (%)):
write to files : 0 %
Solution step (100 (%)):
Total time (100 (%)):
----------- Out of core Mb --------------
Factorization step:
write to files :      0.000 Mb
Solution step:
Total size of data transferred:
==============================================================

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
number of equations:           16651601
number of non-zeros in A:      1070209671
number of non-zeros in A (%): 0.000386

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
number of supernodes:                    3858946
size of largest supernode:               76371
number of non-zeros in L:                24985066246
number of non-zeros in U:                1
number of non-zeros in L+U:              24985066247
gflop   for the numerical factorization: 700621.967324
```

I think this is strange because these large arrays should be stored out of core.

Also, you write:

>> Solver reports you the number of nnz after factorization is nnz ~ 57*10^9

Where do you find that number? In Oleg's error message I find

```
number of non-zeros in L+U:              105303635176
```

which would give a required memory of about 840 GB, which does not correspond to the 420 GB the solver says it needs.

In the case of my problem,

`number of non-zeros in L+U:              24985066247`

which would require about 192 GB of memory. That corresponds to the

```
total memory wanted here: 195700928 kbyte
```

which is about 195 GB. So in my case, those numbers match. Also I do not understand why 195 GB is too much when

`ooc_max_core_size`

is set to 200 GB. 
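
The comparison made above can be reproduced with a short Python sketch. It takes the "number of non-zeros in L+U" and "total memory wanted here" values from the two logs in this thread and computes the ratio between the 8-bytes-per-non-zero estimate and the reported size. It assumes that "kbyte" in the log means 1024 bytes; `estimate_over_wanted` is a hypothetical helper name.

```python
BYTES_PER_DOUBLE = 8
BYTES_PER_KBYTE = 1024  # assumption: the log's "kbyte" means 1024 bytes

# (label, "number of non-zeros in L+U", "total memory wanted here" in kbyte)
LOGS = [
    ("nonsymmetric, 13198336 equations", 105303635176, 435961139),
    ("symmetric indefinite, 16651601 equations", 24985066247, 195700928),
]


def estimate_over_wanted(nnz_lu: int, wanted_kbyte: int) -> float:
    """Ratio of the 8-bytes-per-non-zero estimate to the size PARDISO reported."""
    return (nnz_lu * BYTES_PER_DOUBLE) / (wanted_kbyte * BYTES_PER_KBYTE)


for label, nnz_lu, wanted_kbyte in LOGS:
    ratio = estimate_over_wanted(nnz_lu, wanted_kbyte)
    print(f"{label}: estimate / wanted = {ratio:.2f}")
```

Under these assumptions the ratio is close to 1 for the symmetric indefinite log and close to 1.9 for the nonsymmetric one, which is the mismatch described above.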