Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Pardiso and Cluster Pardiso Questions

Oleg_S_
Beginner

Hello everyone!

I'm using PARDISO to solve the Navier-Stokes and temperature equations in one program.

I use phase 11 only at program start to tell the solver what my matrices look like. At every time step I need to refill both matrices, factorize them, and find the solution. To do so, I set the maxfct parameter to 2 and switch mnum between 1 and 2 for the two equations. I also use 2 different pt arrays for the different equations. I should mention that I call phase 0 after each time step in order to free memory.
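The call pattern described above can be written out as a list of (phase, mnum) pairs. This is only an illustrative sketch: it does not call MKL, the function name call_schedule is hypothetical, and the phase meanings in the comments follow the usual PARDISO phase numbering (11 = analysis, 23 = factorize and solve, 0 = release L and U for matrix mnum).

```python
# Hypothetical sketch of the call schedule described in the post.
# It only lists (phase, mnum) pairs; no solver is invoked.
ANALYSIS = 11       # reordering and symbolic factorization
FACTOR_SOLVE = 23   # numerical factorization followed by solve
RELEASE_LU = 0      # release internal memory for L and U of matrix mnum


def call_schedule(n_steps, n_matrices=2):
    """Return the (phase, mnum) calls for n_steps time steps."""
    matrices = range(1, n_matrices + 1)
    # Analysis is done once per matrix, at program start.
    calls = [(ANALYSIS, m) for m in matrices]
    for _ in range(n_steps):
        # Matrix values are refilled here; the sparsity pattern stays fixed.
        calls += [(FACTOR_SOLVE, m) for m in matrices]
        calls += [(RELEASE_LU, m) for m in matrices]
    return calls


schedule = call_schedule(n_steps=2)
for phase, mnum in schedule:
    print(f"phase={phase:2d} mnum={mnum}")
```

The point of the sketch is that phase 11 appears only in the first two entries, while the factorize/solve and release calls repeat for every time step.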

I want to accelerate the computations by using several cluster nodes. I found the Cluster version of PARDISO and was surprised that the maxfct and mnum parameters are ignored. I also didn't find phase 0.

My first question is: do I really need those parameters to solve my problem on a cluster? I don't want to use phase 11 at every time step because it would be too slow.

Secondly, I'm having a problem using PARDISO on a single cluster node when the number of equations is ~13 million. I receive error -2. The output below is what I get after phase 11 for both matrices and phase 23 for the Navier-Stokes equations.

=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 1.680254 s
Time spent in reordering of the initial matrix (reorder)         : 15.632488 s
Time spent in symbolic factorization (symbfct)                   : 128.883790 s
Time spent in data preparations for factorization (parlist)      : 2.423303 s
Time spent in allocation of internal data structures (malloc)    : 168.633973 s
Time spent in additional calculations                            : 13.837179 s
Total time spent                                                 : 331.090987 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
             number of equations:           13198336
             number of non-zeros in A:      107134147
             number of non-zeros in A (%): 0.000062

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    6792104
             size of largest supernode:               56189
             number of non-zeros in L:                52781978834
             number of non-zeros in U:                52521656342
             number of non-zeros in L+U:              105303635176

=== PARDISO: solving a real nonsymmetric system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.309069 s
Time spent in reordering of the initial matrix (reorder)         : 2.541601 s
Time spent in symbolic factorization (symbfct)                   : 2.516607 s
Time spent in data preparations for factorization (parlist)      : 0.269381 s
Time spent in allocation of internal data structures (malloc)    : 1.306915 s
Time spent in additional calculations                            : 1.014548 s
Total time spent                                                 : 7.958121 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
             number of equations:           3368499
             number of non-zeros in A:      22736533
             number of non-zeros in A (%): 0.000200

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    2244305
             size of largest supernode:               13571
             number of non-zeros in L:                2906625075
             number of non-zeros in U:                2863523632
             number of non-zeros in L+U:              5770148707
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
*** Error in PARDISO  (     insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 412359210 bytes failed
total memory wanted here: 435961139 kbyte

=== PARDISO: solving a real nonsymmetric system ===


Summary: ( starting phase is factorization, ending phase is solution )
================

Times:
======
Time spent in additional calculations                            : 0.000074 s
Total time spent                                                 : 0.000074 s

Statistics:
===========
Parallel Direct Factorization is running on 28 OpenMP

< Linear system Ax = b >
             number of equations:           13198336
             number of non-zeros in A:      107134147
             number of non-zeros in A (%): 0.000062

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    6792104
             size of largest supernode:               56189
             number of non-zeros in L:                52781978834
             number of non-zeros in U:                52521656342
             number of non-zeros in L+U:              105303635176
             gflop   for the numerical factorization: 3841175.475158

 PARDISO ERROR =           -2
 

Do I really need that much memory to solve the system, or is this some kind of bug? If it is not a bug, can the problem be solved by using Cluster PARDISO?

 

My iparm setup is below.

    IPARM(1) = 1 ! NO SOLVER DEFAULT
    IPARM(2) = 3 ! FILL-IN REORDERING FROM METIS
    IPARM(3) = 0 ! NUMBERS OF PROCESSORS
    IPARM(4) = 0 ! NO ITERATIVE-DIRECT ALGORITHM
    IPARM(5) = 0 ! NO USER FILL-IN REDUCING PERMUTATION
    IPARM(6) = 0 ! =0 SOLUTION ON THE FIRST N COMPONENTS OF X
    IPARM(7) = 0 ! NOT IN USE
    IPARM(8) = 5 ! NUMBERS OF ITERATIVE REFINEMENT STEPS
    IPARM(9) = 0 ! NOT IN USE
    IPARM(10) = 16 ! PERTURB THE PIVOT ELEMENTS WITH 1E-16 (EPS = 10**(-IPARM(10)))
    IPARM(11) = 1 ! USE NONSYMMETRIC PERMUTATION AND SCALING MPS
    IPARM(12) = 0 ! NOT IN USE
    IPARM(13) = 1 ! MAXIMUM WEIGHTED MATCHING ALGORITHM IS SWITCHED-ON (DEFAULT FOR NON-SYMMETRIC)
    IPARM(14) = 0 ! OUTPUT: NUMBER OF PERTURBED PIVOTS
    IPARM(15) = 0 ! NOT IN USE
    IPARM(16) = 0 ! NOT IN USE
    IPARM(17) = 0 ! NOT IN USE
    IPARM(18) = -1 ! OUTPUT: NUMBER OF NONZEROS IN THE FACTOR LU
    IPARM(19) = -1 ! OUTPUT: MFLOPS FOR LU FACTORIZATION
    IPARM(20) = 0 ! OUTPUT: NUMBERS OF CG ITERATIONS
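    IPARM(24) = 0 ! CLASSIC PARALLEL FACTORIZATION ALGORITHM
    IPARM(34) = 0 ! CNR MODE THREAD COUNT (0 = DEFAULT)
    IPARM(27) = 0 ! MATRIX CHECKER OFF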
    IPARM(35) = 1 ! ZERO BASE INDEXING
    IPARM(39) = 0

I'm using Parallel Studio XE 2017.4.196 on a single cluster node. Each node has 2 x Intel Xeon E5-2690 v4 CPUs and 256 GB of RAM, and runs CentOS 7.3.

 

Thanks in advance for your help.

3 Replies
Gennady_F_Intel
Moderator

>> Do I really need that much memory to solve the system, or is this some kind of bug? If it is not a bug, can the problem be solved by using Cluster PARDISO?

Yes, this is not a bug. The solver reports that the number of nonzeros after factorization is nnz ~ 57*10^9, so the memory needed would be sizeof(double) * nnz ~ 420 GB.
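The estimate above can be checked with a few lines of arithmetic. The sketch below assumes 8 bytes per stored nonzero and ignores index arrays and workspace, so it is a rough lower bound rather than what the solver actually allocates. The nonzero counts and the "total memory wanted" figure are copied from the log; the function name factor_bytes is illustrative.

```python
# Naive estimate: one double (8 bytes) per stored factor nonzero.
# Counts are copied from the PARDISO statistics in the log.
SIZEOF_DOUBLE = 8


def factor_bytes(nnz):
    """Bytes for nnz double-precision values, ignoring index arrays."""
    return SIZEOF_DOUBLE * nnz


nnz_l = 52781978834    # "number of non-zeros in L"
nnz_u = 52521656342    # "number of non-zeros in U"
nnz_lu = 105303635176  # "number of non-zeros in L+U"

for label, nnz in [("L", nnz_l), ("U", nnz_u), ("L+U", nnz_lu)]:
    print(f"{label:4s} {factor_bytes(nnz) / 1e9:8.1f} GB")

# Figure printed by the solver, in the unit it prints (kbyte).
wanted_kbyte = 435961139
print(f"solver wanted: {wanted_kbyte} kbyte")
```

With these counts, L alone comes to about 422 GB and L+U to about 842 GB (decimal gigabytes). The log does not say whether a kbyte is 1000 or 1024 bytes, so the comparison with the solver's figure is only indicative.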

You may solve this case by using iparm(60) = 1 or 2 (hybrid or OOC modes). Please refer to the Developer Reference manual for details.

Oleg_S_
Beginner

Thank you for the response. Will there be enough memory if I use 5 nodes? 10? How does Cluster PARDISO allocate memory?

What about my first question?

Robin_T_
Novice

Hello,

I have a problem that produces the same error message, even though I am using iparm(60) = 1, so OOC mode is enabled. Here is the log:

ooc_max_core_size got from config file=200000
=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===
*** Error in PARDISO  (     insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 136172344 bytes failed
total memory wanted here: 195700928 kbyte

=== PARDISO: solving a symmetric indefinite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( starting phase is reordering, ending phase is solution )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 37.209805 s
Time spent in reordering of the initial matrix (reorder)         : 0.174363 s
Time spent in symbolic factorization (symbfct)                   : 244.759732 s
Time spent in data preparations for factorization (parlist)      : 7.739215 s
Time spent in allocation of internal data structures (malloc)    : 1371.208639 s
Time spent in additional calculations                            : 230.071368 s
Total time spent                                                 : 1891.163122 s
==============================================================
----------- Out of core time (in percent (%)) --------------
Factorization step (100 (%)):
      write to files : 0 %
      read from files: 0 %
      factorization - write&read: 100 %
Solution step (100 (%)):
      read from files: 0 %
      solve - write&read: 100 %
Total time (100 (%)):
      read from files: 0 %
      total - write&read: 100 %
----------- Out of core Mb --------------
Factorization step:
      write to files :      0.000 Mb
      read from files:      0.000 Mb
Solution step:
      read from files:      0.000 Mb
Total size of data transferred:
      write&read     :      0.000 Mb
==============================================================

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           16651601
             number of non-zeros in A:      1070209671
             number of non-zeros in A (%): 0.000386

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
             number of supernodes:                    3858946
             size of largest supernode:               76371
             number of non-zeros in L:                24985066246
             number of non-zeros in U:                1
             number of non-zeros in L+U:              24985066247
             gflop   for the numerical factorization: 700621.967324

I think this is strange because these large arrays should be stored out of core.

Also, you wrote:

The solver reports that the number of nonzeros after factorization is nnz ~ 57*10^9

Where did you find that number? In Oleg's log I find

number of non-zeros in L+U:              105303635176

which would require about 840 GB of memory. That does not correspond to the 420 GB the solver says it needs.

In the case of my problem, 

number of non-zeros in L+U:              24985066247

which would require about 200 GB of memory (8 bytes per nonzero). That corresponds to the

total memory wanted here: 195700928 kbyte

which is about 195 GB. So in my case, those numbers match. Also I do not understand why 195 GB is too much when ooc_max_core_size is set to 200 GB.
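The comparison above depends on units, so here is a small sketch that tries both conventions. Two things are assumptions and are not confirmed by the log: that ooc_max_core_size is given in megabytes, and that "kbyte" may mean either 1000 or 1024 bytes. The helper to_gb is illustrative.

```python
# Unit check for the figures quoted from the log.
# Assumptions: ooc_max_core_size is in megabytes; "kbyte" may be
# 1000 or 1024 bytes. Neither is confirmed by the log itself.
nnz_lu = 24985066247           # "number of non-zeros in L+U"
wanted_kbyte = 195700928       # "total memory wanted here"
ooc_max_core_size_mb = 200000  # value read from the config file


def to_gb(value, unit_bytes):
    """Convert a count of units of unit_bytes each to decimal gigabytes."""
    return value * unit_bytes / 1e9


print(f"factors at 8 bytes per nonzero: {to_gb(nnz_lu, 8):.1f} GB")
for kb in (1000, 1024):
    print(f"wanted, kbyte = {kb} bytes: {to_gb(wanted_kbyte, kb):.1f} GB")
for mb in (1000**2, 1024**2):
    print(f"limit, MB = {mb} bytes: {to_gb(ooc_max_core_size_mb, mb):.1f} GB")
```

Under either convention the requested amount comes out slightly below the configured limit, so the units alone do not explain the allocation failure.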

Thanks in advance for your help.
