Oleg_S_

Beginner

10-31-2017
08:00 AM

Pardiso and Cluster Pardiso Questions

Hello everyone!

I'm using PARDISO to solve the Navier-Stokes and temperature equations in one program.

I call phase 11 only once, at program start, to tell the solver what my matrices look like. At every time layer I need to refill both matrices, factorize them, and find the solution. To do so I set the *maxfct* parameter to 2 and switch *mnum* between 1 and 2 for the two equations. I also use two separate *pt* arrays, one per equation. I should mention that I call phase 0 after each time layer to free memory.

I want to speed up the computation by using several cluster nodes. I found the Cluster version of PARDISO and was surprised that the *maxfct* and *mnum* parameters are ignored. I also didn't find phase 0.

My first question is: do I really need those parameters to solve my problem on a cluster? I don't want to run phase 11 at every time layer because it would be too slow.

Secondly, I'm having a problem using PARDISO on a single cluster node when the number of equations is ~13 million: I receive error -2. Below is the output I get after phase 11 for both matrices and phase 23 for the Navier-Stokes equation.

    === PARDISO: solving a real nonsymmetric system ===
    0-based array is turned ON
    PARDISO double precision computation is turned ON
    Parallel METIS algorithm at reorder step is turned ON
    Scaling is turned ON
    Matching is turned ON

    Summary: ( reordering phase )
    ================

    Times:
    ======
    Time spent in calculations of symmetric matrix portrait (fulladj): 1.680254 s
    Time spent in reordering of the initial matrix (reorder)         : 15.632488 s
    Time spent in symbolic factorization (symbfct)                   : 128.883790 s
    Time spent in data preparations for factorization (parlist)      : 2.423303 s
    Time spent in allocation of internal data structures (malloc)    : 168.633973 s
    Time spent in additional calculations                            : 13.837179 s
    Total time spent                                                 : 331.090987 s

    Statistics:
    ===========
    Parallel Direct Factorization is running on 28 OpenMP

    < Linear system Ax = b >
        number of equations:           13198336
        number of non-zeros in A:      107134147
        number of non-zeros in A (%):  0.000062
        number of right-hand sides:    1

    < Factors L and U >
        number of columns for each panel: 72
        number of independent subgraphs:  0
        number of supernodes:             6792104
        size of largest supernode:        56189
        number of non-zeros in L:         52781978834
        number of non-zeros in U:         52521656342
        number of non-zeros in L+U:       105303635176

    === PARDISO: solving a real nonsymmetric system ===
    0-based array is turned ON
    PARDISO double precision computation is turned ON
    Parallel METIS algorithm at reorder step is turned ON
    Scaling is turned ON
    Matching is turned ON

    Summary: ( reordering phase )
    ================

    Times:
    ======
    Time spent in calculations of symmetric matrix portrait (fulladj): 0.309069 s
    Time spent in reordering of the initial matrix (reorder)         : 2.541601 s
    Time spent in symbolic factorization (symbfct)                   : 2.516607 s
    Time spent in data preparations for factorization (parlist)      : 0.269381 s
    Time spent in allocation of internal data structures (malloc)    : 1.306915 s
    Time spent in additional calculations                            : 1.014548 s
    Total time spent                                                 : 7.958121 s

    Statistics:
    ===========
    Parallel Direct Factorization is running on 28 OpenMP

    < Linear system Ax = b >
        number of equations:           3368499
        number of non-zeros in A:      22736533
        number of non-zeros in A (%):  0.000200
        number of right-hand sides:    1

    < Factors L and U >
        number of columns for each panel: 72
        number of independent subgraphs:  0
        number of supernodes:             2244305
        size of largest supernode:        13571
        number of non-zeros in L:         2906625075
        number of non-zeros in U:         2863523632
        number of non-zeros in L+U:       5770148707

    === PARDISO is running in In-Core mode, because iparam(60)=0 ===
    *** Error in PARDISO ( insufficient_memory) error_num= 8
    *** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 412359210 bytes failed
    total memory wanted here: 435961139 kbyte

    === PARDISO: solving a real nonsymmetric system ===

    Summary: ( starting phase is factorization, ending phase is solution )
    ================

    Times:
    ======
    Time spent in additional calculations                            : 0.000074 s
    Total time spent                                                 : 0.000074 s

    Statistics:
    ===========
    Parallel Direct Factorization is running on 28 OpenMP

    < Linear system Ax = b >
        number of equations:           13198336
        number of non-zeros in A:      107134147
        number of non-zeros in A (%):  0.000062
        number of right-hand sides:    1

    < Factors L and U >
        number of columns for each panel: 72
        number of independent subgraphs:  0
        number of supernodes:             6792104
        size of largest supernode:        56189
        number of non-zeros in L:         52781978834
        number of non-zeros in U:         52521656342
        number of non-zeros in L+U:       105303635176
        gflop for the numerical factorization: 3841175.475158

    PARDISO ERROR = -2

Do I really need this much memory to solve the system, or is it some kind of bug? If it is not a bug, can this problem be solved by using Cluster PARDISO?

My iparm setup is below.

    IPARM(1) = 1 ! NO SOLVER DEFAULT
    IPARM(2) = 3 ! FILL-IN REORDERING FROM METIS
    IPARM(3) = 0 ! NUMBERS OF PROCESSORS
    IPARM(4) = 0 ! NO ITERATIVE-DIRECT ALGORITHM
    IPARM(5) = 0 ! NO USER FILL-IN REDUCING PERMUTATION
    IPARM(6) = 0 ! =0 SOLUTION ON THE FIRST N COMPONENTS OF X
    IPARM(7) = 0 ! NOT IN USE
    IPARM(8) = 5 ! NUMBERS OF ITERATIVE REFINEMENT STEPS
    IPARM(9) = 0 ! NOT IN USE
    IPARM(10) = 16 ! PERTURB THE PIVOT ELEMENTS WITH 1E-13
    IPARM(11) = 1 ! USE NONSYMMETRIC PERMUTATION AND SCALING MPS
    IPARM(12) = 0 ! NOT IN USE
    IPARM(13) = 1 ! MAXIMUM WEIGHTED MATCHING ALGORITHM IS SWITCHED-ON (DEFAULT FOR NON-SYMMETRIC)
    IPARM(14) = 0 ! OUTPUT: NUMBER OF PERTURBED PIVOTS
    IPARM(15) = 0 ! NOT IN USE
    IPARM(16) = 0 ! NOT IN USE
    IPARM(17) = 0 ! NOT IN USE
    IPARM(18) = -1 ! OUTPUT: NUMBER OF NONZEROS IN THE FACTOR LU
    IPARM(19) = -1 ! OUTPUT: MFLOPS FOR LU FACTORIZATION
    IPARM(20) = 0 ! OUTPUT: NUMBERS OF CG ITERATIONS
    IPARM(24) = 0
    IPARM(34) = 0
    IPARM(27) = 0
    IPARM(35) = 1 ! ZERO BASE INDEXING
    IPARM(39) = 0

I'm using Parallel Studio XE 2017.4.196 on a single cluster node. Each node has 2 x Intel Xeon E5-2690 v4 and 256 GB of RAM and runs CentOS 7.3.

Thanks in advance for your help.

3 Replies

Gennady_F_Intel

Moderator

10-31-2017
09:07 AM

>> Do I really need this much memory to solve the system, or is it some kind of bug? If it is not a bug, can this problem be solved by using Cluster PARDISO?

Yes, this is not a bug. The solver reports that the number of nonzeros after factorization is nnz ~ 57*10^9, so the memory needed would be sizeof(double) * nnz ~ 420 GB.

You can solve this case by using iparm(60) = 1 or 2 (hybrid or OOC modes). Please refer to the Developer Reference manual for details.
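As a rough check of the arithmetic in this reply, the sketch below multiplies the nonzero counts printed in the log for the 13198336-equation system by 8 bytes per double-precision value. It is only an estimate under that assumption: it ignores index arrays and workspace, the helper name `factor_bytes` is made up for illustration, and reading the log's `kbyte` as 1024 bytes is a guess.

```python
BYTES_PER_DOUBLE = 8  # size of one double-precision value


def factor_bytes(nnz: int, bytes_per_value: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed to store nnz factor values (values only, no indices)."""
    return nnz * bytes_per_value


# Figures copied from the log for the 13198336-equation system.
nnz_l = 52_781_978_834
nnz_u = 52_521_656_342
nnz_lu = 105_303_635_176
reported_kbyte = 435_961_139   # "total memory wanted here"
node_ram_bytes = 256 * 10**9   # 256 GB per node, taken as decimal gigabytes

for label, nnz in [("L", nnz_l), ("U", nnz_u), ("L+U", nnz_lu)]:
    print(f"{label:>8}: {factor_bytes(nnz) / 1e9:7.1f} GB")

# Assumption: 1 kbyte = 1024 bytes; with 1000 bytes the figure is about 2% lower.
print(f"reported: {reported_kbyte * 1024 / 1e9:7.1f} GB")
print(f"node RAM: {node_ram_bytes / 1e9:7.1f} GB")
```

Whichever count is used, the estimate is well above the 256 GB available on one node, which is consistent with the in-core allocation failing.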

Oleg_S_

Beginner

11-01-2017
11:07 AM

Thank you for the response. Will there be enough memory if I use 5 nodes? 10? How does Cluster PARDISO allocate memory?

What about my first question?

Robin_T_

Novice

12-01-2017
01:40 AM

Hello,

I have a problem that causes the same error message, but with iparm(60) = 1, so OOC mode is enabled. Here is the log:

    ooc_max_core_size got from config file=200000
    === PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===
    *** Error in PARDISO ( insufficient_memory) error_num= 8
    *** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 136172344 bytes failed
    total memory wanted here: 195700928 kbyte

    === PARDISO: solving a symmetric indefinite system ===
    1-based array indexing is turned ON
    PARDISO double precision computation is turned ON
    Parallel METIS algorithm at reorder step is turned ON
    Scaling is turned ON
    Matching is turned ON

    Summary: ( starting phase is reordering, ending phase is solution )
    ================

    Times:
    ======
    Time spent in calculations of symmetric matrix portrait (fulladj): 37.209805 s
    Time spent in reordering of the initial matrix (reorder)         : 0.174363 s
    Time spent in symbolic factorization (symbfct)                   : 244.759732 s
    Time spent in data preparations for factorization (parlist)      : 7.739215 s
    Time spent in allocation of internal data structures (malloc)    : 1371.208639 s
    Time spent in additional calculations                            : 230.071368 s
    Total time spent                                                 : 1891.163122 s

    ==============================================================
    ----------- Out of core time (in percent (%)) --------------
    Factorization step (100 (%)):
        write to files : 0 %
        read from files: 0 %
        factorization - write&read: 100 %
    Solution step (100 (%)):
        read from files: 0 %
        solve - write&read: 100 %
    Total time (100 (%)):
        read from files: 0 %
        total - write&read: 100 %
    ----------- Out of core Mb --------------
    Factorization step:
        write to files : 0.000 Mb
        read from files: 0.000 Mb
    Solution step:
        read from files: 0.000 Mb
    Total size of data transferred:
        write&read : 0.000 Mb
    ==============================================================

    Statistics:
    ===========
    Parallel Direct Factorization is running on 16 OpenMP

    < Linear system Ax = b >
        number of equations:           16651601
        number of non-zeros in A:      1070209671
        number of non-zeros in A (%):  0.000386
        number of right-hand sides:    1

    < Factors L and U >
        number of columns for each panel: 192
        number of independent subgraphs:  0
        number of supernodes:             3858946
        size of largest supernode:        76371
        number of non-zeros in L:         24985066246
        number of non-zeros in U:         1
        number of non-zeros in L+U:       24985066247
        gflop for the numerical factorization: 700621.967324

I think this is strange because these large arrays should be stored out of core.

Also, you wrote:

Solver reports you the number of nnz after factorization is nnz ~ 57*10^9

Where did you find that number? In Oleg's error message I find

number of non-zeros in L+U: 105303635176

which would give a necessary memory of about 840 GB which does not correspond to the 420 GB the solver says it needs.

In the case of my problem,

number of non-zeros in L+U: 24985066247

which would require about 192 GB of memory. That corresponds to the

total memory wanted here: 195700928 kbyte

which is about 195 GB. So in my case, those numbers match. Also I do not understand why 195 GB is too much when

ooc_max_core_size

is set to 200 GB.
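To see how close the request is to the configured limit, the sketch below compares the two figures from the log under different unit conventions. It rests on assumptions that the log does not confirm: that `ooc_max_core_size` is given in megabytes, and that `kbyte` means either 1000 or 1024 bytes. The helper name `margin_bytes` is illustrative only.

```python
NNZ_LU = 24_985_066_247       # "number of non-zeros in L+U" from the log
WANTED_KBYTE = 195_700_928    # "total memory wanted here" from the log
OOC_MAX_CORE_SIZE = 200_000   # from the log; assumed here to be megabytes


def margin_bytes(kbyte_size: int, mbyte_size: int) -> int:
    """Configured limit minus requested memory, in bytes, for given unit sizes."""
    return OOC_MAX_CORE_SIZE * mbyte_size - WANTED_KBYTE * kbyte_size


factor_value_bytes = NNZ_LU * 8  # 8 bytes per double-precision value

print(f"factor values alone: {factor_value_bytes / 1e9:.1f} GB")
for label, kb, mb in [
    ("decimal kbyte and MB", 1000, 1000**2),
    ("binary kbyte and MB", 1024, 1024**2),
    ("binary kbyte, decimal MB", 1024, 1000**2),
]:
    print(f"{label}: margin {margin_bytes(kb, mb) / 1e9:+.1f} GB")
```

Under these assumptions the request sits within a few percent of the limit, and the sign of the margin depends on which unit convention applies to each figure. This does not by itself explain the failure; it only shows that 195700928 kbyte and a limit of 200000 are not comfortably far apart.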

Thanks in advance for your help.
