MKL PARDISO statistics

xian-zhong_guous_cd- · 01-24-2011

From the following PARDISO statistics, it seems that solving for the rhs (solve to solve) is dominant. This is not consistent with my expectation: I expected factorization (factorize to factorize) to be dominant.

PARDISO statistics start here:

ooc_max_core_size got by Env = 20000

The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( reorder to reorder )

================

Times:

======

Time fulladj: 0.684705 s

Time reorder: 89.446126 s

Time symbfct: 9.093952 s

Time parlist: 23.178850 s

Time malloc : 1.119903 s

Time total : 155.063945 s total - sum: 31.540409 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 8

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 5389726

#non-zeros in A: 42250735

non-zeros in A (%): 0.000145

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 96

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 2241484

size of largest supernode: 9951

number of nonzeros in L 3046299967

number of nonzeros in U 1

number of nonzeros in L+U 3046299968

Percentage of computed non-zeros for LL^T factorization

0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 100 %

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( factorize to factorize )

================

Times:

======

Time A to LU: 0.000000 s

Factorization: Time for writing to files : 0.000000

Factorization: Time for reading from files : 0.000000

Time numfct : 663.914768 s

Time malloc : 0.000054 s

Time total : 663.954133 s total - sum: 0.039310 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 8

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 5389726

#non-zeros in A: 42250735

non-zeros in A (%): 0.000145

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 96

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 2241484

size of largest supernode: 9951

number of nonzeros in L 3046299967

number of nonzeros in U 1

number of nonzeros in L+U 3046299968

gflop for the numerical factorization: 11175.376693

gflop/s for the numerical factorization: 16.832547

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( solve to solve )

================

Times:

======

Solution: Time for reading from files : 0.000000

Time solve : 716.844178 s

Time total : 1867.447714 s total - sum: 1150.603536 s

==============================================================

----------- Out of core time (in percent (%)) --------------

Factorization step (100 (%)):

write to files : 0

read from files: 0

factorization - write&read : 100

Solution step (100 (%)):

read from files: 0

solve - write&read: 100

Total time (100 (%)):

read from files: 0

total - write&read: 100

----------- Out of core Mb --------------

Factorization step:

write to files : 0.000 Mb

read from files: 0.000 Mb

Solution step:

read from files: 0.000 Mb

Total size of data transferred :

write&read : 0.000 Mb

==============================================================

Statistics:

===========

< Parallel Direct Factorization with #processors: > 8

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 5389726

#non-zeros in A: 42250735

non-zeros in A (%): 0.000145

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 96

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 2241484

size of largest supernode: 9951

number of nonzeros in L 3046299967

number of nonzeros in U 1

number of nonzeros in L+U 3046299968

gflop for the numerical factorization: 11175.376693

gflop/s for the numerical factorization: 16.832547
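The timings in the log above can be cross-checked with a little arithmetic. The sketch below uses only numbers copied from the posted output; the helper names `gflops_rate` and `unaccounted` are made up for illustration. It recomputes the reported gflop/s from the gflop count and "Time numfct", and shows how much of the solve phase's "Time total" is not covered by "Time solve".

```python
# Values copied verbatim from the posted PARDISO log.
GFLOP_NUMFCT = 11175.376693        # "gflop for the numerical factorization"
TIME_NUMFCT_S = 663.914768         # "Time numfct" (factorize to factorize)
TIME_SOLVE_S = 716.844178          # "Time solve" (solve to solve)
TIME_SOLVE_TOTAL_S = 1867.447714   # "Time total" (solve to solve)


def gflops_rate(gflop, seconds):
    """Achieved rate in gflop/s for a phase that did `gflop` work."""
    return gflop / seconds


def unaccounted(total, *parts):
    """Time in `total` that is not explained by the listed parts."""
    return total - sum(parts)


rate = gflops_rate(GFLOP_NUMFCT, TIME_NUMFCT_S)
gap = unaccounted(TIME_SOLVE_TOTAL_S, TIME_SOLVE_S)

print(f"factorization rate: {rate:.6f} gflop/s")
print(f"solve phase, total - sum: {gap:.6f} s")
```

The recomputed rate agrees with the logged 16.832547 gflop/s, and the gap reproduces the logged "total - sum" of 1150.603536 s. So the reported solve time (716.8 s) is close to the factorization time (663.9 s), and a further 1150.6 s of the solve phase is not attributed to any item in the timing summary.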

Sergey_Solovev__Inte · 01-25-2011

Hi,

In In-Core mode, the solving step is faster than the factorization step.

In OOC mode, the following situation can occur: the factorization step does not read the L-factors, it only saves them. The solving step reads all of the L-factors twice: first during the forward step, then during the backward step. Note also that if iterative refinement (iparm(8)) is switched on, the solving step can be significantly slower, because PARDISO has to read the L-factors several times.

Regards, Sergey
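To put rough numbers on this explanation, here is a minimal sketch based on the figures in the posted log. It rests on several assumptions that are not stated in the thread: that each factor entry takes 8 bytes (real double precision), that index arrays and other overhead can be ignored, that ooc_max_core_size is given in megabytes, and that every iterative refinement step costs one extra forward/backward pass over L. The function names are hypothetical. Note that the posted log itself reports 0.000 Mb transferred, so this is a what-if model rather than a reconstruction of that run.

```python
NNZ_L = 3_046_299_967      # "number of nonzeros in L" from the log
OOC_MAX_CORE_MB = 20_000   # "ooc_max_core_size got by Env" (assumed to be in MB)
BYTES_PER_ENTRY = 8        # assumption: real double-precision factor entries


def factor_size_mb(nnz, bytes_per_entry=BYTES_PER_ENTRY):
    """Approximate size of the numerical values of L in megabytes."""
    return nnz * bytes_per_entry / 2**20


def factor_passes(refinement_steps):
    """Passes over L in one solve: forward + backward, repeated once more
    per refinement step (assumed model, not taken from the MKL docs)."""
    return 2 * (1 + refinement_steps)


def solve_read_mb(nnz, refinement_steps):
    """Estimated data read from disk during the solve step in OOC mode."""
    return factor_passes(refinement_steps) * factor_size_mb(nnz)


size_mb = factor_size_mb(NNZ_L)
print(f"L values: {size_mb:.0f} MB, in-core limit: {OOC_MAX_CORE_MB} MB")
for steps in (0, 1, 2):
    print(f"refinement steps={steps}: about {solve_read_mb(NNZ_L, steps):.0f} MB read")
```

Under these assumptions the values of L alone exceed the configured in-core limit, which is consistent with PARDISO switching to OOC mode, and each refinement step adds two more full passes over the factor.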

eh4 · 11-16-2011

Hi,
May I know how you got PARDISO to run with 8 processors? Of course, you need to have 8 physical processors. My machine has 4 processors and I have the line

CALL OMP_SET_NUM_THREADS(4)

in my subroutine. However, when I run it, the report says that the number of processors used in the factorization is 1. When I print omp_get_max_threads(), it shows a value of 4.

Am I missing something?

Thank you.

Sincerely,
EH

Sergey_Solovev__Inte · 11-16-2011

Which PARDISO mode do you use: In-Core or OOC? And which version of MKL?
Our quick advice is to use call mkl_set_num_threads(4), which sets the number of threads MKL uses in the same way as the MKL_NUM_THREADS environment variable.
Thank you!
Sergey