Beginner
81 Views

## PARDISO performance is not consistent

I have an SPD matrix (see below for details). Running with 8 threads and 16 GB of memory, the factorization sometimes takes 314 s (e.g. Run 1) and sometimes 36 s (e.g. Run 2) on the same machine (see below for hardware details). Any idea what's going on?

The matrix is attached. The file layout is:

- line 1: # of equations
- line 2: index base
- line 3: # of nonzeros
- remaining lines: entries in triplet format (i,j,x)
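For readers who want to load the attached matrix, here is a minimal sketch of a reader for the layout described above. It assumes the triplets are separated by whitespace or commas, and it folds every entry into the upper triangle because PARDISO expects a symmetric matrix as an upper-triangular CSR structure with ascending column indices in each row. The function names `read_triplets` and `to_upper_csr` are hypothetical, and whether the attached file stores one triangle or both is not stated in the post.

```python
import io

def read_triplets(stream):
    """Read the three header lines, then nnz triplets (i, j, x)."""
    n = int(stream.readline())      # line 1: number of equations
    base = int(stream.readline())   # line 2: index base (0 or 1)
    nnz = int(stream.readline())    # line 3: number of nonzeros
    rows, cols, vals = [], [], []
    for _ in range(nnz):
        # Assumption: fields are separated by whitespace or commas.
        i, j, x = stream.readline().replace(",", " ").split()
        rows.append(int(i) - base)  # shift to 0-based indices
        cols.append(int(j) - base)
        vals.append(float(x))
    return n, rows, cols, vals

def to_upper_csr(n, rows, cols, vals):
    """Fold all entries into the upper triangle and build 0-based CSR arrays."""
    entries = {}
    for i, j, x in zip(rows, cols, vals):
        # A mirrored pair (i, j) / (j, i) maps to the same key and is kept once.
        entries[(min(i, j), max(i, j))] = x
    ia = [0] * (n + 1)
    for (i, _j) in entries:
        ia[i + 1] += 1
    for r in range(n):
        ia[r + 1] += ia[r]          # prefix sum gives the row pointers
    ordered = sorted(entries.items())  # row-major, ascending columns per row
    ja = [j for (_i, j), _x in ordered]
    a = [x for _key, x in ordered]
    return ia, ja, a

# Tiny made-up example in the same layout (not the attached matrix).
sample = io.StringIO("3\n1\n4\n1 1 4.0\n2 1 1.0\n2 2 3.0\n3 3 2.0\n")
n, rows, cols, vals = read_triplets(sample)
ia, ja, a = to_upper_csr(n, rows, cols, vals)
print(ia, ja, a)
```

The arrays are 0-based here; since the logs report that 1-based array indexing is turned ON, `ia` and `ja` would need 1 added to each element before being handed to a solver configured that way.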

Run 1:

=== PARDISO: solving a symmetric positive definite system ===
MKLPARDISO::numericalFact starts Wed Apr 23 15:57:22 2014

The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.510406 s
Time spent in reordering of the initial matrix (reorder)         : 10.586863 s
Time spent in symbolic factorization (symbfct)                   : 4.795883 s
Time spent in data preparations for factorization (parlist)      : 0.045271 s
Time spent in allocation of internal data structures (malloc)    : 0.072302 s
Time spent in additional calculations                            : 3.339262 s
Total time spent                                                 : 19.349987 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
number of equations:           1576740
number of non-zeros in A:      56993250
number of non-zeros in A (%): 0.002292

number of right-hand sides:    0

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    171522
size of largest supernode:               10602
number of non-zeros in L:                1336768222
number of non-zeros in U:                1
number of non-zeros in L+U:              1336768223
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===
Percentage of computed non-zeros for LL^T factorization
0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  79 %  80 %  81 %  83 %  84 %  85 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %
MKLPARDISO::solve starts Wed Apr 23 16:02:37 2014

=== PARDISO: solving a symmetric positive definite system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 314.881940 s
Time spent in allocation of internal data structures (malloc)    : 0.000198 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 314.882140 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
number of equations:           1576740
number of non-zeros in A:      56993250
number of non-zeros in A (%): 0.002292

number of right-hand sides:    0

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    171522
size of largest supernode:               10602
number of non-zeros in L:                1336768222
number of non-zeros in U:                1
number of non-zeros in L+U:              1336768223
gflop   for the numerical factorization: 4695.075325

gflop/s for the numerical factorization: 14.910589

=== PARDISO: solving a symmetric positive definite system ===

MKLPARDISO::solve ends Wed Apr 23 16:02:49 2014

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 4.046562 s
Time spent in additional calculations                            : 8.368093 s
Total time spent                                                 : 12.414655 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
number of equations:           1576740
number of non-zeros in A:      56993250
number of non-zeros in A (%): 0.002292

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    171522
size of largest supernode:               10602
number of non-zeros in L:                1336768222
number of non-zeros in U:                1
number of non-zeros in L+U:              1336768223
gflop   for the numerical factorization: 4695.075325

gflop/s for the numerical factorization: 14.910589
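As a sanity check on the statistics in this log, the sketch below recomputes the reported density of A and gives a rough lower bound for the memory taken by the factor. It assumes 8 bytes per stored value (double precision, as the log states) and ignores index arrays and workspace, so the real footprint is larger.

```python
# Values copied from the log above.
n_equations = 1576740
nnz_a = 56993250
nnz_l = 1336768222

# Density of A in percent: nonzeros divided by n^2.
density_percent = 100.0 * nnz_a / (n_equations ** 2)

# Lower bound on factor storage: one 8-byte double per nonzero of L.
factor_bytes = nnz_l * 8
factor_gib = factor_bytes / 2 ** 30

print(f"density of A: {density_percent:.6f} %")
print(f"values of L:  {factor_gib:.2f} GiB")
```

Under these assumptions the factor values alone come to roughly 10 GiB, which is consistent with the solver reporting that there is enough RAM for In-Core mode with 16 GB available.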

Run 2:

=== PARDISO: solving a symmetric positive definite system ===
The local (internal) PARDISO version is                          : 103911000
MKLPARDISO::numericalFact starts Mon Apr 21 14:49:46 2014

1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.426923 s
Time spent in reordering of the initial matrix (reorder)         : 10.688333 s
Time spent in symbolic factorization (symbfct)                   : 3.410566 s
Time spent in data preparations for factorization (parlist)      : 0.046608 s
Time spent in allocation of internal data structures (malloc)    : 0.072816 s
Time spent in additional calculations                            : 3.617951 s
Total time spent                                                 : 18.263197 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
number of equations:           1576740
number of non-zeros in A:      56993250
number of non-zeros in A (%): 0.002292

number of right-hand sides:    0

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    171522
size of largest supernode:               10602
number of non-zeros in L:                1336768222
number of non-zeros in U:                1
number of non-zeros in L+U:              1336768223
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===
Percentage of computed non-zeros for LL^T factorization
0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  80 %  81 %  82 %  83 %  85 %  86 %  87 %  88 %  89 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %
MKLPARDISO::solve starts Mon Apr 21 14:50:23 2014

=== PARDISO: solving a symmetric positive definite system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000001 s
Time spent in factorization step (numfct)                        : 36.840021 s
Time spent in allocation of internal data structures (malloc)    : 0.000220 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 36.840244 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
number of equations:           1576740
number of non-zeros in A:      56993250
number of non-zeros in A (%): 0.002292

number of right-hand sides:    0

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    171522
size of largest supernode:               10602
number of non-zeros in L:                1336768222
number of non-zeros in U:                1
number of non-zeros in L+U:              1336768223
gflop   for the numerical factorization: 4695.075325

gflop/s for the numerical factorization: 127.444969

=== PARDISO: solving a symmetric positive definite system ===
MKLPARDISO::solve ends Mon Apr 21 14:50:25 2014

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.829546 s
Time spent in additional calculations                            : 1.730757 s
Total time spent                                                 : 2.560303 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
number of equations:           1576740
number of non-zeros in A:      56993250
number of non-zeros in A (%): 0.002292

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    171522
size of largest supernode:               10602
number of non-zeros in L:                1336768222
number of non-zeros in U:                1
number of non-zeros in L+U:              1336768223
gflop   for the numerical factorization: 4695.075325

gflop/s for the numerical factorization: 127.444969
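To compare the two runs without reading the logs by eye, here is a small sketch that pulls the factorization time and flop count out of the log text and recomputes the rate. The two excerpts are copied from the logs above; the helper name `factorization_rate` is hypothetical.

```python
import re

# Patterns for the two log lines of interest.
NUMFCT = re.compile(r"factorization step \(numfct\)\s*:\s*([0-9.]+) s")
GFLOP = re.compile(r"gflop\s+for the numerical factorization:\s*([0-9.]+)")

def factorization_rate(log_text):
    """Return (seconds, gflop_per_second) for one PARDISO log."""
    seconds = float(NUMFCT.search(log_text).group(1))
    gflop = float(GFLOP.search(log_text).group(1))
    return seconds, gflop / seconds

run1 = """
Time spent in factorization step (numfct)                        : 314.881940 s
gflop   for the numerical factorization: 4695.075325
"""
run2 = """
Time spent in factorization step (numfct)                        : 36.840021 s
gflop   for the numerical factorization: 4695.075325
"""

t1, rate1 = factorization_rate(run1)
t2, rate2 = factorization_rate(run2)
slowdown = t1 / t2
print(f"Run 1: {t1:.1f} s, {rate1:.2f} gflop/s")
print(f"Run 2: {t2:.1f} s, {rate2:.2f} gflop/s")
print(f"Run 1 is {slowdown:.1f}x slower")
```

Both runs report the same flop count, supernode count, and number of nonzeros in L, so the roughly 8.5x gap comes from how fast the same work was executed, not from a different ordering or fill-in.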

Hardware:

Operating System:   Linux 2.6.32-431.3.1.el6.x86_64 (CentOS 6.3)

Average Load:       1.01 1.39 2.14 (average over last 1min, 5min & 15min)
CPU Type:           Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (x86_64)
CPU Count:          16 (8 cores/socket, Hyper-threading enabled)
CPU Clock:          1200 MHz
CPU Cache:          20480 KB (L2)
Physical Memory:    64367 MB
Swap Space:         48000 MB
Graphics device:    NVIDIA Device 11fa (rev a1)
ls: cannot access /proc/ide: No such file or directory
SCSI CD 0,0,0,0:    /dev/scd0 ()
SCSI Disk 0,0,0,0:  /dev/sda  ()
()
SCSI Disk 0,0,0,0:  /dev/sda  ()
()
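To put the two measured rates in context, the sketch below estimates a theoretical double-precision peak for the listed CPU. It assumes 8 double-precision flops per cycle per core (AVX without FMA, which is an assumption about this CPU generation and not something stated in the post), 8 physical cores, and no turbo boost. It also evaluates the peak at the 1200 MHz clock shown in the hardware listing, which may only be an idle-time snapshot.

```python
# Assumptions, not taken from the post: 8 DP flops/cycle/core, turbo ignored.
FLOPS_PER_CYCLE = 8
CORES = 8

def peak_gflops(clock_ghz):
    """Theoretical peak in gflop/s for the given clock frequency."""
    return CORES * clock_ghz * FLOPS_PER_CYCLE

peak_nominal = peak_gflops(2.6)   # nominal 2.60 GHz from the CPU type line
peak_reported = peak_gflops(1.2)  # 1200 MHz from the "CPU Clock" line

# Measured rates copied from the two logs.
run1_rate = 14.910589
run2_rate = 127.444969

print(f"peak at 2.6 GHz: {peak_nominal:.1f} gflop/s")
print(f"peak at 1.2 GHz: {peak_reported:.1f} gflop/s")
print(f"Run 1 fraction of nominal peak: {run1_rate / peak_nominal:.2f}")
print(f"Run 2 fraction of nominal peak: {run2_rate / peak_nominal:.2f}")
```

Under these assumptions Run 2 reaches about three quarters of the nominal peak, while Run 1 stays below a tenth of it and is well under even the peak implied by the 1200 MHz reading.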

4 Replies
Moderator

Thanks for reporting this. We will check the issue on our side. Is that LP64 mode?

Beginner

Yes.

Beginner

I tried the production version of MKL 11.2. Unfortunately, it still performs poorly (see mkl11.2time.txt). Interestingly, the MKL 11.2 beta performs well (see mkl11.2betatime.txt).

Beginner

Any progress on this issue?