PARDISO performance is not consistent

I have an SPD matrix (details below). Running with 8 threads and 16 GB of memory, the numerical factorization sometimes takes about 314 s (e.g. Run 1) and sometimes about 36 s (e.g. Run 2) on the same machine (hardware details below). Any idea what is going on?

 

The matrix is attached in the following format:

line 1: number of equations

line 2: index base

line 3: number of nonzeros

remaining lines: the entries in triplet format (i,j,x)
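For anyone who wants to load the attachment, here is a minimal reader sketch for the layout described above. It assumes one triplet per line, separated by whitespace or commas, which the post does not actually specify. The function name `read_triplets` is only illustrative.

```python
import io


def read_triplets(stream):
    """Read the header (n, index base, nnz) followed by (i, j, x) triplets.

    Assumption: one triplet per line, separated by whitespace or commas.
    Indices are shifted to 0-based using the index base from line 2.
    """
    lines = (ln.strip() for ln in stream if ln.strip())
    n = int(next(lines))      # line 1: number of equations
    base = int(next(lines))   # line 2: index base (0 or 1)
    nnz = int(next(lines))    # line 3: number of nonzeros
    rows, cols, vals = [], [], []
    for ln in lines:
        i, j, x = ln.replace(",", " ").split()[:3]
        rows.append(int(i) - base)
        cols.append(int(j) - base)
        vals.append(float(x))
    if len(vals) != nnz:
        raise ValueError(f"expected {nnz} triplets, found {len(vals)}")
    return n, rows, cols, vals


# Tiny made-up example (not the attached matrix).
sample = "3\n1\n4\n1 1 4.0\n2 2 5.0\n3 3 6.0\n1 3 -1.0\n"
n, rows, cols, vals = read_triplets(io.StringIO(sample))
```

The three lists can then be converted to whatever CSR structure the solver call expects.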

Run 1:

=== PARDISO: solving a symmetric positive definite system ===
MKLPARDISO::numericalFact starts Wed Apr 23 15:57:22 2014

The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.510406 s
Time spent in reordering of the initial matrix (reorder)         : 10.586863 s
Time spent in symbolic factorization (symbfct)                   : 4.795883 s
Time spent in data preparations for factorization (parlist)      : 0.045271 s
Time spent in allocation of internal data structures (malloc)    : 0.072302 s
Time spent in additional calculations                            : 3.339262 s
Total time spent                                                 : 19.349987 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===
Percentage of computed non-zeros for LL^T factorization
 0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  79 %  80 %  81 %  83 %  84 %  85 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 % 
MKLPARDISO::solve starts Wed Apr 23 16:02:37 2014


=== PARDISO: solving a symmetric positive definite system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 314.881940 s
Time spent in allocation of internal data structures (malloc)    : 0.000198 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 314.882140 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 14.910589


=== PARDISO: solving a symmetric positive definite system ===


MKLPARDISO::solve ends Wed Apr 23 16:02:49 2014

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 4.046562 s
Time spent in additional calculations                            : 8.368093 s
Total time spent                                                 : 12.414655 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 14.910589
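As a rough sanity check on the in-core message, the size of the factor can be estimated from the nonzero counts reported in the statistics. This sketch assumes 8 bytes per stored value (the log says double precision is on) and ignores index arrays and workspace, so it is a lower bound rather than PARDISO's real footprint.

```python
# Counts copied from the PARDISO statistics in this post.
NNZ_A = 56993250
NNZ_L = 1336768222
BYTES_PER_DOUBLE = 8  # assumption: values only, no index or workspace arrays

factor_bytes = NNZ_L * BYTES_PER_DOUBLE
factor_gib = factor_bytes / 2**30   # values of L in GiB
fill_ratio = NNZ_L / NNZ_A          # growth from A to L

print(f"L values: {factor_gib:.2f} GiB, fill ratio: {fill_ratio:.1f}x")
```

Under that assumption the values of L alone take a little under 10 GiB, which is in line with the solver staying in-core within the 16 GB mentioned in the question.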

 

Run 2:

=== PARDISO: solving a symmetric positive definite system ===
The local (internal) PARDISO version is                          : 103911000
MKLPARDISO::numericalFact starts Mon Apr 21 14:49:46 2014

1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.426923 s
Time spent in reordering of the initial matrix (reorder)         : 10.688333 s
Time spent in symbolic factorization (symbfct)                   : 3.410566 s
Time spent in data preparations for factorization (parlist)      : 0.046608 s
Time spent in allocation of internal data structures (malloc)    : 0.072816 s
Time spent in additional calculations                            : 3.617951 s
Total time spent                                                 : 18.263197 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===
Percentage of computed non-zeros for LL^T factorization
 0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  80 %  81 %  82 %  83 %  85 %  86 %  87 %  88 %  89 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %
MKLPARDISO::solve starts Mon Apr 21 14:50:23 2014


=== PARDISO: solving a symmetric positive definite system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000001 s


Time spent in factorization step (numfct)                        : 36.840021 s
Time spent in allocation of internal data structures (malloc)    : 0.000220 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 36.840244 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 127.444969


=== PARDISO: solving a symmetric positive definite system ===
MKLPARDISO::solve ends Mon Apr 21 14:50:25 2014

 

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.829546 s
Time spent in additional calculations                            : 1.730757 s
Total time spent                                                 : 2.560303 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>


             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 127.444969
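A quick cross-check of the numbers in the two logs: both runs report the same operation count (4695.075325 gflop) and nearly identical reordering times, so the whole difference sits in the numerical factorization and, to a lesser extent, the solve step. The sketch below only reuses values copied from the logs.

```python
# Values copied from the factorization and solution summaries of both runs.
GFLOP = 4695.075325
numfct_s = {"Run 1": 314.881940, "Run 2": 36.840021}
solve_s = {"Run 1": 4.046562, "Run 2": 0.829546}

# gflop/s = gflop / seconds spent in numfct
rates = {name: GFLOP / t for name, t in numfct_s.items()}

numfct_slowdown = numfct_s["Run 1"] / numfct_s["Run 2"]
solve_slowdown = solve_s["Run 1"] / solve_s["Run 2"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} gflop/s")
print(f"numfct slowdown: {numfct_slowdown:.2f}x, solve slowdown: {solve_slowdown:.2f}x")
```

The computed rates match the gflop/s lines printed by PARDISO, so the reported throughput is simply gflop divided by the numfct time.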

 

Hardware:

 Operating System:   Linux 2.6.32-431.3.1.el6.x86_64 (CentOS 6.3)

 Average Load:       1.01 1.39 2.14 (average over last 1min, 5min & 15min)
 CPU Type:           Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (x86_64)
 CPU Addressability: 64bit
 CPU Count:          16 (8 cores/socket, Hyper-threading enabled)
 CPU Clock:          1200 MHz
 CPU Cache:          20480 KB (L2)
 Physical Memory:    64367 MB
 Swap Space:         48000 MB
 Graphics device:    NVIDIA Device 11fa (rev a1)
 SCSI CD 0,0,0,0:    /dev/scd0
 SCSI Disk 0,0,0,0:  /dev/sda

 

Gennady_F_Intel
Moderator

Thanks for reporting the issue. We will check it on our side. Are you using the LP64 interface?


Yes. 
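One reason the interface can matter: with MKL's LP64 interface the integer arguments are 32-bit, while ILP64 uses 64-bit integers. The sketch below just checks that the counts reported in the logs fit in a signed 32-bit integer. It says nothing about how PARDISO counts internally, which is not visible from the logs.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

# Counts copied from the PARDISO statistics in this thread.
counts = {
    "equations": 1576740,
    "nnz(A)": 56993250,
    "nnz(L)": 1336768222,
}

fits_lp64 = {name: value <= INT32_MAX for name, value in counts.items()}
headroom = INT32_MAX / counts["nnz(L)"]  # how far nnz(L) is from the limit

for name, ok in fits_lp64.items():
    print(f"{name}: {'fits' if ok else 'overflows'} in 32-bit")
print(f"nnz(L) headroom: {headroom:.2f}x")
```

All three counts fit, although nnz(L) is already within a factor of two of the 32-bit limit.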


I tried the production release of MKL 11.2. Unfortunately, it still shows the poor performance (see mkl11.2time.txt). Interestingly, the MKL 11.2 beta performs well (see mkl11.2betatime.txt).
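To compare several timing logs quickly, the factorization time and rate can be pulled out with two regular expressions. This is a sketch that assumes the attached files use the same statistics format as the logs in the first post. The helper name `summarize` is made up for illustration.

```python
import re

# Patterns written against the PARDISO statistics lines shown in this thread.
NUMFCT_RE = re.compile(
    r"Time spent in factorization step \(numfct\)\s*:\s*([0-9.]+) s"
)
RATE_RE = re.compile(r"gflop/s for the numerical factorization:\s*([0-9.]+)")


def summarize(log_text):
    """Return the first numfct time and gflop/s found in a log, or None."""
    t = NUMFCT_RE.search(log_text)
    r = RATE_RE.search(log_text)
    return {
        "numfct_s": float(t.group(1)) if t else None,
        "gflops": float(r.group(1)) if r else None,
    }


# Two lines copied from the Run 2 log as a usage example.
example = (
    "Time spent in factorization step (numfct)                        : 36.840021 s\n"
    "             gflop/s for the numerical factorization: 127.444969\n"
)
result = summarize(example)
```

Running `summarize` over each file's contents gives one row per MKL version to put side by side.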


Any progress on this issue?
