Intel® oneAPI Math Kernel Library

PARDISO performance is not consistent

xian-zhong_guous_cd-

I have an SPD matrix (details below). Running with 8 threads and 16 GB of memory, factorization sometimes takes 314 s (e.g. Run 1) and sometimes 36 s (e.g. Run 2) on the same machine (hardware details below). Any idea what's going on?

 

Matrix has been attached:

line 1: # of equations

line 2: index base

line 3: # of nonzeros

followed by triplet format (i,j,x) 
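For anyone who wants to load the attachment, here is a minimal sketch of a reader for the layout described above. The function name `read_triplet_matrix` is made up for illustration, and the separator between i, j and x is an assumption (the post does not say), so the sketch accepts whitespace, commas, or parenthesized triplets. It returns zero-based indices regardless of the index base in the file.

```python
import io
import re


def read_triplet_matrix(stream):
    """Parse: line 1 = # of equations, line 2 = index base,
    line 3 = # of nonzeros, then one (i, j, x) triplet per line."""
    # Skip blank lines so the header fields are always the first three entries.
    lines = (ln.strip() for ln in stream)
    lines = (ln for ln in lines if ln)

    n = int(next(lines))
    base = int(next(lines))
    nnz = int(next(lines))

    rows, cols, vals = [], [], []
    for ln in lines:
        # Assumed separators: whitespace and/or commas, optional parentheses.
        i, j, x = re.split(r"[,\s]+", ln.strip("() \t"))
        rows.append(int(i) - base)  # shift to zero-based indexing
        cols.append(int(j) - base)
        vals.append(float(x))

    if len(vals) != nnz:
        raise ValueError(f"expected {nnz} nonzeros, found {len(vals)}")
    return n, rows, cols, vals


# Tiny made-up sample in the same layout (not the attached matrix).
sample = "3\n1\n4\n1 1 4.0\n2 1 -1.0\n2 2 4.0\n3 3 2.0\n"
n, rows, cols, vals = read_triplet_matrix(io.StringIO(sample))
print(n, rows, cols, vals)
```

The resulting coordinate lists would still need to be converted to the CSR arrays that PARDISO expects; that step is not shown here.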

Run 1:

=== PARDISO: solving a symmetric positive definite system ===
MKLPARDISO::numericalFact starts Wed Apr 23 15:57:22 2014

The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.510406 s
Time spent in reordering of the initial matrix (reorder)         : 10.586863 s
Time spent in symbolic factorization (symbfct)                   : 4.795883 s
Time spent in data preparations for factorization (parlist)      : 0.045271 s
Time spent in allocation of internal data structures (malloc)    : 0.072302 s
Time spent in additional calculations                            : 3.339262 s
Total time spent                                                 : 19.349987 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===
Percentage of computed non-zeros for LL^T factorization
 0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  79 %  80 %  81 %  83 %  84 %  85 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 % 
MKLPARDISO::solve starts Wed Apr 23 16:02:37 2014


=== PARDISO: solving a symmetric positive definite system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 314.881940 s
Time spent in allocation of internal data structures (malloc)    : 0.000198 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 314.882140 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 14.910589


=== PARDISO: solving a symmetric positive definite system ===


MKLPARDISO::solve ends Wed Apr 23 16:02:49 2014

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 4.046562 s
Time spent in additional calculations                            : 8.368093 s
Total time spent                                                 : 12.414655 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 14.910589

 

Run 2:

=== PARDISO: solving a symmetric positive definite system ===
The local (internal) PARDISO version is                          : 103911000
MKLPARDISO::numericalFact starts Mon Apr 21 14:49:46 2014

1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.426923 s
Time spent in reordering of the initial matrix (reorder)         : 10.688333 s
Time spent in symbolic factorization (symbfct)                   : 3.410566 s
Time spent in data preparations for factorization (parlist)      : 0.046608 s
Time spent in allocation of internal data structures (malloc)    : 0.072816 s
Time spent in additional calculations                            : 3.617951 s
Total time spent                                                 : 18.263197 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===
Percentage of computed non-zeros for LL^T factorization
 0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  80 %  81 %  82 %  83 %  85 %  86 %  87 %  88 %  89 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %
MKLPARDISO::solve starts Mon Apr 21 14:50:23 2014


=== PARDISO: solving a symmetric positive definite system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000001 s
Time spent in factorization step (numfct)                        : 36.840021 s
Time spent in allocation of internal data structures (malloc)    : 0.000220 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 36.840244 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    0

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 127.444969


=== PARDISO: solving a symmetric positive definite system ===
MKLPARDISO::solve ends Mon Apr 21 14:50:25 2014

 

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.829546 s
Time spent in additional calculations                            : 1.730757 s
Total time spent                                                 : 2.560303 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           1576740
             number of non-zeros in A:      56993250
             number of non-zeros in A (%): 0.002292

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 192
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    171522
             size of largest supernode:               10602
             number of non-zeros in L:                1336768222
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1336768223
             gflop   for the numerical factorization: 4695.075325

             gflop/s for the numerical factorization: 127.444969
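As a quick sanity check on the two logs, the reported gflop/s figures are just the flop count divided by the numfct time, so both runs did the same amount of work and differ only in throughput. The short sketch below reproduces that arithmetic from the numbers printed above; the variable and function names are made up for illustration.

```python
def gflops_rate(gflop, seconds):
    """Throughput in gflop/s given total work (gflop) and elapsed time (s)."""
    return gflop / seconds


# Values copied from the PARDISO statistics in the two runs.
GFLOP = 4695.075325          # identical in both runs
RUN1_NUMFCT_S = 314.881940   # Run 1 factorization time
RUN2_NUMFCT_S = 36.840021    # Run 2 factorization time

rate1 = gflops_rate(GFLOP, RUN1_NUMFCT_S)
rate2 = gflops_rate(GFLOP, RUN2_NUMFCT_S)
slowdown = RUN1_NUMFCT_S / RUN2_NUMFCT_S

print(f"Run 1: {rate1:.2f} gflop/s")
print(f"Run 2: {rate2:.2f} gflop/s")
print(f"Run 1 is {slowdown:.2f}x slower than Run 2")
```

Since the reordering statistics (supernodes, non-zeros in L) are also identical, the difference is in how fast the same factorization executed, not in what was computed.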

 

Hardware:

 Operating System:   Linux 2.6.32-431.3.1.el6.x86_64 (CentOS 6.3)

 Average Load:       1.01 1.39 2.14 (average over last 1min, 5min & 15min)
 CPU Type:           Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (x86_64)
 CPU Addressability: 64bit
 CPU Count:          16 (8 cores/socket, Hyper-threading enabled)
 CPU Clock:          1200 MHz
 CPU Cache:          20480 KB (L2)
 Physical Memory:    64367 MB
 Swap Space:         48000 MB
 Graphics device:    NVIDIA Device 11fa (rev a1)
ls: cannot access /proc/ide: No such file or directory
 SCSI CD 0,0,0,0:    /dev/scd0 ()
 SCSI Disk 0,0,0,0:  /dev/sda  ()
()
 SCSI Disk 0,0,0,0:  /dev/sda  ()
()

 

4 Replies
Gennady_F_Intel
Moderator

Thanks for reporting the issue. We will check it on our side. Is that in LP64 mode?

xian-zhong_guous_cd-

Yes. 

xian-zhong_guous_cd-

I tried the production version of MKL 11.2. Unfortunately, it still performs poorly (see mkl11.2time.txt). Interestingly, the MKL 11.2 beta produces good results (see mkl11.2betatime.txt).

xian-zhong_guous_cd-

Any progress on this issue?
