Intel® oneAPI Math Kernel Library

Pardiso vs. MATLAB backslash comparison and fast forward backward substitution advice

utab
Dear all,

I have a 143748 x 143748 sparse matrix that I would like to solve with Pardiso. I compiled the Pardiso usage example and got a result. I used the default settings for iparm, except that the array indices are 0-based since I am calling Pardiso from C++. I then did a timing comparison against MATLAB backslash, which uses UMFPACK for unsymmetric matrices and sparse Cholesky for symmetric ones. On this system MATLAB backslash beats Pardiso by a factor of 2-3. I write 2-3 because MATLAB timings with tic/toc are not that reliable, but they should still give me an idea. Here is the log that I got from a Pardiso solve:

Reordering completed ...

Number of nonzeros in factors = 30513182

Number of factorization MFLOPS = 7193
Elapsed time in Analysis 4.24

Factorization completed ...
Elapsed time in Numerical Factorization 2.74
Elapsed time in solution 0.46

I used the boost::timer class for the timings. Are there any timing functions in MKL, by the way?

And here is the same operation using the Factorize package from the MATLAB Central File Exchange, created by Tim Davis, the father of UMFPACK:

factorization Elapsed time is 4.386812 seconds.
solution Elapsed time is 1.125806 seconds.

Direct backslash in MATLAB gives

Elapsed time is 3.407212 seconds.

Is there a way to improve these timings on the Pardiso side? My matrix is symmetric indefinite. One more remark: it is important for me to do the forward-backward substitution quickly, because I am trying to use the factorized form of a stiffness matrix, available from a previous solution, as a preconditioner for CG iterations. This gives fast convergence in MATLAB, but the bottleneck seems to be the forward-backward solves needed to apply the preconditioner.

One last point: I used the GNU C++ compiler with the -O3 flag. Do you think the Intel C++ compiler can optimize the process better?

Could you comment on the above points?

Best regards
Umut
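
The preconditioning idea described above can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the poster's code: `pcg`, `Op`, `Vec` and `dot` are hypothetical names, and the preconditioner is passed in as a callback `apply_Minv`. With PARDISO, that callback would presumably wrap a solve-only call (phase = 33) that reuses the factors stored from the earlier solution, so each CG iteration costs one forward-backward substitution. Plain CG assumes that both the matrix and the preconditioner are symmetric positive definite; the matrix in this thread is symmetric indefinite, so treat the sketch as showing only the structure of the iteration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op = std::function<Vec(const Vec&)>;  // y = op(x)

// Plain dot product of two equally sized vectors.
double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned CG for A x = b (assumes A and M are symmetric positive definite).
// apply_A computes A*v. apply_Minv computes M^{-1}*v, i.e. one forward-backward
// substitution when M is a stored factorization. x holds the initial guess on
// entry and the solution on exit. Returns the number of iterations performed.
int pcg(const Op& apply_A, const Op& apply_Minv, const Vec& b, Vec& x,
        double tol = 1e-10, int max_iter = 1000) {
    assert(b.size() == x.size());
    const std::size_t n = b.size();
    const Vec Ax = apply_A(x);
    Vec r(n);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];
    const double stop = tol * std::sqrt(dot(b, b));  // relative residual target
    if (std::sqrt(dot(r, r)) <= stop) return 0;

    Vec z = apply_Minv(r);  // preconditioner application
    Vec p = z;
    double rz = dot(r, z);
    for (int k = 1; k <= max_iter; ++k) {
        const Vec Ap = apply_A(p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        if (std::sqrt(dot(r, r)) <= stop) return k;
        z = apply_Minv(r);  // the per-iteration cost discussed in the post
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    return max_iter;
}
```

Because the preconditioner is applied once per iteration, the time of a single forward-backward solve multiplies directly into the total CG time, which is why the solve-phase timings in the logs matter more here than the one-off analysis and factorization costs.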



Konstantin_A_Intel
Hi Umut,
The strange thing is that the Analysis phase took longer than the Factorization and Solve phases in PARDISO. Usually factorization is the dominant part.
Could you please set msglvl=1 and post here what PARDISO reports?
To measure time in MKL you may use the dsecnd() function.
But please note that the very first call to dsecnd() itself takes about 0.5 sec.
Regards,
Konstantin
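
The timing advice above can be turned into a small helper. This is a sketch under assumptions, not an MKL API: `elapsed_seconds` and `chrono_seconds` are hypothetical names. The helper accepts any clock callable that returns seconds as a double, makes one throwaway call to it first so that any first-call overhead of the timer stays outside the measurement, and then times the work. The stand-in clock below uses std::chrono so the example builds without MKL; in an MKL build, `dsecnd` could be passed as the clock instead.

```cpp
#include <chrono>

// Stand-in clock (assumption: used here only so the sketch builds without MKL).
inline double chrono_seconds() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

// Returns the seconds spent in work(), measured with the supplied clock.
// The first clock call is discarded so that a slow first call does not
// inflate the measured interval.
template <typename Clock, typename F>
double elapsed_seconds(Clock&& now, F&& work) {
    (void)now();  // throwaway call, absorbs first-call overhead of the timer
    const double t0 = now();
    work();
    const double t1 = now();
    return t1 - t0;
}
```

A call such as `elapsed_seconds(chrono_seconds, [&] { /* run one PARDISO phase */ })` would time a single phase, which makes it easier to compare against the per-phase times that PARDISO prints with msglvl=1.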
Gennady_F_Intel
Umut,
actually, the problem with the dsecnd() function has been resolved in the latest release, 10.3 Update 3, which is already available. But the behaviour you are experiencing is unexpected for Pardiso.
--Gennady
utab
Hi,

Konstantin, here is the output (a little long) with msglvl = 1 and mtype = 1, using phase = 13 for all runs. First, with iparm[1] = 0:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real struct. sym. system ================
The local (internal) PARDISO version is : 103000115
0-based array is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON
Scaling is turned ON


Summary PARDISO: ( reorder to solve )
================

Times:
======
Time spent in calculations of symmetric matrix portrait(fulladj): 0.029071 s
Time spent in reordering of the initial matrix(reorder) : 0.814425 s
Time spent in symbolic factorization(symbfct) : 0.375613 s
Time spent in data preparations for factorization(parlist) : 0.017267 s
Time spent in copying matrix to internal data structure(A to LU): 0.000002 s
Time spent in factorization step(numfct) : 3.503580 s
Time spent in direct solver at solve step (solve) : 0.227055 s
Time spent in allocation of internal data structures(malloc) : 0.549828 s
Time spent in additional calculations : 0.448929 s
Total time spent : 5.965770 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 2
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 143748
#non-zeros in A: 2934084
non-zeros in A (%): 0.014199

#right-hand sides: 1

< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 55576
size of largest supernode: 1161
number of nonzeros in L 23266462
number of nonzeros in U 20072706
number of nonzeros in L+U 43339168
gflop for the numerical factorization: 18.084707

gflop/s for the numerical factorization: 5.161779


Reordering completed ...

Number of nonzeros in factors = 43339168

Number of factorization MFLOPS = 18084

With iparm[1] = 2:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real struct. sym. system ================
The local (internal) PARDISO version is : 103000115
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON
Scaling is turned ON


Summary PARDISO: ( reorder to solve )
================

Times:
======
Time spent in calculations of symmetric matrix portrait(fulladj): 0.022432 s
Time spent in reordering of the initial matrix(reorder) : 2.481204 s
Time spent in symbolic factorization(symbfct) : 0.231827 s
Time spent in data preparations for factorization(parlist) : 0.016098 s
Time spent in copying matrix to internal data structure(A to LU): 0.000001 s
Time spent in factorization step(numfct) : 1.920346 s
Time spent in direct solver at solve step (solve) : 0.161954 s
Time spent in allocation of internal data structures(malloc) : 0.422124 s
Time spent in additional calculations : 0.423500 s
Total time spent : 5.679486 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 2
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 143748
#non-zeros in A: 2934084
non-zeros in A (%): 0.014199

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55598
size of largest supernode: 873
number of nonzeros in L 16718537
number of nonzeros in U 13794645
number of nonzeros in L+U 30513182
gflop for the numerical factorization: 7.193245

gflop/s for the numerical factorization: 3.745807


Reordering completed ...

Number of nonzeros in factors = 30513182

Number of factorization MFLOPS = 7193

With iparm[1] = 3:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real struct. sym. system ================
The local (internal) PARDISO version is : 103000115
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON
Scaling is turned ON


Summary PARDISO: ( reorder to solve )
================

Times:
======
Time spent in calculations of symmetric matrix portrait(fulladj): 0.022449 s
Time spent in reordering of the initial matrix(reorder) : 1.902797 s
Time spent in symbolic factorization(symbfct) : 0.284451 s
Time spent in data preparations for factorization(parlist) : 0.017533 s
Time spent in copying matrix to internal data structure(A to LU): 0.000001 s
Time spent in factorization step(numfct) : 1.905108 s
Time spent in direct solver at solve step (solve) : 0.155456 s
Time spent in allocation of internal data structures(malloc) : 0.422736 s
Time spent in additional calculations : 0.415412 s
Total time spent : 5.125943 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 2
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 143748
#non-zeros in A: 2934084
non-zeros in A (%): 0.014199

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
#supernodes: 55599
size of largest supernode: 729
number of nonzeros in L 16698663
number of nonzeros in U 13786763
number of nonzeros in L+U 30485426
gflop for the numerical factorization: 7.221651

gflop/s for the numerical factorization: 3.790678


Reordering completed ...

Number of nonzeros in factors = 30485426

Number of factorization MFLOPS = 7221
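
The three logs above can be compared with a little arithmetic. The sketch below is an illustration, with hypothetical names (`PardisoRun`, `kRuns` and the helper functions); the numbers are copied from the "Times" and "Statistics" sections of the logs. It checks that the reported gflop/s equals gflop divided by the numfct time, and it sums reorder and symbfct as a rough measure of the analysis cost (an assumption about which log lines make up analysis).

```cpp
// One row per PARDISO run, values copied from the logs above.
struct PardisoRun {
    const char* reordering;
    double reorder_s;   // reordering of the initial matrix
    double symbfct_s;   // symbolic factorization
    double numfct_s;    // numerical factorization
    double solve_s;     // solve step
    double gflop;       // gflop for the numerical factorization
};

const PardisoRun kRuns[3] = {
    {"minimum degree (iparm[1]=0)", 0.814425, 0.375613, 3.503580, 0.227055, 18.084707},
    {"METIS (iparm[1]=2)",          2.481204, 0.231827, 1.920346, 0.161954, 7.193245},
    {"parallel METIS (iparm[1]=3)", 1.902797, 0.284451, 1.905108, 0.155456, 7.221651},
};

// Rough analysis cost: reordering plus symbolic factorization.
double analysis_seconds(const PardisoRun& r) { return r.reorder_s + r.symbfct_s; }

// Factorization rate, which should match the "gflop/s" line in each log.
double factor_gflops(const PardisoRun& r) { return r.gflop / r.numfct_s; }

// True when the analysis took longer than the numerical factorization.
bool analysis_dominates(const PardisoRun& r) { return analysis_seconds(r) > r.numfct_s; }
```

With these numbers, analysis takes about 1.19 s against 3.50 s of factorization for minimum degree, but about 2.71 s against 1.92 s for METIS and about 2.19 s against 1.91 s for parallel METIS. The METIS orderings roughly halve the factorization time and reduce the fill-in, at the price of a slower reordering step, which is consistent with the observation earlier in the thread that analysis took longer than factorization.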

