Pardiso vs. MATLAB backslash comparison and fast forward backward substitution advice
Dear all,

I have a 143748 x 143748 sparse matrix that I would like to solve with Pardiso. I was able to compile the Pardiso usage example and get a result. I used the default settings for iparm, except that the array indices are 0-based since I am calling Pardiso from C++. However, I have a question: I did a timing comparison with MATLAB backslash, which uses UMFPACK for unsymmetric matrices and sparse Cholesky for symmetric ones. On this system MATLAB backslash beats Pardiso by a factor of 2-3. I write 2-3 because MATLAB timings with tic/toc are not that reliable; however, they should still give me an idea. Here is the log that I got from a Pardiso solve:

Reordering completed ...

Number of nonzeros in factors = 30513182

Number of factorization MFLOPS = 7193

Elapsed time in Analysis 4.24

Factorization completed ...

Elapsed time in Numerical Factorization 2.74

Elapsed time in solution 0.46

I used the boost::timer class for timings. Are there any timing functions in MKL, by the way?

And here is the same operation using the Factorize package from the MATLAB Central File Exchange, created by Tim Davis, the author of UMFPACK:

factorization Elapsed time is 4.386812 seconds.

solution Elapsed time is 1.125806 seconds.

Direct backslash in MATLAB gives

Elapsed time is 3.407212 seconds.

Is there a way to improve these timings on the Pardiso side? My matrix is symmetric indefinite. One more remark: it is important for me to do the forward-backward substitution quickly, because I am trying to use the factorized form of a stiffness matrix, available from a previous solution, as a preconditioner for CG iterations. This gives fast convergence in MATLAB; however, the bottleneck seems to be the forward-backward solves needed to apply the preconditioner.
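The preconditioning idea above can be sketched as follows. This is only a minimal illustration, not the poster's code: `pcg`, `MatVec` and `PrecSolve` are hypothetical names, and the sketch assumes that both the system matrix and the preconditioner are symmetric positive definite, as CG requires. The point is that the preconditioner is applied through a callback, so the factorization is done once elsewhere and each iteration costs one matrix-vector product plus one forward-backward substitution. With PARDISO, that callback would presumably be a solve-only call (phase = 33) on factors kept from the earlier factorization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// Computes y = A * x for the current system matrix.
using MatVec = std::function<void(const Vec& x, Vec& y)>;
// Computes z = M^{-1} r, e.g. one forward-backward substitution
// with factors kept from a previous solve (hypothetical callback).
using PrecSolve = std::function<void(const Vec& r, Vec& z)>;

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned conjugate gradients starting from the guess in x.
// Returns the number of iterations used, or -1 if the relative
// residual did not drop below rel_tol within max_iter iterations.
int pcg(const MatVec& apply_A, const PrecSolve& apply_Minv,
        const Vec& b, Vec& x, double rel_tol, int max_iter) {
    const std::size_t n = b.size();
    assert(x.size() == n);
    Vec r(n), z(n), p(n), Ap(n);

    apply_A(x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    const double stop = rel_tol * std::sqrt(dot(b, b));
    if (std::sqrt(dot(r, r)) <= stop) return 0;

    apply_Minv(r, z);  // first preconditioner application
    p = z;
    double rz = dot(r, z);

    for (int k = 1; k <= max_iter; ++k) {
        apply_A(p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        if (std::sqrt(dot(r, r)) <= stop) return k;

        apply_Minv(r, z);  // one forward-backward solve per iteration
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    return -1;
}
```

Since the callback runs once per iteration, the time of a single solve step multiplied by the iteration count gives a first estimate of the preconditioning cost.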

Last point: I used the GNU C++ compiler with the -O3 flag. Do you think the Intel C++ compiler can optimize the process better?

Could you comment on the above points?

Best regards

Umut

utab

Beginner


05-10-2011
04:35 AM

3 Replies


Hi Umut,

Konstantin_A_Intel

Employee


05-10-2011
10:12 PM


The strange thing is that the Analysis phase took longer than the Factorization and Solve phases in PARDISO. Usually factorization is the dominant part.

Could you please set msglvl=1 and post here what PARDISO reported?

To measure time in MKL you may use the dsecnd() function, search here:

But please note that the very first call to dsecnd() takes about 0.5 sec by itself.

Regards,

Konstantin
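For timings that do not depend on MKL at all, a small helper around std::chrono::steady_clock (C++11) may be enough. This is only a sketch: `average_seconds` is a made-up name, not an MKL or Boost function. It averages over several back-to-back calls, which also helps when single measurements are noisy.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average wall-clock seconds per call of `work`,
// measured over `repetitions` back-to-back calls.
template <typename Work>
double average_seconds(Work&& work, int repetitions = 1) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / repetitions;
}
```

For example, `average_seconds([&] { /* one solve call */ }, 10)` would report the mean time of ten solves.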


Gennady_F_Intel

Moderator


05-11-2011
12:04 AM


Umut,

actually, the problem with the dsecnd() function has been resolved in the latest 10.3 Update 3, which is already available. But the behaviour you are experiencing is unexpected for Pardiso.

--Gennady


Hi,

Konstantin, here is the output (a little long) with msglvl = 1 and mtype = 1, using phase = 13 in all runs. First, with iparm[1] = 0:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization

0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real struct. sym. system ================

The local (internal) PARDISO version is : 103000115

0-based array is turned ON

PARDISO double precision computation is turned ON

Minimum degree algorithm at reorder step is turned ON

Single-level factorization algorithm is turned ON

Scaling is turned ON

Summary PARDISO: ( reorder to solve )

================

Times:

======

Time spent in calculations of symmetric matrix portrait(fulladj): 0.029071 s

Time spent in reordering of the initial matrix(reorder) : 0.814425 s

Time spent in symbolic factorization(symbfct) : 0.375613 s

Time spent in data preparations for factorization(parlist) : 0.017267 s

Time spent in copying matrix to internal data structure(A to LU): 0.000002 s

Time spent in factorization step(numfct) : 3.503580 s

Time spent in direct solver at solve step (solve) : 0.227055 s

Time spent in allocation of internal data structures(malloc) : 0.549828 s

Time spent in additional calculations : 0.448929 s

Total time spent : 5.965770 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 2

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 143748

#non-zeros in A: 2934084

non-zeros in A (%): 0.014199

#right-hand sides: 1

< Factors L and U >

< Preprocessing with multiple minimum degree, tree height >

< Reduction for efficient parallel factorization >

#columns for each panel: 96

#independent subgraphs: 0

#supernodes: 55576

size of largest supernode: 1161

number of nonzeros in L 23266462

number of nonzeros in U 20072706

number of nonzeros in L+U 43339168

gflop for the numerical factorization: 18.084707

gflop/s for the numerical factorization: 5.161779

Reordering completed ...

Number of nonzeros in factors = 43339168

Number of factorization MFLOPS = 18084

WITH iparm[1]=2

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization

0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real struct. sym. system ================

The local (internal) PARDISO version is : 103000115

0-based array is turned ON

PARDISO double precision computation is turned ON

METIS algorithm at reorder step is turned ON

Single-level factorization algorithm is turned ON

Scaling is turned ON

Summary PARDISO: ( reorder to solve )

================

Times:

======

Time spent in calculations of symmetric matrix portrait(fulladj): 0.022432 s

Time spent in reordering of the initial matrix(reorder) : 2.481204 s

Time spent in symbolic factorization(symbfct) : 0.231827 s

Time spent in data preparations for factorization(parlist) : 0.016098 s

Time spent in copying matrix to internal data structure(A to LU): 0.000001 s

Time spent in factorization step(numfct) : 1.920346 s

Time spent in direct solver at solve step (solve) : 0.161954 s

Time spent in allocation of internal data structures(malloc) : 0.422124 s

Time spent in additional calculations : 0.423500 s

Total time spent : 5.679486 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 2

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 143748

#non-zeros in A: 2934084

non-zeros in A (%): 0.014199

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 96

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 55598

size of largest supernode: 873

number of nonzeros in L 16718537

number of nonzeros in U 13794645

number of nonzeros in L+U 30513182

gflop for the numerical factorization: 7.193245

gflop/s for the numerical factorization: 3.745807

Reordering completed ...

Number of nonzeros in factors = 30513182

Number of factorization MFLOPS = 7193

WITH iparm[1] = 3;

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization

0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real struct. sym. system ================

The local (internal) PARDISO version is : 103000115

0-based array is turned ON

PARDISO double precision computation is turned ON

Parallel METIS algorithm at reorder step is turned ON

Single-level factorization algorithm is turned ON

Scaling is turned ON

Summary PARDISO: ( reorder to solve )

================

Times:

======

Time spent in calculations of symmetric matrix portrait(fulladj): 0.022449 s

Time spent in reordering of the initial matrix(reorder) : 1.902797 s

Time spent in symbolic factorization(symbfct) : 0.284451 s

Time spent in data preparations for factorization(parlist) : 0.017533 s

Time spent in copying matrix to internal data structure(A to LU): 0.000001 s

Time spent in factorization step(numfct) : 1.905108 s

Time spent in direct solver at solve step (solve) : 0.155456 s

Time spent in allocation of internal data structures(malloc) : 0.422736 s

Time spent in additional calculations : 0.415412 s

Total time spent : 5.125943 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 2

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 143748

#non-zeros in A: 2934084

non-zeros in A (%): 0.014199

#right-hand sides: 1

< Factors L and U >

#columns for each panel: 96

#independent subgraphs: 0

#supernodes: 55599

size of largest supernode: 729

number of nonzeros in L 16698663

number of nonzeros in U 13786763

number of nonzeros in L+U 30485426

gflop for the numerical factorization: 7.221651

gflop/s for the numerical factorization: 3.790678

Reordering completed ...

Number of nonzeros in factors = 30485426

Number of factorization MFLOPS = 7221
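The three reports can be compared with a little arithmetic. The sketch below copies the numbers from the reports above; the struct and function names are made up for illustration. It recomputes gflop/s as the factorization gflop divided by the numfct time, which matches the values PARDISO printed, and the fill ratio as nonzeros in L+U divided by nonzeros in A.

```cpp
#include <cassert>
#include <cstdio>

// One row per run, with values copied from the PARDISO reports.
struct OrderingRun {
    const char* name;
    double reorder_s;   // reordering of the initial matrix (reorder)
    double numfct_s;    // factorization step (numfct)
    double solve_s;     // direct solver at solve step (solve)
    double factor_nnz;  // number of nonzeros in L+U
    double gflop;       // gflop for the numerical factorization
};

const OrderingRun kRuns[] = {
    {"minimum degree (iparm[1]=0)", 0.814425, 3.503580, 0.227055, 43339168.0, 18.084707},
    {"METIS (iparm[1]=2)",          2.481204, 1.920346, 0.161954, 30513182.0, 7.193245},
    {"parallel METIS (iparm[1]=3)", 1.902797, 1.905108, 0.155456, 30485426.0, 7.221651},
};

// Factorization throughput: work divided by numfct time.
double gflops_per_second(const OrderingRun& run) {
    assert(run.numfct_s > 0.0);
    return run.gflop / run.numfct_s;
}

// How many times more nonzeros the factors hold than A itself.
double fill_ratio(const OrderingRun& run, double nnz_in_A) {
    assert(nnz_in_A > 0.0);
    return run.factor_nnz / nnz_in_A;
}

// Prints one comparison line per run.
void print_summary(double nnz_in_A) {
    for (const OrderingRun& run : kRuns) {
        std::printf("%-28s reorder %.2f s, numfct %.2f s, solve %.3f s, "
                    "fill %.1fx, %.2f gflop/s\n",
                    run.name, run.reorder_s, run.numfct_s, run.solve_s,
                    fill_ratio(run, nnz_in_A), gflops_per_second(run));
    }
}
```

Calling `print_summary(2934084.0)` uses the reported number of nonzeros in A. With these numbers, the METIS orderings cut the fill from roughly 14.8 to 10.4 times the nonzeros of A and the solve step from 0.227 s to about 0.16 s, at the cost of a longer reordering step.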

utab

Beginner


05-12-2011
02:57 AM
