Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

PARDISO unsymmetric only using 1 processor

xian-zhong_guous_cd-
I have an SPD matrix. If I solve it as SPD (mtype=2), PARDISO uses 8 processors. But if I solve it as unsymmetric, PARDISO uses only 1 processor. Both logs are included below.

Symmetric log:

N=119433
ooc_max_core_size got by Env = 20000
The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 0.014256 s
Time reorder: 0.927871 s
Time symbfct: 0.102223 s
Time parlist: 0.013512 s
Time malloc : 0.018803 s
Time total : 1.149216 s total - sum: 0.072551 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 119433
#non-zeros in A: 821520
non-zeros in A (%): 0.005759

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 64
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55725
size of largest supernode: 605
number of nonzeros in L 14908033
number of nonzeros in U 1
number of nonzeros in L+U 14908034
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 5 % 6 % 7 % 8 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 23 % 24 % 25 % 26 % 27 % 29 % 30 % 31 % 32 % 35 % 36 % 37 % 39 % 40 % 41 % 42 % 44 % 45 % 46 % 47 % 48 % 50 % 51 % 52 % 53 % 54 % 56 % 58 % 59 % 60 % 61 % 62 % 64 % 65 % 66 % 67 % 68 % 69 % 72 % 74 % 75 % 76 % 77 % 79 % 80 % 81 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 95 % 98 % 99 % 100 %

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( factorize to factorize )
================

Times:
======
Time A to LU: 0.000000 s
Time numfct : 0.330881 s
Time malloc : 0.000040 s
Time total : 0.330968 s total - sum: 0.000047 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 119433
#non-zeros in A: 821520
non-zeros in A (%): 0.005759

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 64
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55725
size of largest supernode: 605
number of nonzeros in L 14908033
number of nonzeros in U 1
number of nonzeros in L+U 14908034
gflop for the numerical factorization: 5.261862
gflop/s for the numerical factorization: 15.902588

================ PARDISO: solving a symm. posit. def. system ================

Summary PARDISO: ( solve to solve )
================

Times:
======
Time solve : 0.106138 s
Time total : 0.332094 s total - sum: 0.225956 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 119433
#non-zeros in A: 821520
non-zeros in A (%): 0.005759

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 64
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55725
size of largest supernode: 605
number of nonzeros in L 14908033
number of nonzeros in U 1
number of nonzeros in L+U 14908034
gflop for the numerical factorization: 5.261862
gflop/s for the numerical factorization: 15.902588

Unsymmetric log:

N=119433
ooc_max_core_size got by Env = 20000
The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 0.054908 s
Time reorder: 0.915838 s
Time symbfct: 0.106507 s
Time malloc : 0.153521 s
Time total : 1.374198 s total - sum: 0.143423 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 119433
#non-zeros in A: 1523607
non-zeros in A (%): 0.010681

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55613
size of largest supernode: 605
number of nonzeros in L 15170945
number of nonzeros in U 12859458
number of nonzeros in L+U 28030403
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( factorize to factorize )
================

Times:
======
Time A to LU: 0.000000 s
Time numfct : 2.065194 s
Time malloc : 0.000037 s
Time total : 2.065278 s total - sum: 0.000047 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 119433
#non-zeros in A: 1523607
non-zeros in A (%): 0.010681

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55613
size of largest supernode: 605
number of nonzeros in L 15170945
number of nonzeros in U 12859458
number of nonzeros in L+U 28030403
gflop for the numerical factorization: 9.607209
gflop/s for the numerical factorization: 4.651965

================ PARDISO: solving a real nonsymmetric system ================

Summary PARDISO: ( solve to solve )
================

Times:
======
Time solve : 0.128349 s
Time total : 0.409732 s total - sum: 0.281383 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 119433
#non-zeros in A: 1523607
non-zeros in A (%): 0.010681

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 55613
size of largest supernode: 605
number of nonzeros in L 15170945
number of nonzeros in U 12859458
number of nonzeros in L+U 28030403
gflop for the numerical factorization: 9.607209
gflop/s for the numerical factorization: 4.651965
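
For comparing runs like the two above, the headline numbers can be pulled out of the saved statistics output with a few lines of Python. This is only a sketch: the function name `summarize_pardiso_log` and the regular expressions are assumptions made for illustration, written against the log lines quoted in this post, and are not part of MKL. The sample strings are short excerpts of the factorize sections from the two logs.

```python
import re


def summarize_pardiso_log(text):
    """Extract processor count and factorization rate from PARDISO statistics text.

    Hypothetical helper: the patterns below are tailored to the log lines
    quoted in this thread and may need adjusting for other MKL versions.
    """
    # Every "#processors: > N" line in the log; collect the distinct values.
    procs = [int(m) for m in re.findall(r"#processors:\s*>\s*(\d+)", text)]
    # "Time numfct : 0.330881 s" (numerical factorization wall time).
    numfct = re.search(r"Time numfct\s*:\s*([\d.]+)\s*s", text)
    # "gflop for ..." but not "gflop/s for ...": \s+ excludes the "/s" line.
    gflop = re.search(r"gflop\s+for the numerical factorization:\s*([\d.]+)", text)

    summary = {"processors": sorted(set(procs))}
    if numfct and gflop:
        seconds = float(numfct.group(1))
        work = float(gflop.group(1))
        summary["numfct_s"] = seconds
        summary["gflop"] = work
        summary["gflop_per_s"] = work / seconds  # rate = work / time
    return summary


# Excerpts copied from the symmetric and unsymmetric logs above.
SYMMETRIC = """
Time numfct : 0.330881 s
< Parallel Direct Factorization with #processors: > 8
gflop for the numerical factorization: 5.261862
gflop/s for the numerical factorization: 15.902588
"""

UNSYMMETRIC = """
Time numfct : 2.065194 s
< Parallel Direct Factorization with #processors: > 1
gflop for the numerical factorization: 9.607209
gflop/s for the numerical factorization: 4.651965
"""

for label, log in (("symmetric", SYMMETRIC), ("unsymmetric", UNSYMMETRIC)):
    print(label, summarize_pardiso_log(log))
```

On these excerpts the sketch reports 8 versus 1 processors and recomputes factorization rates of roughly 15.9 and 4.65 gflop/s, matching the logged values, since the logged rate is the gflop count divided by `Time numfct`.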

Konstantin_A_Intel
Hi,
It seems you're using a rather old MKL version, in which only symmetric matrix types were parallelized in the OOC case. The latest 10.2 and 10.3 versions support parallelism for all matrix types.
Which version did you use?
Regards,
Konstantin
Gennady_F_Intel
Moderator
More precisely, see the bug fixes for version 10.3. The problem was marked as:

DPD200084190: PARDISO OOC will now run parallel code for all supported matrix types
xian-zhong_guous_cd-
Here is the version info:
Major version: 10
Minor version: 2
Update version: 1
Product status: Product
Build: n20090616
Processor optimization: Intel Core 2 Duo Processor

I tested a smaller problem in in-core mode, and PARDISO still uses only one processor:

calling PARDISO:N=52116
ooc_max_core_size got by Env = 20000
The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===


================ PARDISO: solving a real nonsymmetric system ================


Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 0.014744 s
Time reorder: 0.321509 s
Time symbfct: 0.047079 s
Time malloc : 0.057157 s
Time total : 0.485403 s total - sum: 0.044914 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 52116
#non-zeros in A: 525292
non-zeros in A (%): 0.019340

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 26576
size of largest supernode: 511
number of nonzeros in L 5768628
number of nonzeros in U 4786806
number of nonzeros in L+U 10555434
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a real nonsymmetric system ================


Summary PARDISO: ( factorize to factorize )
================

Times:
======
Time A to LU: 0.000000 s
Time numfct : 0.683435 s
Time malloc : 0.000474 s
Time total : 0.683954 s total - sum: 0.000045 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 52116
#non-zeros in A: 525292
non-zeros in A (%): 0.019340

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 26576
size of largest supernode: 511
number of nonzeros in L 5768628
number of nonzeros in U 4786806
number of nonzeros in L+U 10555434
gflop for the numerical factorization: 2.780741

gflop/s for the numerical factorization: 4.068774


================ PARDISO: solving a real nonsymmetric system ================


Summary PARDISO: ( solve to solve )
================

Times:
======
Time solve : 0.052398 s
Time total : 0.159472 s total - sum: 0.107074 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 52116
#non-zeros in A: 525292
non-zeros in A (%): 0.019340

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 26576
size of largest supernode: 511
number of nonzeros in L 5768628
number of nonzeros in U 4786806
number of nonzeros in L+U 10555434
gflop for the numerical factorization: 2.780741

gflop/s for the numerical factorization: 4.068774
xian-zhong_guous_cd-
I downloaded 10.3. However, install.sh tells me 10.3 is already installed, although I have only 10.2 installed. Why does install.sh report 10.3? I assume I need to uninstall 10.2 anyway, right?

Initializing, please wait...
--------------------------------------------------------------------------------
The Intel Math Kernel Library 10.3 Update 1 for Linux* is already installed.

If you want to reinstall the Intel Math Kernel Library 10.3 Update 1 for
Linux*
please uninstall current version and run install script again.
--------------------------------------------------------------------------------
Press "Enter" key to quit:

Gennady_F_Intel
Moderator
Is that a commercial, evaluation, or noncommercial version of MKL?
xian-zhong_guous_cd-
After I updated MKL to 10.3, the issue was resolved. Thanks.