
Hazra__Dhiraj_Kumar

Beginner

03-05-2015
10:13 AM

mkl_pardiso time consumption during solution phase

Hello,

I am working with a sparse matrix A and trying to solve Ax = b. b is a double-precision array with 1 million entries. The matrix is very sparse (number of non-zeros in A (%): 0.000694). I am using mkl_pardiso.f90 on 4 nodes. The factorization takes 10 min to complete, and I expected the solution phase to take no longer than that, but it has been running for more than one hour and is still in that phase. Is this normal? Below is the output so far (before the solution phase). Can anybody please share any ideas for improving this situation? Any help will be much appreciated.

    === PARDISO: solving a real nonsymmetric system ===
    1-based array indexing is turned ON
    PARDISO double precision computation is turned ON
    Parallel METIS algorithm at reorder step is turned ON
    Scaling is turned ON
    Matching is turned ON

    Summary: ( reordering phase )
    ================

    Times:
    ======
    Time spent in calculations of symmetric matrix portrait (fulladj): 0.081618 s
    Time spent in reordering of the initial matrix (reorder) : 1.912543 s
    Time spent in symbolic factorization (symbfct) : 1.559847 s
    Time spent in data preparations for factorization (parlist) : 0.075198 s
    Time spent in allocation of internal data structures (malloc) : 0.340503 s
    Time spent in additional calculations : 0.220093 s
    Total time spent : 4.189802 s

    Statistics:
    ===========
    Parallel Direct Factorization is running on 4 OpenMP

    < Linear system Ax = b >
    number of equations: 1000000
    number of non-zeros in A: 6940000
    number of non-zeros in A (%): 0.000694
    number of right-hand sides: 1

    < Factors L and U >
    number of columns for each panel: 96
    number of independent subgraphs: 0
    number of supernodes: 654600
    size of largest supernode: 11181
    number of non-zeros in L: 782333465
    number of non-zeros in U: 766664027
    number of non-zeros in L+U: 1548997492

    Reordering completed ...
    Number of nonzeros in factors = 1548997492
    Number of factorization MFLOPS = 10596105
    === PARDISO is running in In-Core mode, because iparam(60)=0 ===

    Percentage of computed non-zeros for LL^T factorization
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

    === PARDISO: solving a real nonsymmetric system ===
    Single-level factorization algorithm is turned ON

    Summary: ( factorization phase )
    ================

    Times:
    ======
    Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
    Time spent in factorization step (numfct) : 953.925513 s
    Time spent in allocation of internal data structures (malloc) : 0.170726 s
    Time spent in additional calculations : 0.089973 s
    Total time spent : 954.186212 s

    Statistics:
    ===========
    Parallel Direct Factorization is running on 4 OpenMP

    < Linear system Ax = b >
    number of equations: 1000000
    number of non-zeros in A: 6940000
    number of non-zeros in A (%): 0.000694
    number of right-hand sides: 1

    < Factors L and U >
    number of columns for each panel: 96
    number of independent subgraphs: 0
    number of supernodes: 654600
    size of largest supernode: 11181
    number of non-zeros in L: 782333465
    number of non-zeros in U: 766664027
    number of non-zeros in L+U: 1548997492
    gflop for the numerical factorization: 10596.105218
    gflop/s for the numerical factorization: 11.107896

    Factorization completed ...

Thanks,

Dhiraj

7 Replies

mecej4

Black Belt

03-05-2015
10:44 AM

Hazra__Dhiraj_Kumar

Beginner

03-05-2015
11:24 PM

Hello,

Thanks a lot for your explanation. My situation is that I need to do the factorization only once, and then the solution phase has to be repeated. Is there a way to do the fill-in work once and for all, so that it keeps the matrices in memory (I guess it does so automatically unless phase=-1 is called)? Specifically, I want something between phases 22 and 33, i.e. the fill-in part done at the initialization stage, since only the array b is going to change in my program. Or does the fill-in step need b as well?

Thanks again for your help,

Dhiraj

mecej4

Black Belt

03-06-2015
05:03 AM

You can make one set of calls (or a single call) to complete phases 1 and 2 at the beginning of your run. Then you can write a loop in which you assign new values to the r.h.s. vector b (as in A.x = b), obtain the solution with PHASE=33, do what you want with the solution x, and go on to the next r.h.s.

However, given the large fill-in that occurs during factorization of your atypical matrix, it can very well happen that the solution phase consumes CPU time that is not negligible compared to the factorization phase, especially if multiple solutions are asked for.
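
The pattern described above (factor once, then solve for each new right-hand side) can be sketched in plain Python. This is only an illustration under stated assumptions: a small dense LU factorization stands in for PARDISO's phases 1 and 2, the functions `lu_factor` and `lu_solve` are hypothetical helpers written for this example and are not part of MKL, and the 3x3 matrix is made up.

```python
def lu_factor(a):
    """Factor a dense square matrix once (LU with partial pivoting).

    Returns (lu, perm): L (unit diagonal) and U packed into one matrix,
    plus the row order that was applied. Stand-in for the one-time work.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]              # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b by forward and backward substitution, reusing the factors."""
    n = len(lu)
    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Backward substitution: U x = y.
    x = y[:]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


# Made-up example data.
a = [[4.0, 3.0, 0.0],
     [3.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
rhs_list = [[24.0, 30.0, -24.0], [1.0, 0.0, 0.0]]

lu, perm = lu_factor(a)                       # done once, before the loop
solutions = [lu_solve(lu, perm, b) for b in rhs_list]  # one cheap solve per b
for b, x in zip(rhs_list, solutions):
    print(b, "->", x)
```

The expensive work sits in `lu_factor`; each pass through the loop only does forward and backward substitution, which is the role PHASE=33 plays in the loop described above.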

Hazra__Dhiraj_Kumar

Beginner

03-06-2015
09:58 AM

Hello,

Thank you for your answer. I am already calling phases 11 and 22 at the initialization stage, as my matrix A is not going to change. I only call phase 33 in a loop where the vector b changes and I get a different x. Even so, the solution phase is taking significantly longer, apparently about 10 times longer than the factorization phase. I have one question, and I would be thankful if you could help with it.

As you said, the solution step is going to take time anyway; I guess the forward and backward substitution over these elements is what takes the time. In that case, can the cluster sparse solver help? I do not yet know its detailed structure, but it seems that the more nodes I use, the faster the program will run. Am I right? Or is the solution phase expected to take nearly the same amount of time as it does now?

Thanks again,

Dhiraj

mecej4

Black Belt

03-06-2015
10:30 AM

Alexander_K_Intel2

Employee

03-09-2015
04:11 AM

Hi Dhiraj,

Could you provide the full log, including the time of the solving step? The solving step is not expected to take longer than the factorization unless the number of right-hand sides is huge.

Thanks,

Alex
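
A back-of-the-envelope estimate supports this expectation. The sketch below assumes that one forward and backward substitution costs about two floating-point operations (a multiply and an add) per stored entry of L+U for each right-hand side; that is a common rule of thumb, not a figure reported by PARDISO. The counts are copied from the log in the first post.

```python
# Figures copied from the PARDISO log in the first post.
nnz_factors = 1_548_997_492      # number of non-zeros in L+U
factor_gflop = 10596.105218      # gflop for the numerical factorization

# Assumption: roughly 2 flops per stored factor entry per right-hand side.
FLOPS_PER_ENTRY = 2
nrhs = 1

solve_gflop = FLOPS_PER_ENTRY * nnz_factors * nrhs / 1e9
ratio = factor_gflop / solve_gflop

print(f"estimated solve work: {solve_gflop:.2f} gflop per right-hand side")
print(f"factorization / solve work ratio: about {ratio:.0f}")
```

Under that assumption a single solve needs thousands of times less arithmetic than the factorization, so a solve that runs longer than the factorization may point to something other than floating-point work, such as memory pressure.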

Hazra__Dhiraj_Kumar

Beginner

03-09-2015
07:23 AM

Hi Alexander,

Last time I stopped the program after 1 hour. Today I ran it again and found that after nearly one hour the program stopped (it was killed automatically). It seems that the desktop RAM (8 GB) was exhausted during the solution phase. I started the run again on our cluster, which has 32 GB of RAM, and the program ran completely fine; the output is provided below. Hence it seems to be purely a RAM problem. But I still do not understand why, when the solution phase took only 0.85 seconds on the cluster, the desktop took so long before the program got killed. For a smaller problem (where the size of b is 400000), the computation time on the cluster and on my desktop was similar. Anyway, my main problem here is solved. Thanks for all your input.

Cheers,

Dhiraj

    === PARDISO: solving a real nonsymmetric system ===
    The local (internal) PARDISO version is : 103911000
    1-based array indexing is turned ON
    PARDISO double precision computation is turned ON
    Parallel METIS algorithm at reorder step is turned ON
    Scaling is turned ON
    Matching is turned ON

    Summary: ( reordering phase )
    ================

    Times:
    ======
    Time spent in calculations of symmetric matrix portrait (fulladj): 0.101121 s
    Time spent in reordering of the initial matrix (reorder) : 1.829722 s
    Time spent in symbolic factorization (symbfct) : 3.486545 s
    Time spent in data preparations for factorization (parlist) : 0.096258 s
    Time spent in allocation of internal data structures (malloc) : 0.447319 s
    Time spent in additional calculations : 0.271942 s
    Total time spent : 6.232907 s

    Statistics:
    ===========
    < Parallel Direct Factorization with number of processors: > 20
    < Numerical Factorization with BLAS3 and O(n) synchronization >

    < Linear system Ax = b >
    number of equations: 1000000
    number of non-zeros in A: 6940000
    number of non-zeros in A (%): 0.000694
    number of right-hand sides: 1

    < Factors L and U >
    number of columns for each panel: 72
    number of independent subgraphs: 0
    number of supernodes: 655255
    size of largest supernode: 11186
    number of non-zeros in L: 781479562
    number of non-zeros in U: 768317946
    number of non-zeros in L+U: 1549797508

    Reordering completed ...
    Number of nonzeros in factors = 1549797508
    Number of factorization MFLOPS = 10707689
    === PARDISO is running in In-Core mode, because iparam(60)=0 ===

    Percentage of computed non-zeros for LL^T factorization
    0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 35 % 36 % 37 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 47 % 48 % 49 % 51 % 52 % 54 % 55 % 57 % 59 % 60 % 62 % 63 % 65 % 67 % 69 % 71 % 73 % 75 % 77 % 79 % 82 % 84 % 86 % 89 % 92 % 93 % 95 % 96 % 98 % 99 % 100 %

    === PARDISO: solving a real nonsymmetric system ===
    Single-level factorization algorithm is turned ON

    Summary: ( factorization phase )
    ================

    Times:
    ======
    Time spent in copying matrix to internal data structure (A to LU): 0.000001 s
    Time spent in factorization step (numfct) : 140.436790 s
    Time spent in allocation of internal data structures (malloc) : 0.001189 s
    Time spent in additional calculations : 0.000002 s
    Total time spent : 140.437982 s

    Statistics:
    ===========
    < Parallel Direct Factorization with number of processors: > 20
    < Numerical Factorization with BLAS3 and O(n) synchronization >

    < Linear system Ax = b >
    number of equations: 1000000
    number of non-zeros in A: 6940000
    number of non-zeros in A (%): 0.000694
    number of right-hand sides: 1

    < Factors L and U >
    number of columns for each panel: 72
    number of independent subgraphs: 0
    number of supernodes: 655255
    size of largest supernode: 11186
    number of non-zeros in L: 781479562
    number of non-zeros in U: 768317946
    number of non-zeros in L+U: 1549797508
    gflop for the numerical factorization: 10707.689303
    gflop/s for the numerical factorization: 76.245614

    Factorization completed ...

    === PARDISO: solving a real nonsymmetric system ===

    Summary: ( solution phase )
    ================

    Times:
    ======
    Time spent in direct solver at solve step (solve) : 0.850383 s
    Time spent in additional calculations : 1.698694 s
    Total time spent : 2.549077 s

    Statistics:
    ===========
    < Parallel Direct Factorization with number of processors: > 20
    < Numerical Factorization with BLAS3 and O(n) synchronization >

    < Linear system Ax = b >
    number of equations: 1000000
    number of non-zeros in A: 6940000
    number of non-zeros in A (%): 0.000694
    number of right-hand sides: 1

    < Factors L and U >
    number of columns for each panel: 72
    number of independent subgraphs: 0
    number of supernodes: 655255
    size of largest supernode: 11186
    number of non-zeros in L: 781479562
    number of non-zeros in U: 768317946
    number of non-zeros in L+U: 1549797508
    gflop for the numerical factorization: 10707.689303
    gflop/s for the numerical factorization: 76.245614

    Solve completed ...
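
The RAM explanation is consistent with a rough size estimate from the logs. The sketch below assumes that each stored entry of L+U takes 8 bytes (one double-precision value) and ignores index arrays and solver workspace, so it gives a lower bound rather than PARDISO's actual memory use. The non-zero counts and RAM sizes are the ones quoted in this thread.

```python
BYTES_PER_VALUE = 8  # assumption: one 8-byte double per stored factor entry

nnz_a = 6_940_000                               # non-zeros in A (from the logs)
nnz_factors = {"desktop": 1_548_997_492,        # non-zeros in L+U, desktop log
               "cluster": 1_549_797_508}        # non-zeros in L+U, cluster log
ram_gb = {"desktop": 8, "cluster": 32}          # RAM sizes quoted in the thread


def factor_values_gb(nnz, bytes_per_value=BYTES_PER_VALUE):
    """Lower bound on storage for the factor values, in GB (10**9 bytes)."""
    return nnz * bytes_per_value / 1e9


for machine, nnz in nnz_factors.items():
    need = factor_values_gb(nnz)
    fits = need < ram_gb[machine]
    print(f"{machine}: factors need at least {need:.1f} GB, "
          f"RAM is {ram_gb[machine]} GB, fits in RAM: {fits}")

fill_ratio = nnz_factors["desktop"] / nnz_a
print(f"fill-in: L+U holds about {fill_ratio:.0f} times as many non-zeros as A")
```

With roughly 12.4 GB needed for the factor values alone, the 8 GB desktop would have to swap to disk, which may explain the very slow solve phase there, while the 32 GB cluster can hold the factors in memory.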
