- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Dear All,

I have a reactive transport model that utilizing pardiso as the solver. The model is discretized by control volume method and solved by newton iteration method. I do sybmolic factorization at first, then do numerical factorization and substitution in every time step. The model can work but I do have a question on the speedup problem.

The parameter for pardiso are as follows:

iparm = 0

iparm(1) = 1 ! no solver default

iparm(2) = 3 ! fill-in reordering from METIS ,0-MIN DEGREE, 2-METIS, 3-OPENMP VERSION

iparm(3) = 0 ! numbers of processors. Input the next call mkl_set_dynamic(0), mkl_set_num_threads(n);

iparm(4) = 61 ! 0-no iterative-direct algorithm; 10*L+K, K=1 CGS, K=2 CGS for symmetric, 1.0E-L: stopping criterion

iparm(5) = 0 ! no user fill-in reducing permutation

iparm(6) = 0 ! if == 0, the array of b is replaced with the solution x.

iparm(7) = 0 ! Output, Number of iterative refinement steps performed

iparm(8) = 9 ! numbers of iterative refinement steps, must be 0 if a solution is calculated with separate substitutions (phase = 331, 332, 333)

iparm(9) = 0 ! not in use

iparm(10) = 13 ! Default value 13, perturbe the pivot elements with 1E-13

iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS

iparm(12) = 0 ! not in use

iparm(13) = 1 ! maximum weighted matching algorithm is switched-on (default for non-symmetric)

iparm(14) = 0 ! Output: number of perturbed pivots

iparm(15) = 0 ! Output, Peak memory on symbolic factorization.

iparm(16) = 0 ! Output, Permanent memory on symbolic factorization. This value is only computed in phase 1.

iparm(17) = 0 ! Output, Size of factors/Peak memory on numerical factorization and solution.

iparm(18) = 0 ! Input/output. Report the number of non-zero elements in the factors. >= 0 Disable reporting.

iparm(19) = 0 ! Input/output. Report number of floating point operations to factor matrix A. >= 0 Disable reporting.

iparm(20) = 0 ! Output: Numbers of CG Iterations. >0 CGS succeeded, reports the number of completed iterations.

iparm(24) = 1 ! Parallel factorization control, 0: classic algorithm, 1: two-level factorization algorithm, improve scalability on many threads.

iparm(25) = 0 ! Parallel forward/backward solve control. 0: Use parallel algorithm for the solve step; 1: Use the sequential forward/backward solve.

iparm(27) = 0 !check matrix error, 0-without check, 1-check

maxfct = 1

mnum = 1

nrhs = 1

error = 0 ! initialize error flag

msglvl = 0 ! print statistical information

mtype = 11 ! real unsymmetric

First, I do symbolic factorization with phase = 11 (One time)

phase = 11 ! only reordering and symbolic factorization

call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm, msglvl, ddum, ddum, error)

Then, in every time step, I do newton iteration. For every newton iteration, I do numerical factorization and substitution:

!Newton iteration

phase = 22 ! only factorization

call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm_in, msglvl, ddum, ddum, error)

phase = 33 ! only substitution

call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm, msglvl, b_in, x_out, error)

!Newton iteration

The matrix is 2573775*2573775 with nozero entry of 12810555.

When I use 4 processors, the time consuming is as follows:

TimeStep NewtonIteration NumericalFactorization Substitution

1 1 2.673447 0.253894

1 2 0.000024 0.614713

1 3 0.000023 0.595046

2 1 0.000023 0.619408

2 2 0.000023 0.613553

...

My question is: Why does substitution cost so much time after the first step? In common sense, time consuming of substitution is much less than numerical factorization.

System is WIN7 X64

Compiler: Intel Parallel Studio XE 2013

Thanks.

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

I see that you set iparm(4) to nonzero balue. As result you Cholesky decomposition have been replaced to iterative algorithm with preconditioner calculated on first run. That is the reason of almost zero factorization time and significant solving time im your implementation.

With best regards,

Alexander Kalinkin

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Alexander,

Thanks so much. I have another question on the problem of symbolic factorization. I tried to do symbolic factorization at first run and then in the following newton iterations, only do numerical factorization and substitution. For the simple problem, this works fine. But for the complex problem, things are quite different. The results are correct in the beginning, but after many iterations, the result is incorrect with large error. Then I tried to do symbolic factorization every step, it can generate correct results, but this significant reduce the efficient. Is there anything wrong?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

In nonsymmetric case you turn on matching and scaling to improve quality of factorization step. After whn you change matrix, you use reordering from initial matrix that have been reordering to improve factorization quality of that matrix. As results your reordering is not quite good and follows to bad results on factorization and solving. For nonsymmetric case better to call reorder step for each matrix.

With best regards,

Alexander Kalinkin

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Alexander Kalinkin (Intel) wrote:

Hi,

In nonsymmetric case you turn on matching and scaling to improve quality of factorization step. After whn you change matrix, you use reordering from initial matrix that have been reordering to improve factorization quality of that matrix. As results your reordering is not quite good and follows to bad results on factorization and solving. For nonsymmetric case better to call reorder step for each matrix.

With best regards,

Alexander Kalinkin

Hi Alexander,

Thanks so much.

Is there any alternative method to avoid symbolic factorization everytime or is there a criteria that can help me to judge when I need to call reorder step. I have tested on many simulations, for about half of the simulations, I can get good results with symbolic factorization only for the initial step, and for the others, I should do symbolic factorization every step to get correct results.

Thanks again and best regards,

Daniel

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Daniel,

It does depend on the problem. As i understood you solve nonlinear problem used PARDISO on each step to obtain solution of linear equations. The main problem of nonlinear problem - each one needs private investigation :) As for me, I call several factorization and solve step on one reorder step and made new reorder step in case when solution of linear equation become poor - number of iterative of solving step increase or residual norm increase of something else. In case of convergent algorithm number of PARDISO call without new reordering step become bigger.

With best regards,

Alexander Kalinkin

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Alexander Kalinkin (Intel) wrote:

Hi Daniel,

It does depend on the problem. As i understood you solve nonlinear problem used PARDISO on each step to obtain solution of linear equations. The main problem of nonlinear problem - each one needs private investigation :) As for me, I call several factorization and solve step on one reorder step and made new reorder step in case when solution of linear equation become poor - number of iterative of solving step increase or residual norm increase of something else. In case of convergent algorithm number of PARDISO call without new reordering step become bigger.

With best regards,

Alexander Kalinkin

Hi Alexander,

Thanks for your suggestion. I set the threshold for maximum residual, maximum solver refinement steps and maximum iterative steps if CGS acceleration is used. If the solver cannot meet the provided requisition, recall symbolic factorization. This dynamically scheme will not waste time on unnecessary symbolic factorizaion can can reduce the runtime a lot. I don't know whether it is a good method, as in common sense, symbolic factorization usually take one time.

Thanks again with all my best regards,

Daniel

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page