Substitution time consuming in Pardiso

danielsue · ‎04-17-2013

Dear All,

I have a reactive transport model that utilizing pardiso as the solver. The model is discretized by control volume method and solved by newton iteration method. I do sybmolic factorization at first, then do numerical factorization and substitution in every time step. The model can work but I do have a question on the speedup problem.

The parameter for pardiso are as follows:
   iparm = 0
        iparm(1) = 1 ! no solver default
        iparm(2) = 3 ! fill-in reordering from METIS ,0-MIN DEGREE, 2-METIS, 3-OPENMP VERSION
        iparm(3) = 0 ! numbers of processors. Input the next call mkl_set_dynamic(0), mkl_set_num_threads(n);
        iparm(4) = 61 ! 0-no iterative-direct algorithm; 10*L+K, K=1 CGS, K=2 CGS for symmetric, 1.0E-L: stopping criterion
        iparm(5) = 0 ! no user fill-in reducing permutation
        iparm(6) = 0 ! if == 0, the array of b is replaced with the solution x.
        iparm(7) = 0 ! Output, Number of iterative refinement steps performed
        iparm(8) = 9 ! numbers of iterative refinement steps, must be 0 if a solution is calculated with separate substitutions (phase = 331, 332, 333)
        iparm(9) = 0 ! not in use
        iparm(10) = 13 ! Default value 13, perturbe the pivot elements with 1E-13
        iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
        iparm(12) = 0 ! not in use
        iparm(13) = 1 ! maximum weighted matching algorithm is switched-on (default for non-symmetric)
        iparm(14) = 0 ! Output: number of perturbed pivots
        iparm(15) = 0 ! Output, Peak memory on symbolic factorization.
        iparm(16) = 0 ! Output, Permanent memory on symbolic factorization. This value is only computed in phase 1.
        iparm(17) = 0 ! Output, Size of factors/Peak memory on numerical factorization and solution.
        iparm(18) = 0 ! Input/output. Report the number of non-zero elements in the factors. >= 0 Disable reporting.
        iparm(19) = 0 ! Input/output. Report number of floating point operations to factor matrix A. >= 0 Disable reporting.
        iparm(20) = 0 ! Output: Numbers of CG Iterations. >0 CGS succeeded, reports the number of completed iterations.
        iparm(24) = 1 ! Parallel factorization control, 0: classic algorithm, 1: two-level factorization algorithm, improve scalability on many threads.
        iparm(25) = 0 ! Parallel forward/backward solve control. 0: Use parallel algorithm for the solve step; 1: Use the sequential forward/backward solve.
        iparm(27) = 0 !check matrix error, 0-without check, 1-check

        maxfct = 1
        mnum = 1
        nrhs = 1
        error = 0 ! initialize error flag
        msglvl = 0 ! print statistical information
        mtype = 11 ! real unsymmetric

First, I do symbolic factorization with phase = 11 (One time)
        phase = 11   ! only reordering and symbolic factorization
        call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm, msglvl, ddum, ddum, error)
Then, in every time step, I do newton iteration. For every newton iteration, I do numerical factorization and substitution:
!Newton iteration
        phase = 22   ! only factorization
        call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm_in, msglvl, ddum, ddum, error)
        phase = 33   ! only substitution
   call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, idum, nrhs, iparm, msglvl, b_in, x_out, error)
!Newton iteration

The matrix is 2573775*2573775 with nozero entry of 12810555.
When I use 4 processors, the time consuming is as follows:
TimeStep NewtonIteration NumericalFactorization Substitution
1         1                2.673447               0.253894
1         2                0.000024               0.614713
1         3                0.000023               0.595046
2         1                0.000023               0.619408
2         2                0.000023               0.613553
...
My question is: Why does substitution cost so much time after the first step? In common sense, time consuming of substitution is much less than numerical factorization.

System is WIN7 X64
Compiler: Intel Parallel Studio XE 2013

Thanks.

Alexander_K_Intel2 · ‎04-17-2013

Hi,

I see that you set iparm(4) to nonzero balue. As result you Cholesky decomposition have been replaced to iterative algorithm with preconditioner calculated on first run. That is the reason of almost zero factorization time and significant solving time im your implementation.

With best regards,

Alexander Kalinkin

danielsue · ‎04-17-2013

Hi Alexander,

Thanks so much. I have another question on the problem of symbolic factorization. I tried to do symbolic factorization at first run and then in the following newton iterations, only do numerical factorization and substitution. For the simple problem, this works fine. But for the complex problem, things are quite different. The results are correct in the beginning, but after many iterations, the result is incorrect with large error. Then I tried to do symbolic factorization every step, it can generate correct results, but this significant reduce the efficient. Is there anything wrong?

Alexander_K_Intel2 · ‎04-18-2013

Hi,

In nonsymmetric case you turn on matching and scaling to improve quality of factorization step. After whn you change matrix, you use reordering from initial matrix that have been reordering to improve factorization quality of that matrix. As results your reordering is not quite good and follows to bad results on factorization and solving. For nonsymmetric case better to call reorder step for each matrix.

With best regards,

Alexander Kalinkin

danielsue · ‎04-18-2013

Alexander Kalinkin (Intel) wrote:

Hi,

In nonsymmetric case you turn on matching and scaling to improve quality of factorization step. After whn you change matrix, you use reordering from initial matrix that have been reordering to improve factorization quality of that matrix. As results your reordering is not quite good and follows to bad results on factorization and solving. For nonsymmetric case better to call reorder step for each matrix.

With best regards,

Alexander Kalinkin

Hi Alexander,

Thanks so much.

Is there any alternative method to avoid symbolic factorization everytime or is there a criteria that can help me to judge when I need to call reorder step. I have tested on many simulations, for about half of the simulations, I can get good results with symbolic factorization only for the initial step, and for the others, I should do symbolic factorization every step to get correct results.

Thanks again and best regards,

Daniel

Alexander_K_Intel2 · ‎04-18-2013

Hi Daniel,

It does depend on the problem. As i understood you solve nonlinear problem used PARDISO on each step to obtain solution of linear equations. The main problem of nonlinear problem - each one needs private investigation :) As for me, I call several factorization and solve step on one reorder step and made new reorder step in case when solution of linear equation become poor - number of iterative of solving step increase or residual norm increase of something else. In case of convergent algorithm number of PARDISO call without new reordering step become bigger.

With best regards,

Alexander Kalinkin

danielsue · ‎04-18-2013

Alexander Kalinkin (Intel) wrote:

Hi Daniel,

It does depend on the problem. As i understood you solve nonlinear problem used PARDISO on each step to obtain solution of linear equations. The main problem of nonlinear problem - each one needs private investigation :) As for me, I call several factorization and solve step on one reorder step and made new reorder step in case when solution of linear equation become poor - number of iterative of solving step increase or residual norm increase of something else. In case of convergent algorithm number of PARDISO call without new reordering step become bigger.

With best regards,

Alexander Kalinkin

Hi Alexander,

Thanks for your suggestion. I set the threshold for maximum residual, maximum solver refinement steps and maximum iterative steps if CGS acceleration is used. If the solver cannot meet the provided requisition, recall symbolic factorization. This dynamically scheme will not waste time on unnecessary symbolic factorizaion can can reduce the runtime a lot. I don't know whether it is a good method, as in common sense, symbolic factorization usually take one time.

Thanks again with all my best regards,

Daniel