- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have implemented Pardiso to solve my problem and it works fine. But the memory cost and efficiency of Pardiso is not outstanding. We have another sparse solver (Solver A, not parallelized) that use level-based preconditioner and CGS-based acceleration.The memory cost for Pardiso is about three to four times of Solver A. For most of my matrix, if I use only one processor for Pardiso, Solver A is much faster, if I use six processors, they are almost the same speed. For some large matrix, Solver A is even faster than Pardiso even if Pardiso uses 6 processors. I compared the runtime of numerical factorization and substitution, and found that solver A is much more efficient. We do not expect Pardiso to beat Solver A if Pardiso runs in serial mode. But we do hope Pardiso to beat Solver A when it runs in parallel mode using 6 processors.
The speedup for Pardiso seems good, I can get a speedup of 4.5 if I use 6 processors.
Here are the parameter I use.
maxfct = 1
mnum = 1
nrhs = 1
error = 0 ! initialize error flag
msglvl = 0 ! print statistical information
mtype = 11 ! real unsymmetric
iparm= 0
iparm(1) = 1 ! no solver default
iparm(2) = 3 ! fill-in reordering from METIS ,0-MIN DEGREE, 2-METIS, 3-OPENMP VERSION
iparm(3) = 0 ! numbers of processors. Input the next call mkl_set_dynamic(0), mkl_set_num_threads(n);
iparm(4) = 61 ! 0-no iterative-direct algorithm; 10*L+K, K=1 CGS, K=2 CGS for symmetric, 1.0E-L: stopping criterion
iparm(5) = 0 ! no user fill-in reducing permutation
iparm(6) = 0 ! if == 0, the array of b is replaced with the solution x.
iparm(7) = 0 ! Output, Number of iterative refinement steps performed
iparm(8) = 9 ! numbers of iterative refinement steps, must be 0 if a solution is calculated with separate substitutions
iparm(9) = 0 ! not in use
iparm(10) = 13 ! Default value 13, perturbe the pivot elements with 1E-13
iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
iparm(12) = 0 ! not in use
iparm(13) = 1 ! maximum weighted matching algorithm is switched-on (default for non-symmetric)
iparm(14) = 0 ! Output: number of perturbed pivots
iparm(15) = 0 ! Output, Peak memory on symbolic factorization.
iparm(16) = 0 ! Output, Permanent memory on symbolic factorization. This value is only computed in phase 1.
iparm(17) = 0 ! Output, Size of factors/Peak memory on numerical factorization and solution.
iparm(18) = 0 ! Input/output. Report the number of non-zero elements in the factors. >= 0 Disable reporting.
iparm(19) = 0 ! Input/output. Report number of floating point operations to factor matrix A. >= 0 Disable reporting.
iparm(20) = 0 ! Output: Numbers of CG Iterations. >0 CGS succeeded, reports the number of completed iterations.
iparm(24) = 1 ! Parallel factorization control, 0: classic algorithm, 1: two-level factorization algorithm, improve scalability on many threads.
iparm(25) = 0 ! Parallel forward/backward solve control. 0: Use parallel algorithm for the solve step; 1: Use the sequential forward/backward solve.
I tried to change iparm(4) and iparm(10), that does not make much difference. How can I improve the efficiency of Pardiso?
Thanks and best regards,
Daniel
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The codes are as follows:
phase = 11 ! only reordering and symbolic factorization
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, &
idum, nrhs, iparm_in, msglvl, ddum, ddum, error)
!Newton loop
phase = 22 ! only factorization
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, &
idum, nrhs, iparm_in, msglvl, ddum, ddum, error)
phase = 33 ! only substitution
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, &
idum, nrhs, iparm_in, msglvl, b_in, x_out, error)
IF THE RESULT IS NOT GOOD, CALL SYBMOLIC FACTORIZATION AGAIN.
!Newton loop
phase = -1 ! release internal memory
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, ddum, idum, &
idum, idum, nrhs, iparm, msglvl, ddum, ddum, error)
I did not release memory every step in Newton loop, is this necessary?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
danielsue wrote:
Hi All,
I have implemented Pardiso to solve my problem and it works fine. But the memory cost and efficiency of Pardiso is not outstanding. We have another sparse solver (Solver A, not parallelized) that use level-based preconditioner and CGS-based acceleration.The memory cost for Pardiso is about three to four times of Solver A. For most of my matrix, if I use only one processor for Pardiso, Solver A is much faster, if I use six processors, they are almost the same speed. For some large matrix, Solver A is even faster than Pardiso even if Pardiso uses 6 processors. I compared the runtime of numerical factorization and substitution, and found that solver A is much more efficient. We do not expect Pardiso to beat Solver A if Pardiso runs in serial mode. But we do hope Pardiso to beat Solver A when it runs in parallel mode using 6 processors.
The speedup for Pardiso seems good, I can get a speedup of 4.5 if I use 6 processors.
Here are the parameter I use.
maxfct = 1
mnum = 1
nrhs = 1
error = 0 ! initialize error flag
msglvl = 0 ! print statistical information
mtype = 11 ! real unsymmetriciparm= 0
iparm(1) = 1 ! no solver default
iparm(2) = 3 ! fill-in reordering from METIS ,0-MIN DEGREE, 2-METIS, 3-OPENMP VERSION
iparm(3) = 0 ! numbers of processors. Input the next call mkl_set_dynamic(0), mkl_set_num_threads(n);
iparm(4) = 61 ! 0-no iterative-direct algorithm; 10*L+K, K=1 CGS, K=2 CGS for symmetric, 1.0E-L: stopping criterion
iparm(5) = 0 ! no user fill-in reducing permutation
iparm(6) = 0 ! if == 0, the array of b is replaced with the solution x.
iparm(7) = 0 ! Output, Number of iterative refinement steps performed
iparm(8) = 9 ! numbers of iterative refinement steps, must be 0 if a solution is calculated with separate substitutions
iparm(9) = 0 ! not in use
iparm(10) = 13 ! Default value 13, perturbe the pivot elements with 1E-13
iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
iparm(12) = 0 ! not in use
iparm(13) = 1 ! maximum weighted matching algorithm is switched-on (default for non-symmetric)
iparm(14) = 0 ! Output: number of perturbed pivots
iparm(15) = 0 ! Output, Peak memory on symbolic factorization.
iparm(16) = 0 ! Output, Permanent memory on symbolic factorization. This value is only computed in phase 1.
iparm(17) = 0 ! Output, Size of factors/Peak memory on numerical factorization and solution.
iparm(18) = 0 ! Input/output. Report the number of non-zero elements in the factors. >= 0 Disable reporting.
iparm(19) = 0 ! Input/output. Report number of floating point operations to factor matrix A. >= 0 Disable reporting.
iparm(20) = 0 ! Output: Numbers of CG Iterations. >0 CGS succeeded, reports the number of completed iterations.
iparm(24) = 1 ! Parallel factorization control, 0: classic algorithm, 1: two-level factorization algorithm, improve scalability on many threads.
iparm(25) = 0 ! Parallel forward/backward solve control. 0: Use parallel algorithm for the solve step; 1: Use the sequential forward/backward solve.I tried to change iparm(4) and iparm(10), that does not make much difference. How can I improve the efficiency of Pardiso?
Thanks and best regards,
Daniel
The codes are as follows:
!codes start
phase = 11 ! only reordering and symbolic factorization
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, &
idum, nrhs, iparm_in, msglvl, ddum, ddum, error)
!Newton loop
phase = 22 ! only factorization
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, &
idum, nrhs, iparm_in, msglvl, ddum, ddum, error)
phase = 33 ! only substitution
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, a_in, ia_in, ja_in, &
idum, nrhs, iparm_in, msglvl, b_in, x_out, error)
IF THE RESULT IS NOT GOOD, CALL SYBMOLIC FACTORIZATION AGAIN.
!Newton loop
phase = -1 ! release internal memory
call pardiso (pt_in, maxfct, mnum, mtype, phase, n_in, ddum, idum, &
idum, idum, nrhs, iparm, msglvl, ddum, ddum, error)
!codes end
I didn't release memory every newton loop, is this necesary?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Daniel,
Nice to hear you again. :) First of all I will try to switch on parallel solving. Also I see that you turn on iterative algorithm instead of factorization step, did you try to switch it off (iparm(4) = 0)? Also it make sense to obtain solution on each nonlinear iteration by internal iterative routine, for example you can make several iteration of fgmres method using current matrix as stiffness matrix and solving step of pardiso with LU decomposition of previous matrix as preconditioner. Solution from previous non-linear iterative step could be initial solution for internal iterative algorithm. But, as i wrote before, in such case we modified algorithm on nonlinear solver and nobody can guarantee that solutions will be the same :) Does this make sense?
With best regards,
Alexander Kalinkin
P.S. Also could you provide any information about solver A?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Alexander Kalinkin (Intel) wrote:
Hi Daniel,
Nice to hear you again. :) First of all I will try to switch on parallel solving. Also I see that you turn on iterative algorithm instead of factorization step, did you try to switch it off (iparm(4) = 0)? Also it make sense to obtain solution on each nonlinear iteration by internal iterative routine, for example you can make several iteration of fgmres method using current matrix as stiffness matrix and solving step of pardiso with LU decomposition of previous matrix as preconditioner. Solution from previous non-linear iterative step could be initial solution for internal iterative algorithm. But, as i wrote before, in such case we modified algorithm on nonlinear solver and nobody can guarantee that solutions will be the same :) Does this make sense?
With best regards,
Alexander Kalinkin
P.S. Also could you provide any information about solver A?
Hi Alexander,
Thanks for the quick reply.
I also tried to turn off iterative algorithm (iparm(4) = 0), this does not guarantee higher speed. Sometimes it is faster when iparm(4) = 0. Currently I didn't modify algorithm on nonlinear solver
SolverA is Waterloo Sparese, Iterative Matrix Solver. I attached a user's guide and notes for this solver.
Thanks and regards,
Daniel
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page