Dear Pardiso users,
I am using Intel Pardiso to solve a sparse system. For small matrices it works perfectly and very fast. However, with increasing system size I figured out that for some set of parameters the numerical factorization doesn't work. Since I have to solve the system several times and the system does not change crucially for each step, I use the CGS-algorithm (iparm(4) = 91). Now, the following problem occurs: the solver directly (iparm(4) = 0) solves the system as a first step and the following error is produced:
*** error PARDISO: iterative refinement contraction rate is greater than 0.9, interrupt
Unfortunately, both the documentation and the forum/Intel website do not provide any further information about this error. In the following you can see the iparm parameters I have used. They almost correspond to default values, however, iparm(4) is set to perform CGS steps.
call mkl_set_dynamic ( 0 ) iparm(1) = 1 ! do not use default values iparm(2) = 3 ! fill-in reordering from METIS iparm(3) = 1 ! Number of processors iparm(4) = 91 ! iterative-direct algorithm iparm(8) = 10 ! Max. number of iterative refinement steps on entry iparm(10) = 13 ! perturb the pivot elements with 1E-13 iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS iparm(13) = 1 ! Improved accuracy using nonsymmetric weighted matching iparm(21) = 1 ! Apply 1x1 diagonal pivoting during the factorization process iparm(24) = 1 ! Parallel factorization control iparm(25) = 1 ! Parallel forward/backward solve control iparm(27) = 1 ! checks whether column indices are sorted in increasing order within each row I would like to share a code that you can reproduce the problem, however, the matrices are 1.2 GB large. As I have written, the problem only occurs for large systems. In the following, you can see an extract of my code that performs the solution of the system. ik = 0 cgsxcounter = 0 do ik = ik + 1 if (ik.ge.3) then write(*,*) 'no solution found' stop end if if (cgsxcounter.eq.0) then iparm(4) = 0 !Release all memory phase = -1 call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, error) !Reordering and Symbolic Factorization, This step also allocates all memory that is necessary for the factorization phase = 11 ! only reordering and symbolic factorization call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error) if (error.ne.0) write(*,*) 'Reordering and Symbolic Factorization wrong: ', error cgsxcounter=1 end if !Factorization. phase = 22 ! only factorization call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error) if (error.ne.0) stop !Back substitution and iterative refinement phase = 33 ! only substitution call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, error) if (iparm(20).lt.0) then write(*,*) 'Try again' cgsxcounter = 0 else exit end if end do iparm(4) = 91
It would be nice if you could provide more information about the problem. What can cause such a error? What can be done in order to avoid it?
Thanks in advance,
As an add-on: I know what the error means, I just want to know how it can occur and if there is a proper workaround.
Another thing bothers me. In the set of parameters, where intel pardiso seems to work properly, my jobs are sometimes aborted abnormally with the following error:
/var/spool/torque/mom_priv/jobs/2609619.adm01.SC: line 29: 11952 Bus error ./a.out
traps: a.out trap stack segment ip:16c5b50 sp:7fff237c04b0 error:0 in a.out[400000+4708000]
Maybe it indicates the real issue. I have never seen such an error and after googling I could not find something similar. The thing is that I cannot reproduce the issue. Sometimes it occurs sometimes not. Firstly, I thought that it is a hardware issue, however, I can reproduce the error on different nodes.
I am testing right now if the compiler- or mkl version solves the problem. Right now I am using ifort 2017 and mkl 2017. Unfortunately, I cannot test more recent versions. I'll keep you up to date.
Thanks in advance,
First, do you see the error (contraction rate) when you're calling phase 33 after you called 22 with iparm(4) = 0? From the code snippet you provided it looks like this. Then, CGS has nothing to do with it, right? Also, I see the call with phase -1 at the start of the loop in the code snippet, is it really what is happening? You shouldn't do that and this could be the root cause for the segfaults you mentioned in your second post.
Second, we don't recommend in general to use the CGS algorithm. If you want to change the matrix slightly and then solve the system iteratively without factorization, I suggest that you simply skip the phase 22 and call phase 33 with iterative refinement turned on (iparm(8)) to solve the changed systems. In this case, the existing factorization will be used. With that, you should not free the memory until the last call to pardiso (otherwise factorization data will be deallocated).
Another thing to keep in mind is that even if the matrix changes only slightly, the scaling and matching might depend on that and this can affect the behavior (convergence) of CGS significantly.
Hope this helps.
thank you for your response. Yes, indeed, It has nothing to do with CGS. Just wanted to describe the full problem. Thank you for taking a look at this. Yes, I use phase=-1, since I thought that it releases all internal memory for all matrices before I continue. However, I will delete this from the code and test if it works.
Ok, thank you for this comment. I will also try this and let you know if this helps.
I have tried to change iparm(4)=0 and modified the code as in the following
ik = 0 do ik = ik + 1 if (ik.ge.3) then write(*,*) 'solution could not be found' stop end if if (cgsxcounter.eq.0) then !Reordering and Symbolic Factorization phase = 11 ! only reordering and symbolic factorization call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error) !Factorization. phase = 22 ! only factorization call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error) cgsxcounter=1 end if !Back substitution and iterative refinement phase = 33 ! only substitution call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, error) end do
As far as I can see the segfault error disappeared. However, the problem with the iterative refinement still occurs. Does this indicate that there is no proper solution of the system? Or what exactly does it indicate? If this error occurs, the error flag isn't set to unequal 0. Why does this happen?
I would expect that the matrix is ill-conditioned. Iterative refinement steps (two steps which are automatically done) are performed when the solution found by the solver has a large residual. Since it is a direct solver, this can happen when the matrix has large condition number or when many pivots has been perturbed during the factorization. Unfortunately, it is hard to say more without looking into the actual workflow for your matrices.
As an option, I suggest you try the two-level factorization, iparm(24) = 1, this would use a different algorithm for the factorization and might solve the problem you observe.
In case it doesn't help, is it possible that you somehow provide us your matrices?
Another thing is: could you also check the iparm after the factorization phase (22), in particular, iparm(14) which reports the number of perturbed pivots? And what is the matrix type in your case? Also, was there a specific reason to use the diagonal pivoting instead of the default option (I suggest setting iparm(21)=0)?
EDIT: When switching two-level factorization on, please turn off matching and scaling (set iparm(11) and iparm(13) = 0).
EDIT2: Haven't noticed that you're already using the 2-level factorization. Though to be sure that it really uses the 2-level algorithm, still turn the matching and scaling off and retry the code.
I have tried out everything you suggested and it helped nothing. I have set iparm(24) =1, iparm(11)=0 and iparm(13)=0. I have also taken a look at iparm(14)=57 and iparm(20)=0. So I suggest that the matrix is highly ill conditioned right? Unfortunately, as I said, the matrices are quite large and I can only upload files that are up to 250MB. As long as there isn't another opportunity to send you the files, I guess that is all you can do for me.
Thanks in advance,