Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Intel Pardiso error numerical factorization

Horst
Beginner
932 Views

Dear Pardiso users,

I am using Intel Pardiso to solve a sparse system. For small matrices it works perfectly and very fast. However, with increasing system size I figured out that for some set of parameters the numerical factorization doesn't work. Since I have to solve the system several times and the system does not change crucially for each step, I use the CGS-algorithm (iparm(4) = 91). Now, the following problem occurs: the solver directly (iparm(4) = 0) solves the system as a first step and the following error is produced:

*** error PARDISO: iterative refinement  contraction rate is greater than 0.9, interrupt

Unfortunately, both the documentation and the forum/Intel website do not provide any further information about this error. In the following you can see the iparm parameters I have used. They almost correspond to default values, however, iparm(4) is set to perform CGS steps.

call mkl_set_dynamic ( 0 )

 iparm(1)  = 1  ! do not use default values
 iparm(2)  = 3  ! fill-in reordering from METIS
 iparm(3)  = 1  ! Number of processors
 iparm(4)  = 91 ! iterative-direct algorithm
 iparm(8)  = 10 ! Max. number of iterative refinement steps on entry
 iparm(10) = 13 ! perturb the pivot elements with 1E-13
 iparm(11) = 1  ! use nonsymmetric permutation and scaling MPS
 iparm(13) = 1  ! Improved accuracy using nonsymmetric weighted matching
 iparm(21) = 1  ! Apply 1x1 diagonal pivoting during the factorization process
 iparm(24) = 1  ! Parallel factorization control
 iparm(25) = 1  ! Parallel forward/backward solve control
 iparm(27) = 1  ! checks whether column indices are sorted in increasing order within each row

I would like to share a code that you can reproduce the problem, however, the matrices are 1.2 GB large. As I have written, the problem only occurs for large systems. In the following, you can see an extract of my code that performs the solution of the system.

ik = 0
cgsxcounter = 0

do

ik = ik + 1
if (ik.ge.3) then
write(*,*) 'no solution found'
stop
end if

if (cgsxcounter.eq.0) then

iparm(4) = 0

!Release all memory
phase = -1
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, error)

!Reordering and Symbolic Factorization, This step also allocates all memory that is necessary for the factorization
phase = 11 ! only reordering and symbolic factorization
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)
if (error.ne.0) write(*,*) 'Reordering and Symbolic Factorization wrong: ', error

 cgsxcounter=1

end if

!Factorization.
phase = 22 ! only factorization
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)
if (error.ne.0) stop

!Back substitution and iterative refinement
phase = 33 ! only substitution
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, error)

if (iparm(20).lt.0) then
  write(*,*) 'Try again'
  cgsxcounter = 0
else
 exit
end if

end do

iparm(4) = 91

 

 

It would be nice if you could provide more information about the problem. What can cause such a error? What can be done in order to avoid it?

Thanks in advance,

Horst

 

 

0 Kudos
8 Replies
Horst
Beginner
932 Views

As an add-on: I know what the error means, I just want to know how it can occur and if there is a proper workaround. 

Another thing bothers me. In the set of parameters, where intel pardiso seems to work properly, my jobs are sometimes aborted abnormally with the following error:

/var/spool/torque/mom_priv/jobs/2609619.adm01.SC: line 29: 11952 Bus error ./a.out
traps: a.out[7329] trap stack segment ip:16c5b50 sp:7fff237c04b0 error:0 in a.out[400000+4708000]

Maybe it indicates the real issue. I have never seen such an error and after googling I could not find something similar. The thing is that I cannot reproduce the issue. Sometimes it occurs sometimes not. Firstly, I thought that it is a hardware issue, however, I can reproduce the error on different nodes.

I am testing right now if the compiler- or mkl version solves the problem. Right now I am using ifort 2017 and mkl 2017. Unfortunately, I cannot test more recent versions. I'll keep you up to date.

Thanks in advance,

Horst

0 Kudos
Kirill_V_Intel
Employee
932 Views

Hello Horst,

First, do you see the error (contraction rate) when you're calling phase 33 after you called 22 with iparm(4) = 0? From the code snippet you provided it looks like this. Then, CGS has nothing to do with it, right? Also, I see the call with phase -1 at the start of the loop in the code snippet, is it really what is happening? You shouldn't do that and this could be the root cause for the segfaults you mentioned in your second post.

Second, we don't recommend in general to use the CGS algorithm. If you want to change the matrix slightly and then solve the system iteratively without factorization, I suggest that you simply skip the phase 22 and call phase 33 with iterative refinement turned on (iparm(8)) to solve the changed systems. In this case, the existing factorization will be used. With that, you should not free the memory until the last call to pardiso (otherwise factorization data will be deallocated).

Another thing to keep in mind is that even if the matrix changes only slightly, the scaling and matching might depend on that and this can affect the behavior (convergence) of CGS significantly.

Hope this helps.

Best,
Kirill

0 Kudos
Horst
Beginner
932 Views

Hello Kirill,

thank you for your response. Yes, indeed, It has nothing to do with CGS. Just wanted to describe the full problem. Thank you for taking a look at this. Yes, I use phase=-1, since I thought that it releases all internal memory for all matrices before I continue. However, I will delete this from the code and test if it works.

Ok, thank you for this comment. I will also try this and let you know if this helps.

Best,

Horst

0 Kudos
Horst
Beginner
932 Views

Dear Kirill,

I have tried to change iparm(4)=0 and modified the code as in the following

ik = 0
do

ik = ik + 1
if (ik.ge.3) then
write(*,*) 'solution could not be found'
stop
end if

if (cgsxcounter.eq.0) then

!Reordering and Symbolic Factorization
phase = 11 ! only reordering and symbolic factorization
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)

!Factorization.
phase = 22 ! only factorization
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, error)

 cgsxcounter=1

end if

!Back substitution and iterative refinement
phase = 33 ! only substitution
call pardiso_64 (pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, error)

end do

As far as I can see the segfault error disappeared. However, the problem with the iterative refinement still occurs. Does this indicate that there is no proper solution of the system? Or what exactly does it indicate? If this error occurs, the error flag isn't set to unequal 0. Why does this happen?

Best,

Horst

0 Kudos
Kirill_V_Intel
Employee
932 Views

Dear Horst,

I would expect that the matrix is ill-conditioned. Iterative refinement steps (two steps which are automatically done) are performed when the solution found by the solver has a large residual. Since it is a direct solver, this can happen when the matrix has large condition number or when many pivots has been perturbed during the factorization. Unfortunately, it is hard to say more without looking into the actual workflow for your matrices.

As an option, I suggest you try the two-level factorization, iparm(24) = 1, this would use a different algorithm for the factorization and might solve the problem you observe.
In case it doesn't help, is it possible that you somehow provide us your matrices?

Another thing is: could you also check the iparm after the factorization phase (22), in particular, iparm(14) which reports the number of perturbed pivots? And what is the matrix type in your case? Also, was there a specific reason to use the diagonal pivoting instead of the default option (I suggest setting iparm(21)=0)?

EDIT: When switching two-level factorization on, please turn off matching and scaling (set iparm(11) and iparm(13) = 0).

EDIT2: Haven't noticed that you're already using the 2-level factorization. Though to be sure that it really uses the 2-level algorithm, still turn the matching and scaling off and retry the code.

Best,
Kirill

0 Kudos
Horst
Beginner
932 Views

Dear Kirill,

thank you very much for your suggestions. Currently, I am trying to run the jobs and I will let you know if this solves the problem. 

Best,

Horst

0 Kudos
Horst
Beginner
932 Views

Dear Kirill,

I have tried out everything you suggested and it helped nothing. I have set iparm(24) =1, iparm(11)=0 and iparm(13)=0. I have also taken a look at iparm(14)=57 and iparm(20)=0. So I suggest that the matrix is highly ill conditioned right? Unfortunately, as I said, the matrices are quite large and I can only upload files that are up to 250MB. As long as there isn't another opportunity to send you the files, I guess that is all you can do for me.

Thanks in advance,

Horst

0 Kudos
Gennady_F_Intel
Moderator
932 Views

you can submit the ticket against Intel MKL to the Intel Online Service Center where you may upload such big files.

0 Kudos
Reply