Fast solving of block-tridiagonal matrices

Mikhail_Matrosov · ‎09-07-2011

Hello,

I'm trying to solve SLE of form Ax=b with matrix A being large, sparse, symmetric, positive defined and block-tridiagonal. With exactly the same structure as one arising from five-point finite difference approximation, used for solving Poisson equation. I.e.:

D1 L2
L2 D2 L3
L3 D3 L4
L4 D4 L5 ...

where Di is symmetric tridiagonal and Li is diagonal. When solving Poisson equation, Li = -I with I denoting unit matrix and Di is of form

4 -1
-1 4 -1
-1 4 -1 ...

I tried direct solver (DSS) and iterative solver (PCG) with different preconditioners. I was able to compute solution, but I'm looking for possible speed-ups. I've solved Poisson eqation of the same size and solution was found much faster. E.g., 0.1 seconds against 15 seconds for PCG and 325 seconds for DSS.

The question is, are there some special routines for dealing with matrices of such a structure in MKL? I understand, that Poisson-case is optimised for matrix with given values, while I have arbitrary values in A. But still, are there some alternatives for PCG?

Alexander_K_Intel2 · ‎09-07-2011

Hi Mikhail,

The DSS is direct solver, PCG is iterative and Poisson solver is direct solver for matrix with form you wrote in topic above. It's clear that for some matrix each of these 3 solvers spend different time to compute solution and this computational depend on condition number, matrix structure, matrix value and etc... It difficult to say for general matrix which solver is best for it, but if you provide to us matrix you want to solve we could recommended you one of these solvers. For matrix from you topic the best solver is Poisson solver.

With best regards,

Alexander Kalinkin

Mikhail_Matrosov · ‎09-07-2011

My matrix has the same structure, as one used for Poisson solver, but with arbitrary elements. However, it is symmetric and positive definite with diagonal dominance. All diagonal elements are (strictly) positive and all other are (non-strictly) negative. Here is the small example:

12.7972 -9.4535 0 -2.3438 0 0 0 0 0 0 0 0
-9.4535 39.8238 -26.2896 0 -3.0808 0 0 0 0 0 0 0
0 -26.2896 31.1540 0 0 -3.8644 0 0 0 0 0 0
-2.3438 0 0 103.7057 -33.4790 0 -66.8830 0 0 0 0 0
0 -3.0808 0 -33.4790 66.0553 -11.2585 0 -17.2370 0 0 0 0
0 0 -3.8644 0 -11.2585 93.8882 0 0 -77.7652 0 0 0
0 0 0 -66.8830 0 0 87.0312 -13.1618 0 -5.9864 0 0
0 0 0 0 -17.2370 0 -13.1618 41.9111 -5.2094 0 -5.3030 0
0 0 0 0 0 -77.7652 0 -5.2094 88.3185 0 0 -4.3439
0 0 0 0 0 0 -5.9864 0 0 24.3179 -17.3315 0
0 0 0 0 0 0 0 -5.3030 0 -17.3315 27.9133 -4.2788
0 0 0 0 0 0 0 0 -4.3439 0 -4.2788 9.6227

For my experiments, PCG with incomplete Cholesky factorization as preconditioner works much faster than DSS. Is there a way to use Poisson solver for this matrix? As I understand, values for matrix for Poisson solver are predefined (4, -1, 0, -1, etc...), while I have arbitrary values (with mentioned constraints).

IDZ_A_Intel · ‎09-08-2011

Hi,

Have you try PCG with poisson solver as preconditioner?

With best regards,

Alexander Kalinkin

Mikhail_Matrosov · ‎09-08-2011

Sorry, I cannot understand how to use Poisson solver for my matrix. As I understand, one can provide only right-hand side for Poisson solver, i.e. for given equation

f(x, y) stands for b in SLE of the form Ax=b. And u stands for x. However, coefficients of matrix A are determined from finite approximation of second derivative. So how it is possible to provide my own coefficients for A?..

Alexander_K_Intel2 · ‎09-08-2011

Hi,

Maybe I wrote a bit unclear... Any iterative process for solving Ax=f could be representing in next form:

B(x^(k+1)-x^k)=tau_k*(Ax^k-f), or

x^(k+1)-x^k=B^(-1)*(tau_k*(Ax^k-f)) where B is preconditioner. So you could use PCG with Poisson solver as preconditioner instead of PCG with incomplete Cholesky preconditioner (when you want to multiply preconditioner on some vector just call poisson solver with this array)

With best regards,

Alexander Kalinkin

Mikhail_Matrosov · ‎09-08-2011

So, your guess is to use fixed derivative approximation matrix instead of incomplete Cholesky factorization? I.e. to let preconditioner B be equal to

4 -1 0 0 -1 0 0 0 0 0 0 0 0 0 0 0
-1 4 -1 0 0 -1 0 0 0 0 0 0 0 0 0 0
0 -1 4 -1 0 0 -1 0 0 0 0 0 0 0 0 0
0 0 -1 4 0 0 0 -1 0 0 0 0 0 0 0 0
-1 0 0 0 4 -1 0 0 -1 0 0 0 0 0 0 0
0 -1 0 0 -1 4 -1 0 0 -1 0 0 0 0 0 0
0 0 -1 0 0 -1 4 -1 0 0 -1 0 0 0 0 0
0 0 0 -1 0 0 -1 4 0 0 0 -1 0 0 0 0
0 0 0 0 -1 0 0 0 4 -1 0 0 -1 0 0 0
0 0 0 0 0 -1 0 0 -1 4 -1 0 0 -1 0 0
0 0 0 0 0 0 -1 0 0 -1 4 -1 0 0 -1 0
0 0 0 0 0 0 0 -1 0 0 -1 4 0 0 0 -1
0 0 0 0 0 0 0 0 -1 0 0 0 4 -1 0 0
0 0 0 0 0 0 0 0 0 -1 0 0 -1 4 -1 0
0 0 0 0 0 0 0 0 0 0 -1 0 0 -1 4 -1
0 0 0 0 0 0 0 0 0 0 0 -1 0 0 -1 4

Am I right?

Alexander_K_Intel2 · ‎09-08-2011

Yes, you are correct. If your stiffness matrix comes from some differential equation then such preconditioner could be good enough.

With best regards,

Alexander Kalinkin

Mikhail_Matrosov · ‎09-09-2011

Sad, but PCG does not converge at all with this preconditioner.

So, back to my question, are there some special routines for dealing with matrices of such a structure in MKL?

Alexander_K_Intel2 · ‎09-09-2011

First of all answer on your question: MKL support direct solver for sparse matrix (PARDISO functionality), iterative solver for sparse matrix (ISS functionality) and lapack solvers for dense matrix. You could use any of these solvers for your matrix.

Back to iss solver with Poisson precondition: what kind of boundary condition have you used for your solver? It's seems that for your matrix there is kernel vector (all elements of this vector are equal 1). So probably it's better to use Neumann boundary condition... I will try to check it later.

With best regards,

Alexander Kalinkin

Mikhail_Matrosov · ‎09-09-2011

I thought PARDISO should be used on shared-memory massive-parallel architectures. Is it suited well for usual desktop calculations? I used DSS as a direct solver, but it is to slow. Mush slower than even MATLAB's mldivide. It consumes less memory though.

I checked convergence of Poisson preconditioner in MATLAB, I simply provided matrix mentioned above as preconditioner to MATLAB's PCG routine. I'm always testing all preconditioners in MATLAB first to swiftly estimate convergence rate - it works properly. As I understand, this is the same as to run Poisson solver with Dirichlet boundary conditions. Don't think Neumann will change anything. Moreover, solution for Neumann boundary conditions is determined up to a constant value - it's not acceptable in my case.

styc · ‎09-11-2011

I wonder if you have tried a block Jacobi preconditioner M = diag(D₁, D₂, ) in PCG. For diagonally-dominant SPD problems this should be very effective. Also, have you tried using no preconditioner at all? The cost of applying the preconditioner might just dominate the convergence rate benefit it brings about. Furthermore, how did you implement matrix-vector multiplication in PCG? Given the special structure of your matrix, compared to a nave matvec in CSR or CSC format, a specially-crafted routine will help extract a lot of extra juice.

Mikhail_Matrosov · ‎09-11-2011

I've tried Jacobi preconditioner as well as no preconditioner. There is no convergence at all with no preconditioner, and Cholesky convergence is much faster than Jacobi, so Cholesky is the favorite.

The most narrow place in current PCG is applying preconditioner, not matrix-vector multiplication (time ratio is approx 6:1). To apply incomplete Cholesky preconditioner, two triangular systems are solved using MKL sparse BLAS routine mkl_ddiatrsv. It fits perfectly since Cholesky lower triangle has exactly three non-zero diagonals. Maybe applying the LDL^T factorization will be a bit faster than current LL^T (since L will be unit diagonal), but it is essentially the same.

Matrix-vector multiplication uses CSR and might be boosted using the same diagonal representation, but it will not benefit much on the overall time, since applying the preconditioner is the most expensive operation.

One more thing: applying Jacobi preconditioner is much cheaper, but convergence is too low, so Cholesky is now the best option.