Tridiagonal AX=B

ozarbit · ‎04-08-2009

Hello, I have a known tridiagonal matrix n*n A and a known vector B.

The usual tridiagonal algorithm to solve for X is of order(n)

Is it available somewhere in MKL?

regards,

TimP · ‎04-08-2009

Hello, I have a known tridiagonal matrix n*n A and a known vector B.
The usual tridiagonal algorithm to solve for X is of order(n)

Is it available somewhere in MKL?

Library implementations of basic tridiagonal solver usually aren't efficient, and don't maintain the ease of use you have in your own source code.
If you would use the search function, you would see that MKL supports tridiagonal BLAS operations such as ?gtrfs ?gtsv
If BLAS is overkill for your case, you may not be interested in MKL.

ozarbit · ‎04-08-2009

Quoting - tim18

Library implementations of basic tridiagonal solver usually aren't efficient, and don't maintain the ease of use you have in your own source code.
If you would use the search function, you would see that MKL supports tridiagonal BLAS operations such as ?gtrfs ?gtsv
If BLAS is overkill for your case, you may not be interested in MKL.

That's what I thought, I was just checking.

Besides my matrix is really smaller (order of n=20), even though it may be called 1000 times every 1s.

I will just go for e.g. with tridiag() function presented in numerical recipes.

thank you,

ozarbit · ‎04-27-2009

As a matter of a check, I've tried the code posted in

http://codepad.org/rM6MHqhT

on a SSSE3 Q6700 quad 2.66Ghz with 2 L1 32kb and 2 L2 4MB 4GB DDR2

on a vs2005+intel11 platform

The numerical recipes version takes 1000 ticks while the dgtsv takes 70000 ticks... the .exe was ran 10 times.

Probably because dgtsv uses a LU decomposition first, while the NR code doesn't and is of order n.

Also, the NR code was vectorized by the intel compiler

Ilya_B_Intel · ‎04-27-2009

ozarbit,

Your testcase is not 100% fair. You are measuring the first (and the only)call of dgtsv. MKL does a lot of additional initializing work during the first call, while NR implementation does not have specific initialing activities. Further calls are much faster.

So, if you make some some call of dgtsv (just for initialing purpose, effect of the first time initializationshould be neglible in real-life app) and measure performance on the second call of dgtsv, then the difference will not be so huge. You will still see around 20% ofperformance difference. This is because of usage ofGaussian elimination with partial pivoting to cover bad cases in MKL routine, while NR gives wrong answer in such cases.

We know about this performance gap and investigate possibilities to cover it.

By the way, compiler does not vectorize NR codes, because of explicite data dependencies.

TimP · ‎04-27-2009

An external function call will necessarily involve more overhead than computation time for tridiagonal operations on short vectors.
If you use tridiagonal on problems which don't satisfy diagonal dominance properties, you may need the partial pivoting; otherwise BLAS is not a competitive solution, nor was it intended to be.

ozarbit · ‎04-27-2009

Quoting - Ilya Burylov (Intel)

ozarbit,

Your testcase is not 100% fair. You are measuring the first (and the only)call of dgtsv. MKL does a lot of additional initializing work during the first call, while NR implementation does not have specific initialing activities. Further calls are much faster.

So, if you make some some call of dgtsv (just for initialing purpose, effect of the first time initializationshould be neglible in real-life app) and measure performance on the second call of dgtsv, then the difference will not be so huge. You will still see around 20% ofperformance difference. This is because of usage ofGaussian elimination with partial pivoting to cover bad cases in MKL routine, while NR gives wrong answer in such cases.

We know about this performance gap and investigate possibilities to cover it.

By the way, compiler does not vectorize NR codes, because of explicite data dependencies.

tim18, ilya, thank you for your answers.

inhttp://codepad.org/cHgP2JAa

I called dgtsv many times before the measure.

Now it's only 3000 ticks, correct.

And yes, I misread the vectorization message.

Btw, is the partial pivoting required as soon as the matrix is not diagonally dominant (in my case, clearly, it is not diagonally dominant, however NR code works fine), or in somecases when it isn't dominant, it may run into trouble.

tim18, true, it is an external library call... I just wanted an idea of the difference..

I have an order of 16384 for the matrix.... What sort of orders should the additional tests be negligible?

regards,

TimP · ‎04-27-2009

If the matrix isn't diagonally dominant, or positive definite, failing to implement a pivoting strategy may (or may not) encounter excessively small pivots. I'm sure cases could be made up where it isn't diagonal dominant, but the partial pivoting doesn't change the result. On such large problems, if the behavior isn't well known, the point about checking for pivoting requirements is well taken.
I'm sorry, I misread your statement about the size of the problem. Your size should be large enough to make the library call overhead insignificant once the initializations have completed. The cost of checking pivots still may be significant for such a case, and once the pivoting has changed the order, the run-time cost could be several times as large.