Fast Small Dense Matrix Solver

ScottBoyce · ‎03-06-2013

I have a general, square, dense matrix A (not symmetric) formed as A=P^TBP, where B is held in a compressed storage scheme and P is a rectangular matrix. A ranges in size from 10x10 to 500x500, while B is sparse and can be as large as 150,000x150,000.

What would be the best way to solve the resulting system of linear equations for x, given b:

Ax=b => x=A^-1b

Right now I am just using LAPACK DGESV linked against MKL (so assume I am using their solver). Is there any benefit to switching to an iterative solver, or are there any recommendations on how to solve this system of equations as fast as possible?
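For reference, DGESV is a direct solver: it computes an LU factorization with partial pivoting (DGETRF) and then does the forward and back substitutions (DGETRS). The following is a minimal pure-Python sketch of those two steps, written only to show the structure of the computation; the function names `lu_factor` and `lu_solve` are illustrative, and an optimized LAPACK call will be far faster than this.

```python
def lu_factor(a):
    """LU-factor a square matrix with partial (row) pivoting.

    Returns (lu, piv): lu holds L below the diagonal (unit diagonal implied)
    and U on and above it; row i of the permuted matrix is original row piv[i].
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy of A
    piv = list(range(n))
    for k in range(n):
        # choose the largest entry in column k (rows k..n-1) as the pivot
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]        # multiplier, stored in place of L
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    x = [b[piv[i]] for i in range(n)]   # apply the row permutation to b
    for i in range(n):                  # forward substitution with L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in range(n - 1, -1, -1):      # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    A = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
    b = [7.0, 3.0, 11.0]
    lu, piv = lu_factor(A)
    print(lu_solve(lu, piv, b))
```

The factorization is the O(n^3) part (roughly 2n^3/3 floating-point operations), while each solve with existing factors is only O(n^2). Keeping the two steps separate pays off only when the same A is reused with several right-hand sides.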

Thanks for any comments

SergeyKostrov · ‎03-07-2013

Scott, I have a generic question.

>>...The size of A ... 500x500, where B can be 150,000x150,000...

How long does it take to solve it on your computer? Thanks in advance.

Note: I see that there are two threads already, one is in MKL forum and another is in Intel Visual Fortran forum...

ScottBoyce · ‎03-07-2013

After I posted on the Intel Fortran forum, someone suggested that I also post my question here, since I am using the MKL library for the LAPACK routines.

It only takes a few seconds, but each solution of A creates a new version of B, which is then multiplied by P to build a new version of A, which in turn needs a new solution. I would like to speed up solving the system of equations, even by a fraction of a second. There is of course also a slowdown due to forming A=P^TBP, but I am unsure whether there is anything faster than using DGEMM.
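A rough operation count can help show where the time goes. The sketch below uses the largest sizes quoted in this thread (N = 150,000 for B, n = 500 for A) and assumes, which is not stated above, that P is a dense N x n matrix so that the second product P^T(BP) is a plain DGEMM call. The sparse product BP is left out because the number of nonzeros in B is not given.

```python
N = 150_000   # order of B (largest size quoted in the thread)
n = 500       # order of A (largest size quoted in the thread)

# LU factorization of the n x n matrix A: about (2/3) n^3 flops.
lu_flops = 2 * n**3 / 3

# Dense product P^T (BP): (n x N) times (N x n), about 2 N n^2 flops.
# Assumption: P is dense; a sparse P would cost far less.
gemm_flops = 2 * N * n**2

ratio = gemm_flops / lu_flops
print(f"LU of A   : {lu_flops:.2e} flops")
print(f"P^T (BP)  : {gemm_flops:.2e} flops")
print(f"ratio     : {ratio:.0f}x")
```

Under these assumptions the dense product costs hundreds of times more than the factorization, so building A is the more likely place to save time than the DGESV call itself.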

This is a program where time matters, even a few extra milliseconds.

SergeyKostrov · ‎03-07-2013

Thanks for the details!

>>...It only takes a few seconds...

Is it for B when it has dimensions 150000x150000?

Note 1: In case of a single-precision 84GB of memory is needed for B
Note 2: In case of a double-precision 168GB of memory is needed for B

PS: Of course it is possible if a Cray-like supercomputer is used...
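The memory figures quoted above correspond to storing every entry of a 150,000x150,000 matrix, that is, dense storage. A short check of the arithmetic, assuming 4 bytes per single-precision value, 8 bytes per double-precision value, and 1 GB taken as 2^30 bytes:

```python
n = 150_000
entries = n * n            # number of entries if B were stored densely
GIB = 2 ** 30              # bytes per GB, taken as 2^30 (assumption)


def dense_gib(bytes_per_entry):
    """Memory in GB (2^30 bytes) for dense storage of all entries."""
    return entries * bytes_per_entry / GIB


single_gib = dense_gib(4)  # single precision
double_gib = dense_gib(8)  # double precision
print(f"single precision: {single_gib:.1f} GB")
print(f"double precision: {double_gib:.1f} GB")
```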

ScottBoyce · ‎03-07-2013

B is formed from finite differences, so it's stored in a band-like structure/vector to minimize storage, and is then transformed by the pre- and post-multiplication with P. Actually, what I will post another time is how best to multiply out P^TBP.
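One way to picture the product with a banded B is to compute C = BP one diagonal at a time and then A = P^T C, so that B is never expanded to a full matrix. The sketch below is only an illustration under assumptions: the storage layout (a dict mapping each diagonal offset to a vector) and the function names are hypothetical and are not the storage scheme used in the actual program. In practice the second product would be a DGEMM call.

```python
def banded_times_dense(diags, p):
    """Return C = B @ P, with B (N x N) stored by diagonals and P (N x m) dense.

    diags maps an offset k to a list d of length N with d[i] = B[i][i + k];
    entries whose column index falls outside the matrix are never read.
    """
    n, m = len(p), len(p[0])
    c = [[0.0] * m for _ in range(n)]
    for k, d in diags.items():
        # rows i for which column i + k lies inside the matrix
        for i in range(max(0, -k), min(n, n - k)):
            bij, row_p, row_c = d[i], p[i + k], c[i]
            for j in range(m):
                row_c[j] += bij * row_p[j]
    return c


def transpose_times(p, c):
    """Return A = P^T @ C for P (N x m) and C (N x m); A is m x m."""
    n, m = len(p), len(p[0])
    a = [[0.0] * m for _ in range(m)]
    for i in range(n):
        for r in range(m):
            pir = p[i][r]
            if pir != 0.0:              # skip zeros in P
                for s in range(m):
                    a[r][s] += pir * c[i][s]
    return a


def project(diags, p):
    """Form A = P^T B P without ever building B as a dense matrix."""
    return transpose_times(p, banded_times_dense(diags, p))


if __name__ == "__main__":
    # Small nonsymmetric tridiagonal B (4 x 4) and a 4 x 2 matrix P.
    diags = {0: [2.0] * 4, 1: [-1.0, -1.0, -1.0, 0.0], -1: [0.0, -3.0, -3.0, -3.0]}
    P = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(project(diags, P))
```

Grouping the product as P^T(BP) keeps the sparse work proportional to the number of stored diagonals times the size of P, and leaves only one dense product.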