eihoppe
Beginner
128 Views

Performance of PARDISO versus LAPACK band solver

Hello all,

I have integrated the PARDISO solver into our atmospheric simulation model in the hope of speeding up the solution of small (fewer than 150-200 rows) banded matrices. However, compared with LAPACK's band solver (sgbsv) on the same matrices, the wallclock runtime of the model increases by approximately 30-40% instead of giving the speed-up we wanted.

When we run PARDISO in parallel on an 8-core computer, the wallclock runtime improves, but it is still either approximately tied with LAPACK or slower than LAPACK by a small margin of 0-5%. However, we would prefer to run PARDISO sequentially, as our model is frequently embedded in larger host models that run many instances of our model in parallel.

Are our matrices too small to obtain a performance improvement over LAPACK's band solver? Alternatively, is it possible that LAPACK's band solver routines are far more optimized for the case of a band matrix than PARDISO's more general approach?

Essentially, my question boils down to this: can I expect to find a way to make PARDISO run faster than the LAPACK solver for our matrices, or are the matrices too small (or LAPACK's band solver too well optimized) to expect any runtime improvement, especially when run sequentially?

We are currently using MKL version 10.2, and PARDISO is called in single-precision mode.

Any insight into the matter is greatly appreciated.

Regards,
~Eric Hoppe
University of Wisconsin - Milwaukee
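
For a sense of scale, here is a minimal Python sketch of band-limited Gaussian elimination. It is not what sgbsv actually runs (the LAPACK routine uses partial pivoting and optimized kernels); the function name `band_solve`, the dense list-of-lists input, and the no-pivoting shortcut are all assumptions made for readability. The point it illustrates is that elimination only ever touches entries inside the band, so the work grows roughly like n*kl*ku rather than n^3.

```python
def band_solve(a, kl, ku, b):
    """Solve A x = b for a banded matrix A with kl sub- and ku super-diagonals.

    Assumption of this sketch: no pivoting is needed (for example, A is
    diagonally dominant). Returns (x, ops), where ops counts the
    multiply/divide operations performed.
    """
    n = len(b)
    a = [row[:] for row in a]  # work on a copy so the caller's matrix is untouched
    x = list(b)
    ops = 0

    # Forward elimination: row k only affects the kl rows below it,
    # and only the ku columns to the right of the diagonal.
    for k in range(n - 1):
        for i in range(k + 1, min(k + kl, n - 1) + 1):
            m = a[i][k] / a[k][k]
            ops += 1
            for j in range(k + 1, min(k + ku, n - 1) + 1):
                a[i][j] -= m * a[k][j]
                ops += 1
            x[i] -= m * x[k]
            ops += 1

    # Back substitution: each row of U has at most ku off-diagonal entries.
    for i in range(n - 1, -1, -1):
        s = x[i]
        for j in range(i + 1, min(i + ku, n - 1) + 1):
            s -= a[i][j] * x[j]
            ops += 1
        x[i] = s / a[i][i]
        ops += 1
    return x, ops


# Hypothetical test problem: 200-row tridiagonal matrix (kl = ku = 1).
n = 200
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
rhs = [1.0] + [0.0] * (n - 2) + [1.0]  # chosen so that the solution is all ones
x, ops = band_solve(A, 1, 1, rhs)
print(ops, "multiply/divide operations for n =", n)
```

With only a few hundred rows and a narrow band, this is a very small amount of arithmetic, which would be consistent with the fixed per-call overhead of a general sparse solver being hard to amortize.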
2 Replies

Hello Eric,

The performance results you reported are expected. Matrices with 150-200 equations are too small for PARDISO to show its strengths. You are also correct that PARDISO is a general-purpose solver: it does not exploit any specific knowledge of the matrix structure. The dedicated banded LAPACK routine (gbsv) is preferable in this case. You will not save any memory with PARDISO's CSR format either, since the banded storage format used by LAPACK is optimal for your case.
Best regards,
Konstantin
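
To put rough numbers on the storage point, the sketch below counts the array entries each format needs for a banded matrix. The bandwidths kl = ku = 2 are hypothetical (the thread does not state them), and the function names are made up for this illustration. The band count uses the (2*kl + ku + 1)-by-n array that gbsv requires (the extra kl rows hold fill-in from pivoting); the CSR count uses the usual three arrays: values, column indices, and n + 1 row pointers.

```python
def band_nnz(n, kl, ku):
    """Number of entries inside the band of an n-by-n matrix."""
    return sum(min(i + ku, n - 1) - max(i - kl, 0) + 1 for i in range(n))


def band_storage_entries(n, kl, ku):
    """Entries in the band array passed to gbsv: (2*kl + ku + 1) rows by n columns."""
    return (2 * kl + ku + 1) * n


def csr_storage_entries(n, kl, ku):
    """Entries in the three CSR arrays, assuming every in-band entry is stored."""
    nnz = band_nnz(n, kl, ku)
    return nnz + nnz + (n + 1)  # values + column indices + row pointers


n, kl, ku = 200, 2, 2  # hypothetical sizes for illustration
print("band storage:", band_storage_entries(n, kl, ku), "entries")
print("CSR storage: ", csr_storage_entries(n, kl, ku), "entries")
```

CSR stores a column index alongside every value, so for a full band with these sizes it needs more entries than band storage does, and that is before counting the space PARDISO needs for its factors.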
eihoppe

Hello again,

I have a slight update on my situation with the PARDISO solver.

I have implemented skipping the factorization step after the first simulation timestep, as the structure of the matrices stays consistent throughout the duration of the simulation. This has provided a significant improvement to the runtime. However, PARDISO still remains slower than LAPACK when run sequentially (the increase in runtime for PARDISO is now on the order of 10-15% rather than 30-40%).

Given that skipping the factorization step is the only major time-saver I found when searching the forums, I'm led to believe that our matrices aren't large enough for PARDISO to be faster than the LAPACK band solver.

However, if anyone else has any ideas of other techniques I can try to speed up PARDISO's solves further, it would be much appreciated.

EDIT: Ah, it is as I suspected, then. Thanks much for your reply, Konstantin.

Regards,
~Eric Hoppe
University of Wisconsin - Milwaukee
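
Which work can be skipped on later timesteps depends on what stays fixed. In PARDISO, the analysis phase (phase 11, reordering and symbolic factorization) depends only on the sparsity structure, numerical factorization (phase 22) depends on the matrix values, and the solve (phase 33) depends on the right-hand side. The sketch below shows the same split in miniature for a tridiagonal system. It is not PARDISO code: the function names are hypothetical, and it assumes no pivoting is needed (true for diagonally dominant matrices, for example). The factors are computed once and reused for several right-hand sides, which is valid only while the matrix values are unchanged.

```python
def factor_tridiagonal(lower, diag, upper):
    """Numeric LU factorization of a tridiagonal matrix, without pivoting.

    lower[i] = A[i+1][i], diag[i] = A[i][i], upper[i] = A[i][i+1].
    Returns (mult, piv): the elimination multipliers (L) and the diagonal of U.
    """
    n = len(diag)
    mult = [0.0] * (n - 1)
    piv = [0.0] * n
    piv[0] = diag[0]
    for i in range(1, n):
        mult[i - 1] = lower[i - 1] / piv[i - 1]
        piv[i] = diag[i] - mult[i - 1] * upper[i - 1]
    return mult, piv


def solve_factored(mult, piv, upper, rhs):
    """Forward and back substitution using previously computed factors."""
    n = len(piv)
    y = list(rhs)
    for i in range(1, n):  # forward substitution with L
        y[i] -= mult[i - 1] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / piv[n - 1]
    for i in range(n - 2, -1, -1):  # back substitution with U
        x[i] = (y[i] - upper[i] * x[i + 1]) / piv[i]
    return x


# Hypothetical 5-row system; the same matrix is reused for two "timesteps".
lower = [-1.0] * 4
diag = [2.0] * 5
upper = [-1.0] * 4

mult, piv = factor_tridiagonal(lower, diag, upper)  # done once
for rhs in ([1.0, 0.0, 0.0, 0.0, 1.0], [2.0, 0.0, 0.0, 0.0, 2.0]):
    print(solve_factored(mult, piv, upper, rhs))    # only substitution per step
```

If the matrix values change every timestep and only the structure stays the same, the numerical factorization has to be repeated each step; only the analysis can be reused.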