Intel® oneAPI Math Kernel Library

DPPTRI is not parallelized

heiga
Beginner
I was using DPOTRI to invert a positive definite symmetric matrix. Recently I had to reduce memory usage, so I switched to packed storage for this matrix and to DPPTRI to invert it. When I used DPOTRI, it was automatically parallelized; DPPTRI, however, is not. Is this a bug in MKL, or does DPPTRI simply not support parallelization?

P.S. I'm using MKL 10.0.2 and ICC 10.1.015 on a Core 2 Quad machine (EM64T mode, kernel 2.6.18). OMP_NUM_THREADS is set to 4.
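
For reference, a minimal sketch of the two inversion paths being compared: full storage (DPOTRF + DPOTRI) versus packed storage (DPPTRF + DPPTRI). The 3x3 test matrix and all dimensions are illustrative only.

program invert_spd
  ! Compare full-storage and packed-storage inversion of an SPD matrix.
  implicit none
  integer, parameter :: n = 3
  double precision :: a(n, n), ap(n*(n+1)/2)
  integer :: i, j, k, info

  ! A small diagonally dominant symmetric matrix (hence SPD).
  do j = 1, n
     do i = 1, n
        a(i, j) = merge(dble(n), 1d0, i == j)
     end do
  end do

  ! Full storage: Cholesky factorization, then in-place inversion.
  call dpotrf('U', n, a, n, info)
  if (info == 0) call dpotri('U', n, a, n, info)

  ! Packed storage: upper triangle stored column by column in ap.
  k = 0
  do j = 1, n
     do i = 1, j
        k = k + 1
        ap(k) = merge(dble(n), 1d0, i == j)
     end do
  end do
  call dpptrf('U', n, ap, info)
  if (info == 0) call dpptri('U', n, ap, info)

  ! The diagonal element A(j,j) sits at packed index j*(j+1)/2.
  print *, 'inverse diagonal (full):  ', (a(j, j), j = 1, n)
  print *, 'inverse diagonal (packed):', (ap(j*(j+1)/2), j = 1, n)
end program invert_spd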
5 Replies
heiga
Beginner
P.S. DTPTRI is also not parallelized. My compile options are "-axPT -ip -O3 -openmp -parallel -align -i-dynamic".
TimP
Honored Contributor III
As a general rule, level 2 BLAS functions aren't parallelized in MKL, as parallel scaling would be limited. Your compile options don't affect the MKL functions, although either -openmp or -parallel adds -lguide to the link command, which satisfies that requirement of threaded MKL.
You could try building the public (reference) source of DTPMV et al. with -parallel (use at least a 10.x compiler to get automatic "schedule guided"), or with explicit OpenMP guided scheduling, to test whether you get good speedup. It looks as if there may be a chance of a 50% speedup.
I don't know whether there is enough demand for this function for anyone to have tested parallelizing it. If you show that you can improve on the performance of these MKL functions, you might then file a request for such parallelization in MKL through your premier.intel.com account. As processors evolve and the benefit of threading becomes more pervasive, more such parallelization may be worth considering.
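
A minimal sketch of the experiment suggested above, assuming a DTPMV-like packed upper-triangular product with an explicit guided schedule. It is written out of place (y = A*x) because the in-place update would race between threads, and it illustrates only the scheduling idea, not MKL's implementation. Compile with -openmp (ifort) or -fopenmp (gfortran).

program tpmv_guided
  implicit none
  integer, parameter :: n = 2000
  double precision :: ap(n*(n+1)/2), x(n), y(n)
  double precision :: s
  integer :: i, j

  call random_number(ap)
  call random_number(x)

  ! Row i touches elements j = i..n, so the work per iteration shrinks
  ! with i; schedule(guided) hands out shrinking chunks to keep threads
  ! balanced over the triangular workload.
  !$omp parallel do private(j, s) schedule(guided)
  do i = 1, n
     s = 0d0
     do j = i, n
        s = s + ap(i + (j-1)*j/2) * x(j)   ! packed upper: A(i,j), i <= j
     end do
     y(i) = s
  end do
  !$omp end parallel do

  print *, 'y(1) =', y(1), ', y(n) =', y(n)
end program tpmv_guided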
heiga
Beginner
> level 2 BLAS functions aren't parallelized in MKL

I think both DPPTRI() and DTPTRI() are LAPACK functions rather than BLAS ones. Aren't the LAPACK routines parallelized?
TimP
Honored Contributor III
Most of the work in these LAPACK functions is performed in the level 2 BLAS functions they call. As it is not clearly visible (at least not to me) whether it is safe to thread DPPTRI or DTPTRI, it would be necessary to run Intel Thread Checker on a representative sample of test cases to see whether any problems appear.
I don't know whether such an effort to parallelize LAPACK functions has been undertaken. It's certainly conceivable that it could be an interesting project, taking advantage of increasingly effective hardware support for OpenMP.
Even supposing you supplied all of the relevant source code for those functions and the BLAS functions they call, it would be a tall order for ifort -parallel to parallelize at the level of DPPTRI or DTPTRI, though it may be feasible to auto-parallelize the inner BLAS functions.
Full parallelization of LAPACK still looks like a research project. Most of that effort appears to have gone into ScaLAPACK, which of course should give more payback when it succeeds. Did you check whether any MKL ScaLAPACK functions are interesting for your application?
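
To make this concrete, here is a condensed sketch of the structure found in the reference (netlib) DPPTRI, paraphrased for the upper-triangular case with error handling omitted: after DTPTRI inverts the packed Cholesky factor, nearly all of the remaining per-column work lands in DSPR, a level 2 BLAS rank-1 update. The driver and test matrix are illustrative only.

subroutine pptri_sketch(n, ap, info)
  ! Paraphrase of reference DPPTRI ('U' branch): invert the packed
  ! triangular factor, then form inv(A) = inv(U) * inv(U)**T.
  implicit none
  integer, intent(in) :: n
  double precision, intent(inout) :: ap(n*(n+1)/2)
  integer, intent(out) :: info
  integer :: j, jc, jj
  double precision :: ajj

  call dtptri('U', 'N', n, ap, info)
  if (info /= 0) return

  jj = 0
  do j = 1, n
     jc = jj + 1
     jj = jj + j
     ! Rank-1 update of the leading (j-1)x(j-1) block: level 2 BLAS.
     if (j > 1) call dspr('U', j-1, 1d0, ap(jc), 1, ap)
     ajj = ap(jj)
     call dscal(j, ajj, ap(jc), 1)
  end do
end subroutine pptri_sketch

program pptri_demo
  implicit none
  integer, parameter :: n = 3
  double precision :: ap(n*(n+1)/2)
  integer :: i, j, k, info
  k = 0
  do j = 1, n
     do i = 1, j
        k = k + 1
        ap(k) = merge(dble(n), 1d0, i == j)   ! diagonally dominant SPD
     end do
  end do
  call dpptrf('U', n, ap, info)               ! packed Cholesky factor
  call pptri_sketch(n, ap, info)
  print *, 'inverse diagonal:', (ap(j*(j+1)/2), j = 1, n)
end program pptri_demo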
Todd_R_Intel
Employee
Heiga,

Several LAPACK functions in MKL are threaded. The Linux User Guide lists the following routines:

*GETRF, *POTRF, *GBTRF, *GEQRF, *ORMQR, *STEQR, *BDSQR, *SPTRF, *SPTRS, *HPTRF, *HPTRS, *PPTRF, *PPTRS

Note that parallel operation of these routines leads to parallel operation of other driver routines that call them. But DTPTRI() and DPPTRI() don't seem to benefit in this case.

Regards,
Todd
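
Since *PPTRF and *PPTRS appear in the threaded list above while *PPTRI does not, one possible workaround in packed storage is to keep the Cholesky factor and apply the inverse through DPPTRS, rather than forming it explicitly with DPPTRI. A minimal sketch; the matrix and right-hand side are illustrative only.

program pptrs_demo
  ! Solve A*x = b with the threaded DPPTRF/DPPTRS pair instead of
  ! forming inv(A) with the unthreaded DPPTRI.
  implicit none
  integer, parameter :: n = 3, nrhs = 1
  double precision :: ap(n*(n+1)/2), b(n, nrhs)
  integer :: i, j, k, info

  k = 0
  do j = 1, n
     do i = 1, j
        k = k + 1
        ap(k) = merge(dble(n), 1d0, i == j)   ! small SPD test matrix
     end do
  end do
  b = 1d0

  call dpptrf('U', n, ap, info)               ! threaded per the list above
  call dpptrs('U', n, nrhs, ap, b, n, info)   ! threaded per the list above
  print *, 'solution:', b(:, 1)
end program pptrs_demo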
