Is there a list of LAPACK routines that are threaded?
For example, we are inverting single- and double-precision complex matrices using getrf and getri and are not seeing any scaling on multi-core systems.
MKL 9.0
3 Replies
No doubt, you could examine the objects you are interested in for libguide calls, using nm, dumpbin, or a similar tool. MKL 9 sometimes seems to have higher size thresholds for threading than previous versions, possibly to allow for the greater "distance" between CPUs on split-bus platforms. Threading, if any, may be done in lower-level functions rather than in the LAPACK routine itself.
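As a rough illustration of the nm approach, here is a minimal Python sketch that filters the text output of `nm` for undefined references to OpenMP runtime entry points. It rests on assumptions that are not stated in this thread: that the runtime's entry points use the `__kmpc_` or `omp_` prefixes, and that the output is in the default GNU `nm` layout (dumpbin output is not handled). The function names `openmp_references` and `scan_library` are made up for this example.

```python
import subprocess

# Assumption: the OpenMP runtime's entry points start with one of these
# prefixes. Adjust the tuple if your runtime uses different names.
OPENMP_PREFIXES = ("__kmpc_", "omp_")


def openmp_references(nm_output):
    """Map each archive member in `nm` text output to the undefined
    OpenMP-looking symbols it references."""
    refs = {}
    member = "<unknown>"  # used when nm is run on a single object file
    for raw in nm_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # GNU nm prints "member.o:" before the symbols of each member.
            member = line[:-1]
            continue
        parts = line.split()
        # Undefined symbols are listed with type "U" and no address column.
        if len(parts) >= 2 and parts[-2] == "U":
            name = parts[-1]
            if name.startswith(OPENMP_PREFIXES):
                refs.setdefault(member, []).append(name)
    return refs


def scan_library(path):
    """Run nm on `path` (a hypothetical library or object file) and scan it."""
    done = subprocess.run(["nm", path], capture_output=True, text=True, check=True)
    return openmp_references(done.stdout)
```

A member with no matches is not proof that a routine runs serially, since the threading may sit in a lower-level function that the routine calls.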
What motivates the question is the MKL release notes, which often contain statements such as this one from the 9.1 release notes:
"Performance of ?STEDC improved by xxx and parallelized with OpenMP ...."
Clearly, based on customer feedback, certain LAPACK routines are targeted for parallelization and others are not.
The user notes (mkluse.htm in MKL 9.0) contain a list of the threaded routines of MKL. I have copied that section of the document here:
"Intel MKL is threaded in a number of places: sparse solver, LAPACK (*GETRF, *POTRF, *GBTRF, *GEQRF, *ORMQR, *STEQR, *BDSQR routines), all Level 3 BLAS, Sparse BLAS matrix-vector and matrix-matrix multiply routines for the compressed sparse row and diagonal formats, and all DFTs (except 1D transformations when DFTI_NUMBER_OF_TRANSFORMS=1 and sizes are not power-of-two). The library uses OpenMP* threading software."
Since GETRF is listed as threaded, it would be interesting to know more about the issue. How many equations are you solving? Did you set OMP_NUM_THREADS in the environment?
If you are on a Linux system you need to enter something like "export OMP_NUM_THREADS=
Bruce
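One way to check whether OMP_NUM_THREADS has any effect is to run the same benchmark under several settings and compare wall-clock times. The sketch below is only an illustration: `run_with_threads` is a name made up here, and the probe command is a stand-in that merely echoes the variable. You would replace it with your own executable that calls getrf and getri.

```python
import os
import subprocess
import sys
import time


def run_with_threads(command, num_threads):
    """Run `command` with OMP_NUM_THREADS set to `num_threads`.

    Returns (elapsed_seconds, stdout). The elapsed time includes process
    startup, so it is only meaningful for runs that last much longer.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    start = time.perf_counter()
    done = subprocess.run(command, env=env, capture_output=True,
                          text=True, check=True)
    return time.perf_counter() - start, done.stdout


if __name__ == "__main__":
    # Stand-in command (an assumption for this sketch): it only prints the
    # variable the child process sees. Substitute your benchmark here.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for n in (1, 2, 4):
        elapsed, out = run_with_threads(probe, n)
        print(f"threads={n} child_saw={out.strip()} elapsed={elapsed:.3f}s")
```

If the times do not change with the thread count, the matrix may be below the library's size threshold for threading, or the routine being timed may not be threaded at all.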