If you use the search function, you will see that MKL supports tridiagonal LAPACK routines such as ?gtsv and ?gtrfs.
If LAPACK is overkill for your case, you may not be interested in MKL.
ozarbit,
Your test case is not entirely fair. You are measuring the first (and only) call of dgtsv. MKL does a lot of additional initialization work during the first call, while the NR implementation has no such initialization step. Subsequent calls are much faster.
So, if you make one call of dgtsv first (just for initialization purposes; the effect of first-time initialization should be negligible in a real-life application) and measure performance on the second call of dgtsv, the difference will not be so large. You will still see around a 20% performance difference. This is because the MKL routine uses Gaussian elimination with partial pivoting to cover bad cases, while NR gives a wrong answer in such cases.
We know about this performance gap and are investigating ways to close it.
By the way, the compiler does not vectorize the NR code because of its explicit data dependencies.
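The measurement approach described above (one unmeasured warm-up call, then timing later calls) can be sketched roughly as follows. This is only an illustration in Python: `time_first_and_steady` and `fake_library_solver` are hypothetical names, and the stand-in solver merely simulates a one-time setup cost with a sleep. It does not call MKL or dgtsv.

```python
import time


def time_first_and_steady(solver, make_args, repeats=5):
    """Return (first_call_seconds, best_later_call_seconds).

    solver    -- callable under test (hypothetical stand-in for a library routine)
    make_args -- callable returning a fresh argument tuple for each call,
                 since dgtsv-style solvers overwrite their inputs
    """
    args = make_args()
    start = time.perf_counter()
    solver(*args)  # first call: includes any one-time initialization
    first = time.perf_counter() - start

    best = float("inf")
    for _ in range(repeats):
        args = make_args()  # rebuild inputs outside the timed region
        start = time.perf_counter()
        solver(*args)
        best = min(best, time.perf_counter() - start)
    return first, best


# Stand-in solver (an assumption for illustration, not MKL):
# it pays a simulated setup cost on its first call only.
_initialized = False


def fake_library_solver(n):
    global _initialized
    if not _initialized:
        time.sleep(0.05)  # pretend one-time library initialization
        _initialized = True
    return sum(range(n))
```

Calling `time_first_and_steady(fake_library_solver, lambda: (1000,))` returns a first-call time dominated by the simulated setup and a much smaller time for later calls. The later figure is the one worth comparing against another implementation.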
If you apply a tridiagonal solver to problems that are not diagonally dominant, you may need partial pivoting; otherwise the LAPACK routine is not a competitive solution, nor was it intended to be.
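To make the trade-off concrete, here is a minimal sketch of tridiagonal elimination without pivoting (often called the Thomas algorithm). It is written for illustration only and is neither the NR source nor MKL's implementation; the function name `thomas_solve` and its argument layout are assumptions.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by elimination without pivoting.

    lower -- sub-diagonal, n-1 entries
    diag  -- main diagonal, n entries
    upper -- super-diagonal, n-1 entries
    rhs   -- right-hand side, n entries
    Raises ZeroDivisionError if a pivot is exactly zero.
    """
    n = len(diag)
    scaled_upper = [0.0] * n  # super-diagonal entries divided by their pivots
    x = [0.0] * n

    pivot = diag[0]
    x[0] = rhs[0] / pivot
    # Forward sweep: every step needs the pivot and x[i - 1] from the
    # previous step, so the loop carries a dependency between iterations.
    for i in range(1, n):
        scaled_upper[i] = upper[i - 1] / pivot
        pivot = diag[i] - lower[i - 1] * scaled_upper[i]
        x[i] = (rhs[i] - lower[i - 1] * x[i - 1]) / pivot

    # Back substitution, again strictly sequential.
    for i in range(n - 2, -1, -1):
        x[i] -= scaled_upper[i + 1] * x[i + 1]
    return x
```

On a diagonally dominant system such as `thomas_solve([-1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [0.0, 0.0, 4.0])` this recovers the solution (1, 2, 3) up to rounding. On the nonsingular system with `diag = [0.0, 1.0]`, `lower = [1.0]`, `upper = [1.0]` the first pivot is zero and the routine fails, which is the kind of case that row interchanges (partial pivoting) are meant to handle.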
I'm sorry, I misread your statement about the size of the problem. Your size should be large enough to make the library call overhead insignificant once the initializations have completed. The cost of checking pivots still may be significant for such a case, and once the pivoting has changed the order, the run-time cost could be several times as large.
