I have an application that relies heavily on these primitives (cblas_drotg, dlartg, dlartgp) to generate Givens rotations. In my experiments, run in an AVX-enabled environment with Intel MKL 11 beta update 2, I observed the points listed below. I was calling these primitives in the hope that MKL does something really smart and gives a speedup over a plain sqrt version. Why is that not the case? Is there any documentation on the number of flops or, better, the number of cycles these routines need?

- cblas_drotg leads to non-convergence of my algorithm (too many rounding errors). I haven't tried setting CBWR to COMPATIBLE yet, though; I need to try that.
- dlartg is slow.
- dlartgp is faster than dlartg. This puzzled me, since I expected dlartgp to be slower given that it provides more guarantees, namely positiveness of the diagonal elements.
- My own plain sqrt version (see below) outperforms all of the above, shows no errors, and also gives positive diagonal elements. I need this for updating a Cholesky decomposition, which requires positiveness of the trace (i.e. of the eigenvalues) in order to compute the log of the trace.

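My own version:

[cpp]
#include <cmath>

// Plain sqrt Givens rotation: c = x/h, s = y/h with h = sqrt(x^2 + y^2) >= 0.
// Note: d is not written by this snippet, and h == 0 is not guarded against.
inline void genrot_sqrt(double *x, double *y, double *c, double *s, double *d) {
    double h = sqrt((*x)*(*x) + (*y)*(*y));
    *c = (*x) / h;
    *s = (*y) / h;
}
[/cpp]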
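For comparison, here is a minimal sketch of a guarded variant of the same idea. It is not what MKL does internally, only an illustration. The names `Givens` and `genrot_hypot` are hypothetical, and the choice of `std::hypot` is an assumption: it avoids overflow and underflow in the intermediate squares, but is typically slower than a plain sqrt, so it may not suit a hot loop.

```cpp
#include <cmath>

// Hypothetical result type: rotation coefficients plus the rotated diagonal entry.
struct Givens {
    double c;  // cosine
    double s;  // sine
    double r;  // sqrt(x^2 + y^2), never negative
};

// Computes c, s, r such that
//   [  c  s ] [ x ]   [ r ]
//   [ -s  c ] [ y ] = [ 0 ]
// with r >= 0, i.e. the diagonal entry stays non-negative.
inline Givens genrot_hypot(double x, double y) {
    // std::hypot computes sqrt(x*x + y*y) without overflowing in the squares.
    const double r = std::hypot(x, y);
    if (r == 0.0) {
        // x == y == 0: return the identity rotation instead of dividing by zero.
        return {1.0, 0.0, 0.0};
    }
    return {x / r, y / r, r};
}
```

Calling `genrot_hypot(x, y)` and then checking that `-s*x + c*y` is close to zero is a cheap sanity test that the rotation really annihilates the second component.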

TIA,

Best regards,

Giovanni Azua
