
Intel Community › Software › Software Development SDKs and Libraries › Intel® oneAPI Math Kernel Library › DGEQRF performance on block p-cyclic matrix


Jeffrey_P_2

Beginner


11-13-2012
06:17 PM


DGEQRF performance on block p-cyclic matrix

Hello, first time poster here.

My work involves finding the QR decomposition of a p-cyclic matrix M. It is square, L blocks high and L blocks wide, with each block N by N. Each block column has two nonzero blocks: one on the diagonal and one directly below it, wrapping cyclically so that the last column's subdiagonal block M1L lands in the first block row, like so:

    M11                            M1L
    M21   M22
          M32   M33
                ...   ...
                      ML,L-1       MLL
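For concreteness, a matrix with this sparsity pattern can be assembled roughly like this (a sketch with assumed small L and N, not the poster's attached code):

```python
import numpy as np

def block_p_cyclic(L, N, seed=None):
    """Assemble a block p-cyclic matrix: diagonal blocks M_kk,
    subdiagonal blocks M_{k+1,k}, and the cyclic corner block M_{1,L}.
    All other blocks are structurally zero."""
    rng = np.random.default_rng(seed)
    M = np.zeros((L * N, L * N))
    for k in range(L):
        # diagonal block M_{k+1,k+1}
        M[k*N:(k+1)*N, k*N:(k+1)*N] = rng.standard_normal((N, N))
    for k in range(L - 1):
        # subdiagonal block M_{k+2,k+1}
        M[(k+1)*N:(k+2)*N, k*N:(k+1)*N] = rng.standard_normal((N, N))
    # cyclic corner block M_{1,L}
    M[0:N, (L-1)*N:L*N] = rng.standard_normal((N, N))
    return M
```

Only 2L of the L² blocks are nonzero, so the fraction of structural zeros in M grows as L grows, which is the regime discussed below.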

I am using a Block Orthogonal Factorization method (BSOF below) to find the QR, and I want to compare it to DGEQRF in execution time and speed. The code is attached.

Now my problem is that although BSOF always beats DGEQRF in execution time, as L increases and the number of zeros in M grows, DGEQRF gets much faster in terms of GFlop/s. Attached are results from a test where the size of M is held constant at 10,000 while L grows and N shrinks. If DGEQRF were unaffected by the structure, its speed and execution time would be the same in every test, but they are not. So my question is: why is DGEQRF going so fast? My theory is that some heuristic skips a portion of the flops when it detects a pattern of zeros, making my flop count incorrect and leading to a bad GFlop rate. But I have no idea how or where this is being done.
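The measurement at issue can be sketched as follows (assuming SciPy's low-level binding to the same LAPACK DGEQRF; the 2mn² − (2/3)n³ figure is the standard dense-QR operation estimate, not a count of work actually performed):

```python
import time
import numpy as np
from scipy.linalg import lapack

def qr_gflops(A):
    """Time LAPACK DGEQRF on A and report a *nominal* GFlop/s rate
    using the textbook dense-QR estimate 2*m*n^2 - (2/3)*n^3 (m >= n).
    If the library skips work on structural zeros, this nominal count
    overstates the flops actually done, inflating the reported rate."""
    m, n = A.shape
    flops = 2.0 * m * n * n - (2.0 / 3.0) * n ** 3
    t0 = time.perf_counter()
    qr_raw, tau, work, info = lapack.dgeqrf(A)
    dt = time.perf_counter() - t0
    assert info == 0  # DGEQRF reports success via info == 0
    return flops / dt / 1e9
```

On a dense random matrix this yields a plausible rate; on the block p-cyclic M the same formula can report a rate far above hardware peak, which is exactly the symptom described above.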

I should also note that I have run benchmark tests with DGEMM and DGEQRF on a fully random matrix and get normal speeds of ~140 GFlop/s and ~120 GFlop/s respectively, so my DGEQRF speeds of ~1000 GFlop/s on M must be off. Those results are attached as well.

Thanks!

1 Solution

Alexander_K_Intel3

Employee


11-14-2012
03:33 AM


Hi Jeffrey,

You are right: DGEQRF (and the other LQ/QL/RQ factorizations, including DORGQR, DORMQR, and similar routines) does have an optimization to skip zeros. The optimization is derived from the NETLIB LAPACK code. If you'd like to learn the details, you could explore the code of DGEQRF and of DLARFB, which is used underneath DGEQRF. The MKL implementation differs, but the basics of skipping zeros are similar. There is no good way to count exact flops for this case, since the strategy of reblocking the matrix into smaller panels means that not all zeros are skipped.

W.B.R.,

Alexander
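The zero-skipping Alexander points at can be illustrated with the device the reference-LAPACK unblocked kernels use: find the last nonzero row of the Householder vector (ILADLR-style) and restrict the reflector update to those rows. A minimal sketch in Python, in the spirit of that code rather than a transcription of MKL's:

```python
import numpy as np

def last_nonzero_row(v):
    """Mimic LAPACK's ILADLR: 1 past the index of v's last nonzero entry."""
    nz = np.nonzero(v)[0]
    return nz[-1] + 1 if nz.size else 0

def apply_reflector(C, v, tau):
    """Apply H = I - tau * v * v^T to C in place, skipping the trailing
    zero rows of v: those rows of C are provably unchanged, so a
    structured v (as arises from a block p-cyclic column) saves work."""
    lastv = last_nonzero_row(v)
    if lastv == 0 or tau == 0.0:
        return C  # H is the identity on all of C
    # w = v(1:lastv)^T * C(1:lastv, :); zero rows of v contribute nothing
    w = v[:lastv] @ C[:lastv, :]
    # only rows 1..lastv of C are touched; the rest stay untouched
    C[:lastv, :] -= tau * np.outer(v[:lastv], w)
    return C
```

With the panel vectors of a block p-cyclic matrix mostly zero below a narrow band, the skipped rows dominate, so far fewer flops are executed than the dense formula assumes.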



For more complete information about compiler optimizations, see our Optimization Notice.