Showing results for

- Intel Community
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
- Exploit symmetry in Cholesky

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Georgios_S_

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2015
04:20 AM

90 Views

I am performing a Cholesky decomposition, like in this thread. Post #4 contains a minor example, that looks like as my code. The minor example of Ying is based on this example. As one can see, the matrix is read by the master node and then gets distributed (every element of the matrix). Then every element of the matrix is being gathered back to the master node.

However, the example is general. In my application, the matrix is * symmetric*, so I am wondering if I could do better, than just sending all the matrix. That is, to send only the lower part of tha matrix, perform the Cholesky decomposition and then receive only the lower part of the matrix. Is this possible?

Of course, this has to do on how pdpotrf() expects it's input. My idea is that if we could limit distribution only in the lower part of the matrix, then we could achieve better performace.

PS - a negative answer will be also flagged as the best reply, i.e. it is accepted.

Link Copied

Accepted Solutions

Alexander_K_Intel3

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-20-2015
05:53 AM

90 Views

Hi George,

It's enough to fill only lower triangular for the Cholesky if UPLO=L. Or upper for UPLO=U.

So you may modify the example to run scatter/gather loops only on block below diagonal (c<=r).

Also there are Cdtrsd2d() and Cdtrrv2d() functions which transfer only upper or lower triangular of matrix, which you may use to transfer diagonal blocks (c==r).

Best regards,

Alexander

11 Replies

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2015
06:11 AM

90 Views

Georgios_S_

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2015
06:31 AM

90 Views

Hi mecej4,

yes, that's correct. I just highlighted the symmetry property, because that's the one I want to exploit. I am interested in the Cholesky solver of MKL's, which is pdpotrf(). If there are more, distributed versions, please let me know.

Can I input the triangle of the matrix only in MKL (instead of the whole matrix)?

Thanks,

George

Alexander_K_Intel3

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-20-2015
05:53 AM

91 Views

Hi George,

It's enough to fill only lower triangular for the Cholesky if UPLO=L. Or upper for UPLO=U.

So you may modify the example to run scatter/gather loops only on block below diagonal (c<=r).

Also there are Cdtrsd2d() and Cdtrrv2d() functions which transfer only upper or lower triangular of matrix, which you may use to transfer diagonal blocks (c==r).

Best regards,

Alexander

Georgios_S_

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-20-2015
06:44 AM

90 Views

Hi Alexander,

thanks for the response. How should I provide the input matrix then? I mean, now I am using a double* A, which I malloc to N*N. In the triangle case, what should I do?

Take for example, this *serial *version of Cholesky. By the way I am trying to input the triangle matrix A, it founds wrong result (info is also set to 3). How should I do it?

#include <mpi.h> #include <iostream> #include <mkl.h> #include <mkl_scalapack.h> #include "mkl_lapacke.h" #include <mkl_cblas.h> void print_lower(double* A, int N); int main(int argc, char *argv[]) { int N = 5; /*double A[] = { 4, 1, 2, 0.5, 2, 1, 0.5, 0, 0, 0, 2, 0, 3, 0, 0, 0.5, 0, 0, 0.625, 0, 2, 0, 0, 0, 16 };*/ double A[] = { // how to input the triangle matrix? 4, 1, 0.5, 2, 0, 3, 0.5, 0, 0, 0.625, 2, 0, 0, 0, 16 }; int nInfo; char ch = 'L'; dpotrf( &ch, &N, A, &N, &nInfo ); printf("N = %d, dpotrf return status = %d\n", N, nInfo); print_lower(A, N); return 0; } void print_lower(double* A, int N) { int i, j; for(i = 0; i < N; ++i) { for(j = 0; j < N; ++j) { // lower triangular matrix printf("%f ", (i >= j) ? A[i + j * N] : 0.0); } printf("\n"); } }

Best regards,

George

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-20-2015
05:01 PM

90 Views

There are some errors in your program. The first is that DPOTRF takes a square matrix for A, so you must pass an array of size at least N^{2}. However, only the lower or the upper triangle needs to be filled in. The remaining values can be set to zero or any place-holder values.

Secondly, you are calling the Fortran 77 name, so you must follow Fortran 2-D array convention (column-major order). Thus

double A[] = { 4, 1, 2, 0.5, 2, 0, 0.5, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0.625, 0, 0, 0, 0, 0, 16 };

would pass the lower triangle, with zeroes in the strictly upper triangle.

If you wish to use the Lapack-E conventions, you should call LAPACKE_dpotrf instead.

Georgios_S_

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-20-2015
05:27 PM

90 Views

Hi mecej4,

that clears things up, thank you! So, actually, I can't reduce the space needed, I mean the matrix has to be still N*N. However, I may save time in scattering/gathering the matrix, in the distribution process. If you know any example that performs the distribution of a triangle matrix, please let me know.

Kind regards,

George

Ying_H_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-20-2015
07:31 PM

90 Views

Hi George,

Right, as Alexander and Mecej mentioned, the scalpack function ask the whole matrix (although it is symmetric). So the matrix should be whole matrix. (N*N) and the input matrix on each node are nrows x ncolumns. The memory can't be save. You may save some time by distributing and gather only upper or lower part.

As you see, in this __example__ , it distribute the whole matrix.

Cdgesd2d(ctxt, nr, nc, A_glob+N*c+r, N, sendr, sendc);

And Alexander mentioned:

you may modify the example to run scatter/gather loops only on block below diagonal (c<=r).Also there are Cdtrsd2d() and Cdtrrv2d() functions which transfer only upper or lower triangular of matrix, which you may use to transfer diagonal blocks (c==r).

Cdtrsd2d parameter are like below ( see more details in MKL manual)

call Cdtrsd2d( icontxt, uplo, diag, m, n, a, lda, rdest, cdest)

uplo (input) CHARACTER*1 . Indicates whether the matrix is upper or lower

trapezoidal, as discussed below. Upper or Lower

diag (input) CHARACTER*1 . Indicates whether the diagonal of the matrix is unit

diagonal (will not be operated on ) "U" or otherwise (will be operated on) "N".

So you may modify the code

if (mpiroot) {

// Send a nr-by-nc submatrix to process (sendr, sendc)

Cdtrsd2d() //(ctxt, "U", "N", nr, nc, A_glob+N*c+r, N, sendr, sendc);

}

if (myrow == sendr && mycol == sendc) {

// Receive the same data

// The leading dimension of the local matrix is nrows!

Cdtrrv2d() // (ctxt, nr, nc, A_loc+nrows*recvc+recvr, nrows, 0, 0);

recvc = (recvc+nc)%ncols;

}

Best Regards,

Ying

p.S

?trrv2d

Receives a message from the process into the

trapezoidal matrix.

Syntax

call itrrv2d( icontxt, uplo, diag, m, n, a, lda, rsrc, csrc)

call strrv2d( icontxt, uplo, diag, m, n, a, lda, rsrc, csrc)

call dtrrv2d( icontxt, uplo, diag, m, n, a, lda, rsrc, csrc)

Alexander_K_Intel3

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-21-2015
12:47 AM

90 Views

Georgios_S_

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-21-2015
10:12 AM

90 Views

Hi all,

thanks for your kind posts, they helped a lot. Alexander, your code works. I just have a last question. mecej4's post #7 says that we should use column-major approach. However, your code will transfer the lower part and if for example, we had only one process, then the matrix the factorization would apply to, would be like this:

double A[] = { 4, 0, 0, 0, 0, 1, 0.5, 0, 0, 0, 2, 0, 3, 0, 0, 0.5, 0, 0, 0.625, 0, 2, 0, 0, 0, 16 };

and not like the one in mecej4's post #7. However, it still works. Why? Is this because of the symmetry?

Thanks again,

George

Alexander_K_Intel3

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-22-2015
09:43 PM

90 Views

George,

You are welcome!

The example transfers lower triangular of matrix stored Column-Major including diagonal elements. So if you have Row-Major format with lower triangular filled (as in post #10) - this is equivalent to filled upper triangular of Column-Major matrix. The scatter code will transfer zeroes from subdiagonal elements and diagonal elements of the matrix. And pdpotrf will factorize that diagonal matrix. Such matrix is perfectly positive defined, so no problems to compute the factorization, though it will differ from result of factorization of original A.

Best regards,

Alexander

Georgios_S_

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-04-2015
08:13 AM

90 Views

Hi Alexander,

thanks for your response.

Best regards,

George

For more complete information about compiler optimizations, see our Optimization Notice.