Solved: Hi George,

Georgios_S_ · ‎07-17-2015

I am performing a Cholesky decomposition, like in this thread. Post #4 contains a minor example, that looks like as my code. The minor example of Ying is based on this example. As one can see, the matrix is read by the master node and then gets distributed (every element of the matrix). Then every element of the matrix is being gathered back to the master node.

However, the example is general. In my application, the matrix is symmetric, so I am wondering if I could do better, than just sending all the matrix. That is, to send only the lower part of tha matrix, perform the Cholesky decomposition and then receive only the lower part of the matrix. Is this possible?

Of course, this has to do on how pdpotrf() expects it's input. My idea is that if we could limit distribution only in the lower part of the matrix, then we could achieve better performace.

PS - a negative answer will be also flagged as the best reply, i.e. it is accepted.

Alexander_K_Intel3 · ‎07-20-2015

Hi George,

It's enough to fill only lower triangular for the Cholesky if UPLO=L. Or upper for UPLO=U.
So you may modify the example to run scatter/gather loops only on block below diagonal (c<=r).
Also there are Cdtrsd2d() and Cdtrrv2d() functions which transfer only upper or lower triangular of matrix, which you may use to transfer diagonal blocks (c==r).

Best regards,
Alexander

View solution in original post

mecej4 · ‎07-17-2015

Cholesky decomposition applies only to symmetric positive definite matrices -- it is not sufficient for the matrix to be symmetric. A Cholesky solver may accept the input matrix as a full matrix or may accept one triangle, but these are user interface details and the algorithm definitely makes use of the required symmetry.

Georgios_S_ · ‎07-17-2015

Hi mecej4,

yes, that's correct. I just highlighted the symmetry property, because that's the one I want to exploit. I am interested in the Cholesky solver of MKL's, which is pdpotrf(). If there are more, distributed versions, please let me know.

Can I input the triangle of the matrix only in MKL (instead of the whole matrix)?

Thanks,

George

Alexander_K_Intel3 · ‎07-20-2015

Hi George,

It's enough to fill only lower triangular for the Cholesky if UPLO=L. Or upper for UPLO=U.
So you may modify the example to run scatter/gather loops only on block below diagonal (c<=r).
Also there are Cdtrsd2d() and Cdtrrv2d() functions which transfer only upper or lower triangular of matrix, which you may use to transfer diagonal blocks (c==r).

Best regards,
Alexander

Georgios_S_ · ‎07-20-2015

Hi Alexander,

thanks for the response. How should I provide the input matrix then? I mean, now I am using a double* A, which I malloc to N*N. In the triangle case, what should I do?

Take for example, this serial version of Cholesky. By the way I am trying to input the triangle matrix A, it founds wrong result (info is also set to 3). How should I do it?

#include <mpi.h>
#include <iostream>
#include <mkl.h>
#include <mkl_scalapack.h>
#include "mkl_lapacke.h"
#include <mkl_cblas.h>

void print_lower(double* A, int N);

int main(int argc, char *argv[]) {
  int N = 5;
  /*double A[] =  {
	4,   1,   2, 0.5,   2,
	1,   0.5, 0, 0,     0,
	2,   0,   3, 0,     0,
	0.5, 0,   0, 0.625, 0,
	2,   0,   0, 0,     16
  };*/
  
  double A[] =  { // how to input the triangle matrix?
	4, 
	1,   0.5,
	2,   0,   3,
	0.5, 0,   0, 0.625,
	2,   0,   0, 0,     16
  };

  int nInfo;
  char ch = 'L';
  dpotrf(
        &ch,
        &N,
        A,
        &N,
        &nInfo
  );

  printf("N = %d, dpotrf return status = %d\n", N, nInfo);
  
  print_lower(A, N);
  return 0;
}

void print_lower(double* A, int N) {
  int i, j;
  for(i = 0; i < N; ++i) {
    for(j = 0; j < N; ++j) {
      // lower triangular matrix
      printf("%f ", (i >= j) ? A[i + j * N] : 0.0);
    }
    printf("\n");
  }
}

Best regards,

George

mecej4 · ‎07-20-2015

There are some errors in your program. The first is that DPOTRF takes a square matrix for A, so you must pass an array of size at least N². However, only the lower or the upper triangle needs to be filled in. The remaining values can be set to zero or any place-holder values.

Secondly, you are calling the Fortran 77 name, so you must follow Fortran 2-D array convention (column-major order). Thus

  double A[] =  {
	4,   1,   2, 0.5,   2,
	0,   0.5, 0, 0,     0,
	0,   0,   3, 0,     0,
	0, 0,   0, 0.625, 0,
	0,   0,   0, 0,     16
  };

would pass the lower triangle, with zeroes in the strictly upper triangle.

If you wish to use the Lapack-E conventions, you should call LAPACKE_dpotrf instead.

Georgios_S_ · ‎07-20-2015

Hi mecej4,

that clears things up, thank you! So, actually, I can't reduce the space needed, I mean the matrix has to be still N*N. However, I may save time in scattering/gathering the matrix, in the distribution process. If you know any example that performs the distribution of a triangle matrix, please let me know.

Kind regards,

George

Ying_H_Intel · ‎07-20-2015

Hi George,

Right, as Alexander and Mecej mentioned, the scalpack function ask the whole matrix (although it is symmetric). So the matrix should be whole matrix. (N*N) and the input matrix on each node are nrows x ncolumns. The memory can't be save. You may save some time by distributing and gather only upper or lower part.

As you see, in this example , it distribute the whole matrix.

Cdgesd2d(ctxt, nr, nc, A_glob+N*c+r, N, sendr, sendc);

And Alexander mentioned:

you may modify the example to run scatter/gather loops only on block below diagonal (c<=r).Also there are Cdtrsd2d() and Cdtrrv2d() functions which transfer only upper or lower triangular of matrix, which you may use to transfer diagonal blocks (c==r).

Cdtrsd2d parameter are like below ( see more details in MKL manual)

call Cdtrsd2d( icontxt, uplo, diag, m, n, a, lda, rdest, cdest)

uplo (input) CHARACTER*1 . Indicates whether the matrix is upper or lower
trapezoidal, as discussed below. Upper or Lower
diag (input) CHARACTER*1 . Indicates whether the diagonal of the matrix is unit
diagonal (will not be operated on ) "U" or otherwise (will be operated on) "N".

So you may modify the code

if (mpiroot) {
                // Send a nr-by-nc submatrix to process (sendr, sendc)
                Cdtrsd2d()   //(ctxt, "U", "N", nr, nc, A_glob+N*c+r, N, sendr, sendc);
            }

            if (myrow == sendr && mycol == sendc) {
                // Receive the same data
                // The leading dimension of the local matrix is nrows!
                Cdtrrv2d()   //   (ctxt, nr, nc, A_loc+nrows*recvc+recvr, nrows, 0, 0);
                recvc = (recvc+nc)%ncols;
            }

Best Regards,

Ying

p.S

?trrv2d
Receives a message from the process into the
trapezoidal matrix.
Syntax
call itrrv2d( icontxt, uplo, diag, m, n, a, lda, rsrc, csrc)
call strrv2d( icontxt, uplo, diag, m, n, a, lda, rsrc, csrc)
call dtrrv2d( icontxt, uplo, diag, m, n, a, lda, rsrc, csrc)

Alexander_K_Intel3 · ‎07-21-2015

I've modified the example referenced above to copy only lower triangular part.
You could find it in the attach.

Best regards,
Alexander

Georgios_S_ · ‎07-21-2015

Hi all,

thanks for your kind posts, they helped a lot. Alexander, your code works. I just have a last question. mecej4's post #7 says that we should use column-major approach. However, your code will transfer the lower part and if for example, we had only one process, then the matrix the factorization would apply to, would be like this:

double A[] =  {
    4,    0,    0, 0,          0,
    1,    0.5, 0, 0,          0,
    2,    0,    3, 0,          0,
    0.5, 0,    0, 0.625,  0,
    2,    0,    0, 0,           16
  };

and not like the one in mecej4's post #7. However, it still works. Why? Is this because of the symmetry?

Thanks again,

George

Alexander_K_Intel3 · ‎07-22-2015

George,

You are welcome!

The example transfers lower triangular of matrix stored Column-Major including diagonal elements. So if you have Row-Major format with lower triangular filled (as in post #10) - this is equivalent to filled upper triangular of Column-Major matrix. The scatter code will transfer zeroes from subdiagonal elements and diagonal elements of the matrix. And pdpotrf will factorize that diagonal matrix. Such matrix is perfectly positive defined, so no problems to compute the factorization, though it will differ from result of factorization of original A.

Best regards,
Alexander

Georgios_S_ · ‎08-04-2015

Hi Alexander,

thanks for your response.

Best regards,

George

Exploit symmetry in Cholesky