- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
i want to use some scalapacks' functions. but i confuse to set initial data form in scalapack, so i asked it here.
i have understood the data layout like figure below, and made code, too.
test code what i coded (pdgeqrf) like below :
#include <stdio.h> #include <stdlib.h> #include <malloc.h> #include <time.h> #include <mpi.h> #include <mkl_blacs.h> #include <mkl_scalapack.h> #include <mkl_lapacke.h> #include <mkl_cblas.h> #include <errno.h> int main(int argc, char **argv) { int i,j; // test parameters (default) int m = 4000; int n = 4000; int mb = 8; int nb = 8; int nprows = 8; int npcols = 8; // temp values // parameter value change (optional) if(argc >=5){ m=atoi(argv[1]); n=atoi(argv[2]); nprows=atoi(argv[3]); npcols=atoi(argv[4]); } // time and validity double startTime; double endTime; double gap; double flops; //QR double * A; double * tau; double * work; int mpirank, mpisize; MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &mpirank); MPI_Comm_size(MPI_COMM_WORLD, &mpisize); int myid, numproc, ctxt, myrow, mycol; MKL_INT descA[9]; MKL_INT zero = 0; MKL_INT one = 1; MKL_INT info = 0; Cblacs_pinfo(&myid, &numproc); if(numproc > 1 && myid != 0){ Cblacs_setup(&myid, &numproc); } Cblacs_get(-1, 0, &ctxt); Cblacs_gridinit(&ctxt, "R", nprows, npcols); blacs_gridinfo_(&ctxt, &nprows, &npcols, &myrow, &mycol); if(myrow == -1){ return 0; } /* * temp value for fortran */ char aform = 'N'; char diag = 'N'; MKL_INT lda = m; MKL_INT iarow = 0; MKL_INT iacol = 0; MKL_INT iseed = 10; MKL_INT iroff = 0; MKL_INT irnum = numroc_(&m, &mb, &myrow, &zero, &nprows); MKL_INT icoff = 0; MKL_INT icnum = numroc_(&n, &nb, &mycol, &zero, &npcols); MKL_INT lwork = -1; //descinit(desc, m, n, mb, nb, irsrc, icsrc, ictxt, LLD, info) descinit_(descA, &m, &n, &mb, &nb, &zero, &zero, &ctxt, &icnum, &info); A = (double*)malloc(sizeof(double)*irnum*icnum); tau = (double*)malloc(sizeof(double)*(m*n/2)); //array size should >= LOCc(ja+min(m,n)-1) work = (double*)malloc(sizeof(double)*m*n); //pdmatgen_(&ctxt, &aform, &diag, &m, &n, &mb, &nb, A, &m, &iarow, &iacol, &iseed, &iroff, &irnum, &icoff, &icnum, &myrow, &mycol, &nprows, &npcols); /* * generate matrix (by column major) * * matrix : A * size : irnum * icnum * seed : 10 */ for(j = 0; j < irnum; ++j) { for(i = 0; i < icnum; ++i) { A[i*irnum+j] = rand()%10; } } printf("QR valid test (MPI)\n"); // pdgeqrf routine MPI_Barrier(MPI_COMM_WORLD); startTime = MPI_Wtime(); pdgeqrf_(&m,&n,A,&one,&one,descA,tau,work,&lwork,&info); //info = dgeqrf(LAPACK_COL_MAJOR, m, n, A, m, tau, descA, mpirank); MPI_Barrier(MPI_COMM_WORLD); endTime = MPI_Wtime(); // flops gap = (double)( endTime - startTime ); flops = (2.0 * (double)n * (double)n * (double)(m-n/3) ) * 1.0e-9 / gap; printf("info\t%d, dgemm time (sec)\t%f, Gflops\t%f \n", info, gap, flops); Cblacs_gridexit(ctxt); Cblacs_exit(&zero); MPI_Finalize(); free(A); free(tau); free(work); return 0; }
i have compiled my code like this :
mpiicc scalapack_test2.c -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -mkl
i've comfirmed the result value 0 correctly, but there is some weird things especially speed things.
in addition, i couldn't check this result is right or not. i'm not sure of initializing data set value exactly.
this is result texts what i confirmed.
// form : ./a.out m n nprow npcol [@localhost src]$ mpirun -n 4 ./a.out 6400 6400 2 2 QR valid test (MPI) QR valid test (MPI) QR valid test (MPI) QR valid test (MPI) info 0, dgemm time (sec) 0.001011, Gflops 345703.851960 info 0, dgemm time (sec) 0.001003, Gflops 348580.607742 info 0, dgemm time (sec) 0.001141, Gflops 306337.241154 info 0, dgemm time (sec) 0.001143, Gflops 305826.040084 [@localhost src]$ mpirun -n 1 ./a.out 6400 6400 1 1 QR valid test (MPI) info 0, dgemm time (sec) 0.000237, Gflops 1474979.915656
could you give me some advise what i miss understand or miss used? if you give me good example of this, i'm really thank for you.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Sorry for miss the thread. Yes, your assumption about Scalapack array initialization is exact correct. As http://www.netlib.org/scalapack/ claim: Scalapack uses 2D block-ciclyc distribution of n*n matrix.
As i understand, you have two questions here.
1. about data layout.
A. MKL actually provide scalapack sample code under MKL install folder.
for example. <MKL install dir>/examples_cluster_c/pblas3_s_example.c
* Product C=A*B is computed by means of p?gemm, difference B-inv_A*C is
* also computed by means of p?gemm (but with transa='T'). Norm of the dif-
* ference and norms of matrices A and B are computed using p?lange.
* Sheme of 2D block-ciclyc distribution of n*n matrix:
*
* 0 1 0 1 0 (0,0) (0,1)
* __________________________ ______________ ___________
* |1 1 |2 2 |3 3 |4 4 |5 | |1 1 |3 3 |5 | |2 2 |4 4 |
* 0 |1 1 |2 2 |3 3 |4 4 |5 | |1 1 |3 3 |5 | |2 2 |4 4 |
* |_____|_____|_____|_____|__| |_____|_____|__| |_____|_____|
* |6 6 |7 7 |8 8 |9 9 |10| |11 11|13 13|15| |12 12|14 14|
* 1 |6 6 |7 7 |8 8 |9 9 |10| |11 11|13 13|15| |12 12|14 14|
* |_____|_____|_____|_____|__| |_____|_____|__| |_____|_____|
* |11 11|12 12|13 13|14 14|15| ---> |21 21|23 23|25| |22 22|24 24|
* 0 |11 11|12 12|13 13|14 14|15| |_____|_____|__| |_____|_____|
* |_____|_____|_____|_____|__|
* |16 16|17 17|18 18|19 19|20|
* 1 |16 16|17 17|18 18|19 19|20| (1,0) (1,1)
* |_____|_____|_____|_____|__| ______________ ___________
* 0 |21 21|22 22|23 23|24 24|25| |6 6 |8 8 |10| |7 7 |9 9 |
* |_____|_____|_____|_____|__| |6 6 |8 8 |10| |7 7 |9 9 |
* |_____|_____|__| |_____|_____|
* |16 16|18 18|20| |17 17|19 19|
* |16 16|18 18|20| |17 17|19 19|
* |_____|_____|__| |_____|_____|
*
*========================================================================*/
B. I'm not sure if you did search on the forum. there are some C sample code also.
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/536962
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/558359
etc.
you may refer to them.
2. Regarding the performance, you mentioned there is some weird things especially speed things
Could you please tell the expected speed and what are your processors? It may depend on matrix size, test method etc.
Best Regards,
Ying

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page