Intel® oneAPI Math Kernel Library

Data initialization for ScaLAPACK in a C program

Yeongha_L_
Beginner

Hello,

I want to use some ScaLAPACK functions, but I am confused about how to set up the initial data layout for ScaLAPACK, so I am asking here.

I understood the data layout to be as in the figure below, and wrote code based on that.

The test code I wrote (using pdgeqrf) is below:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <time.h>
#include <mpi.h>
#include <mkl_blacs.h>
#include <mkl_scalapack.h>
#include <mkl_lapacke.h>
#include <mkl_cblas.h>
#include <errno.h>



int main(int    argc,
         char **argv)
{
  int i,j;
  // test parameters (default)
  int m      = 4000;
  int n      = 4000;
  int mb     = 8;
  int nb     = 8;
  int nprows = 8;
  int npcols = 8; // temp values

  // parameter value change (optional)
  if(argc >=5){
    m=atoi(argv[1]);
    n=atoi(argv[2]);
    nprows=atoi(argv[3]);
    npcols=atoi(argv[4]);
  
  }

  // time and validity
  double startTime;
  double endTime;
  double gap;
  double flops;

  // QR
  double *A;
  double *tau;
  double *work;

  int mpirank, mpisize;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
  MPI_Comm_size(MPI_COMM_WORLD, &mpisize);

  int myid, numproc, ctxt, myrow, mycol;
  MKL_INT descA[9];
  MKL_INT zero = 0;
  MKL_INT one = 1;
  MKL_INT info = 0;

  Cblacs_pinfo(&myid, &numproc);
  if(numproc > 1 && myid != 0){
    Cblacs_setup(&myid, &numproc);
  }
  Cblacs_get(-1, 0, &ctxt);
  Cblacs_gridinit(&ctxt, "R", nprows, npcols);
  blacs_gridinfo_(&ctxt, &nprows, &npcols, &myrow, &mycol);

  if(myrow == -1){
    return 0;
  }

  /*
   * temp value for fortran
   */
  char aform = 'N';
  char diag = 'N';
  MKL_INT lda = m;
  MKL_INT iarow = 0;
  MKL_INT iacol = 0;
  MKL_INT iseed = 10;
  MKL_INT iroff = 0;
  MKL_INT irnum = numroc_(&m, &mb, &myrow, &zero, &nprows); // local row count
  MKL_INT icoff = 0;
  MKL_INT icnum = numroc_(&n, &nb, &mycol, &zero, &npcols); // local column count
  MKL_INT lld = irnum > 1 ? irnum : 1; // local leading dimension: max(1, local row count)
  MKL_INT lwork = -1;

  //descinit(desc,  m,  n,  mb,  nb, irsrc, icsrc, ictxt, LLD, info)
  descinit_(descA, &m, &n, &mb, &nb, &zero, &zero, &ctxt, &lld, &info);
  A = (double*)malloc(sizeof(double)*irnum*icnum);
  tau = (double*)malloc(sizeof(double)*(m*n/2)); //array size should >= LOCc(ja+min(m,n)-1)
  work = (double*)malloc(sizeof(double)*m*n);

  //pdmatgen_(&ctxt, &aform, &diag, &m, &n, &mb, &nb, A, &m, &iarow, &iacol, &iseed, &iroff, &irnum, &icoff, &icnum, &myrow, &mycol, &nprows, &npcols);


  /*
   *  generate matrix (by column major)
   *
   * matrix      : A
   * size        : irnum * icnum
   * seed        : 10
   */
  for(j = 0; j < irnum; ++j)
  {
    for(i = 0; i < icnum; ++i)
    {
      A[i*irnum+j] = rand()%10;
    }
  }

  printf("QR valid test (MPI)\n");

 
  // pdgeqrf routine
  MPI_Barrier(MPI_COMM_WORLD);
  startTime = MPI_Wtime();
  pdgeqrf_(&m,&n,A,&one,&one,descA,tau,work,&lwork,&info);
  //info = dgeqrf(LAPACK_COL_MAJOR, m, n, A, m, tau, descA, mpirank);
  MPI_Barrier(MPI_COMM_WORLD);
  endTime = MPI_Wtime();

  // flops
  gap = (double)( endTime - startTime );
  flops = (2.0 * (double)n * (double)n * (double)(m-n/3) ) * 1.0e-9 / gap;
  printf("info\t%d, dgemm time (sec)\t%f, Gflops\t%f \n", info, gap, flops);
  
  Cblacs_gridexit(ctxt);
  Cblacs_exit(1); // argument is passed by value; nonzero keeps MPI usable for MPI_Finalize below
  MPI_Finalize();

  free(A);
  free(tau);
  free(work);
  return 0;
}

I compiled my code like this:

 mpiicc scalapack_test2.c -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -mkl

I confirmed that info comes back as 0, but some things look strange, especially the speed.

In addition, I could not check whether the result is correct, and I am not sure that I initialized the data set correctly.

This is the output I got:

//  form : ./a.out m n nprow npcol

[@localhost src]$ mpirun -n 4 ./a.out 6400 6400 2 2
QR valid test (MPI)
QR valid test (MPI)
QR valid test (MPI)
QR valid test (MPI)
info    0, dgemm time (sec)     0.001011, Gflops        345703.851960
info    0, dgemm time (sec)     0.001003, Gflops        348580.607742
info    0, dgemm time (sec)     0.001141, Gflops        306337.241154
info    0, dgemm time (sec)     0.001143, Gflops        305826.040084

[@localhost src]$ mpirun -n 1 ./a.out 6400 6400 1 1
QR valid test (MPI)
info    0, dgemm time (sec)     0.000237, Gflops        1474979.915656
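
A rough plausibility check on the figures above, as a minimal sketch. It recomputes the rate from the same flop formula the test program uses, 2*n^2*(m - n/3), and also shows how long the factorization would take at a given sustained rate. The helper names are hypothetical, and the 100 Gflop/s used in the note below is an arbitrary illustrative rate, not a measurement.

```c
#include <assert.h>

/* Approximate flop count of a QR factorization of an m-by-n matrix
 * (m >= n), using the formula from the test program: 2*n^2*(m - n/3). */
double qr_flops(double m, double n)
{
    assert(n > 0.0 && m >= n);
    return 2.0 * n * n * (m - n / 3.0);
}

/* Rate in Gflop/s implied by a measured wall time in seconds. */
double gflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops * 1.0e-9 / seconds;
}

/* Wall time in seconds needed at a given sustained rate in Gflop/s. */
double seconds_at(double flops, double rate_gflops)
{
    assert(rate_gflops > 0.0);
    return flops * 1.0e-9 / rate_gflops;
}
```

For m = n = 6400, qr_flops gives about 3.5e11 operations. A wall time of 0.001011 s then implies roughly 345,000 Gflop/s, whereas at an assumed 100 Gflop/s the factorization would take about 3.5 s. Rates of this size usually suggest that the timed call did not perform the full factorization.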

Could you advise me on what I have misunderstood or misused? If you could point me to a good example, I would really appreciate it.

Ying_H_Intel
Employee

Hello,

Sorry for missing this thread. Yes, your assumption about ScaLAPACK array initialization is exactly correct. As http://www.netlib.org/scalapack/ states, ScaLAPACK uses a 2D block-cyclic distribution of the n*n matrix.

As I understand it, you have two questions here.

1. About the data layout.

A. MKL provides ScaLAPACK sample code under the MKL install folder,

for example <MKL install dir>/examples_cluster_c/pblas3_s_example.c

* Product C=A*B is computed by means of p?gemm,  difference  B-inv_A*C  is
 * also computed by means of p?gemm (but with transa='T'). Norm of the dif-
 * ference and norms of matrices A and B are computed using p?lange.
 * Scheme of 2D block-cyclic distribution of n*n matrix:
 *
 *        0     1     0     1    0             (0,0)              (0,1)
 *      __________________________         ______________      ___________
 *     |1  1 |2  2 |3  3 |4  4 |5 |       |1  1 |3  3 |5 |    |2  2 |4  4 |
 * 0   |1  1 |2  2 |3  3 |4  4 |5 |       |1  1 |3  3 |5 |    |2  2 |4  4 |
 *     |_____|_____|_____|_____|__|       |_____|_____|__|    |_____|_____|
 *     |6  6 |7  7 |8  8 |9  9 |10|       |11 11|13 13|15|    |12 12|14 14|
 * 1   |6  6 |7  7 |8  8 |9  9 |10|       |11 11|13 13|15|    |12 12|14 14|
 *     |_____|_____|_____|_____|__|       |_____|_____|__|    |_____|_____|
 *     |11 11|12 12|13 13|14 14|15| --->  |21 21|23 23|25|    |22 22|24 24|
 * 0   |11 11|12 12|13 13|14 14|15|       |_____|_____|__|    |_____|_____|
 *     |_____|_____|_____|_____|__|
 *     |16 16|17 17|18 18|19 19|20|       
 * 1   |16 16|17 17|18 18|19 19|20|            (1,0)              (1,1)
 *     |_____|_____|_____|_____|__|        ______________      ___________
 * 0   |21 21|22 22|23 23|24 24|25|       |6  6 |8  8 |10|    |7  7 |9  9 |
 *     |_____|_____|_____|_____|__|       |6  6 |8  8 |10|    |7  7 |9  9 |
 *                                        |_____|_____|__|    |_____|_____|
 *                                        |16 16|18 18|20|    |17 17|19 19|
 *                                        |16 16|18 18|20|    |17 17|19 19|
 *                                        |_____|_____|__|    |_____|_____|
 *
 *========================================================================*/
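
The mapping drawn in the diagram above can be written down in a few lines. This is a minimal sketch in plain C with 0-based indices; the function names are hypothetical. `local_count` is meant to mirror what ScaLAPACK's numroc returns, and the other two helpers give the owning process coordinate and the local index of a global row or column.

```c
#include <assert.h>

/* Number of rows (or columns) of an n-long dimension that land on process
 * iproc when blocks of size nb are dealt round-robin over nprocs processes,
 * starting at process isrc. */
int local_count(int n, int nb, int iproc, int isrc, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int dist    = (nprocs + iproc - isrc) % nprocs; /* distance from source process */
    int nblocks = n / nb;                           /* number of full blocks */
    int count   = (nblocks / nprocs) * nb;          /* full rounds every process gets */
    int extra   = nblocks % nprocs;                 /* leftover full blocks */
    if (dist < extra)
        count += nb;                                /* one more full block */
    else if (dist == extra)
        count += n % nb;                            /* the trailing partial block */
    return count;
}

/* Process coordinate that owns 0-based global index g. */
int owner_of(int g, int nb, int isrc, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    return (isrc + g / nb) % nprocs;
}

/* 0-based local index of 0-based global index g on its owning process. */
int global_to_local(int g, int nb, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    return (g / (nb * nprocs)) * nb + g % nb;
}
```

For the case in the diagram (a 9-by-9 matrix, 2-by-2 blocks, 2-by-2 process grid), `local_count(9, 2, 0, 0, 2)` is 5 and `local_count(9, 2, 1, 0, 2)` is 4, so process (0,0) holds a 5-by-5 local array and process (1,1) a 4-by-4 one.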

B. I'm not sure whether you have searched the forum; there is some C sample code there as well:

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/536962

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/558359

https://software.intel.com/en-us/articles/using-cluster-mkl-pblasscalapack-fortran-routine-in-your-c-program/

You may refer to these, among others.
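
One way to make the initialization checkable, sketched under stated assumptions: fill each local array from a function of the global indices instead of calling rand() on every process. Without a per-process seed, rand() typically produces the same sequence on each process, and the assembled matrix then depends on the grid shape. The helper names and the test matrix `entry` are hypothetical; the source process is assumed to be 0 in both grid dimensions, as in the descinit call of the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based global index of 0-based local index l on process coordinate p. */
int local_to_global(int l, int nb, int p, int isrc, int nprocs)
{
    assert(l >= 0 && nb > 0 && nprocs > 0);
    int dist = (nprocs + p - isrc) % nprocs;
    return ((l / nb) * nprocs + dist) * nb + l % nb;
}

/* Hypothetical test matrix, defined purely by global indices. */
double entry(int gi, int gj)
{
    return gi == gj ? 10.0 : 1.0 / (1.0 + gi + gj);
}

/* Fill the local column-major piece (loc_rows x loc_cols, leading
 * dimension lld) owned by process (myrow, mycol). */
void fill_local(double *a, int lld, int loc_rows, int loc_cols,
                int mb, int nb, int myrow, int mycol,
                int nprows, int npcols)
{
    for (int lj = 0; lj < loc_cols; ++lj) {
        int gj = local_to_global(lj, nb, mycol, 0, npcols);
        for (int li = 0; li < loc_rows; ++li) {
            int gi = local_to_global(li, mb, myrow, 0, nprows);
            a[(size_t)lj * lld + li] = entry(gi, gj);
        }
    }
}
```

Because every entry depends only on its global position, a 1x1 grid and a 2x2 grid describe the same matrix, so a distributed result can be compared against a serial one.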

2. Regarding performance: you mentioned that some things look strange, especially the speed.

Could you please tell us what speed you expected and which processors you are using? It may depend on the matrix size, the test method, etc.
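
One point about the test method is worth checking. ScaLAPACK routines treat lwork = -1 as a workspace query: the routine only reports the workspace size it needs in the first element of work and returns without factorizing. The posted program passes lwork = -1 to pdgeqrf_, so the timed call may be only that query. The sketch below shows the usual two-call pattern. It does not call MKL: `mock_geqrf` is a hypothetical stand-in that imitates only the lwork convention, and its workspace size is made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the stand-in was asked to do real work. */
static int factor_calls = 0;

/* Hypothetical stand-in for a routine with the ScaLAPACK lwork convention.
 *   lwork == -1     : query only, needed size is written to work[0]
 *   lwork >= needed : "factorize" (here: just count the call)
 *   otherwise       : report an error through info */
void mock_geqrf(int n, double *work, int lwork, int *info)
{
    assert(n > 0 && work != NULL && info != NULL);
    int needed = 64 * n; /* made-up size, for illustration only */
    *info = 0;
    if (lwork == -1) {
        work[0] = (double)needed;
        return;
    }
    if (lwork < needed) {
        *info = -1;
        return;
    }
    factor_calls++;
}

/* Two-call pattern: query, allocate, then the call worth timing. */
int factor_with_query(int n)
{
    double wkopt = 0.0;
    int info = 0;
    mock_geqrf(n, &wkopt, -1, &info);      /* call 1: workspace query */
    if (info != 0)
        return info;
    int lwork = (int)wkopt;
    double *work = malloc(sizeof(double) * (size_t)lwork);
    if (work == NULL)
        return -100;                       /* hypothetical out-of-memory code */
    mock_geqrf(n, work, lwork, &info);     /* call 2: the actual work */
    free(work);
    return info;
}
```

Applied to the posted program, this would mean calling pdgeqrf_ once with lwork = -1, reading the size from work[0], allocating that many doubles, and timing only the second call made with the real lwork.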

Best Regards,

Ying
