Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

ScaLAPACK in C essentials

hpc-matt
Beginner
Greetings Forum users and experts,

Forgive my seemingly simple question; I am still new to ScaLAPACK, and I am using it from C. I have read through many forum posts and have looked at the correct sequence for invoking ScaLAPACK routines:

Initialize MPI
Initialize CBlacs grid
Construct Data <- My issues (sort of)
Execute function call
Cleanup

When writing a program to compute a QR decomposition (pdgeqrf) in C, where can I find the C interfaces to the auxiliary routines in ScaLAPACK, for instance numroc_ or descinit_? Also, when constructing my data, must I use the standard scatter/gather approach to lay it out in the 2D block-cyclic format across multiple machines, or are those details handled inside the invocation of the ScaLAPACK routine?

Thanks again for all your help and patience with us new users,

Matt
Ivan_Latkin__Intel_

Hello, Matt

Unfortunately, there is no C interface for ScaLAPACK or PBLAS. All parameters should be passed into routines and functions by reference; you can also define constants (i_one for 1, i_negone for -1, d_two for 2.0E+0, etc.) to pass into routines. Matrices should be stored as 1-D column-major arrays (element (i,j) is A[ i + lda*j ], not A[i][j]).
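For instance, filling an m-by-n matrix in this layout could look like the following minimal sketch (the sizes and the fill value are just placeholders):

/* column-major 1-D storage: element (i,j) lives at A[ i + lda*j ] */
int m = 4, n = 3, lda = 4;                     /* lda >= number of rows m    */
double *A = (double *)malloc( lda * n * sizeof(double) );
int i, j;
for( j = 0; j < n; j++ ){
    for( i = 0; i < m; i++ ){
        A[ i + lda*j ] = 1.0;                  /* your (i,j) entry goes here */
    }
}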

To invoke ScaLAPACK routines in your program, you should first initialize the process grid via BLACS routines (BLACS alone is enough for that).
Second, you should distribute your matrix over the process grid (2D block-cyclic distribution). You can do this by means of the pdgeadd_ PBLAS routine. This routine computes the sum of two matrices A and B: B := alpha*A + beta*B. The matrices can have different distributions; in particular, matrix A can be owned by a single process, so by setting alpha=1 and beta=0 you can simply copy your non-distributed matrix A into the distributed matrix B.

Third, call pdgeqrf_ for matrix B. At the end of the ScaLAPACK part of the code, you can collect the results on one process (just copy the distributed matrix back into a local one via pdgeadd_). Finally, release the grid via blacs_gridexit_ and blacs_exit_.

Putting it all together, a ScaLAPACK-based program should contain the following:


int main(){
    // Useful constants
    const int i_one = 1, i_negone = -1, i_zero = 0;
    const double zero = 0.0E+0, one = 1.0E+0;

    // Declarations (global matrix A, distributed matrix A_distr, descriptors, grid info, ...)
    int m, n, mb, nb, nprow, npcol, myrow, mycol, ictxt, info;
    int mp, nq, lld, lld_distr;
    int descA[9], descA_distr[9];
    double *A, *A_distr;
    double *tau, *work;      // workspace for pdgeqrf_; must be allocated before the call
    int lwork;
    ...

    // Input your parameters: m, n - matrix dimensions, mb, nb - blocking parameters,
    // nprow, npcol - grid dimensions
    ...

    // Part with invoking of ScaLAPACK routines. Initialize the process grid first
    blacs_get_( &i_negone, &i_zero, &ictxt );
    blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
    blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );

    if ( myrow==0 && mycol==0 ){
        // only process (0,0) holds the whole matrix A
        A = (double *)malloc( m*n*sizeof(double) );
        // input matrix A here
        ...
    }else{
        // other processes don't contain parts of A
        A = NULL;
    }

    // Compute dimensions of the local part of the distributed matrix A_distr
    mp = numroc_( &m, &mb, &myrow, &i_zero, &nprow );
    nq = numroc_( &n, &nb, &mycol, &i_zero, &npcol );
    A_distr = (double *)malloc( mp*nq*sizeof(double) );

    // Initialize descriptors (the local matrix A is considered as distributed with blocking
    // parameters m, n, i.e. there is only one block - the whole matrix A - located on process (0,0))
    lld = MAX( numroc_( &m, &m, &myrow, &i_zero, &nprow ), 1 );   // leading dimension = local row count
    descinit_( descA, &m, &n, &m, &n, &i_zero, &i_zero, &ictxt, &lld, &info );
    lld_distr = MAX( mp, 1 );
    descinit_( descA_distr, &m, &n, &mb, &nb, &i_zero, &i_zero, &ictxt, &lld_distr, &info );

    // Call pdgeadd_ to distribute the matrix (i.e. copy A into A_distr)
    pdgeadd_( "N", &m, &n, &one, A, &i_one, &i_one, descA, &zero, A_distr, &i_one, &i_one, descA_distr );

    // Now call the ScaLAPACK routine
    pdgeqrf_( &m, &n, A_distr, &i_one, &i_one, descA_distr, tau, work, &lwork, &info );

    // Copy the result back into the local matrix
    pdgeadd_( "N", &m, &n, &one, A_distr, &i_one, &i_one, descA_distr, &zero, A, &i_one, &i_one, descA );

    free( A_distr );
    if( myrow==0 && mycol==0 ){
        free( A );
    }

    // End of the ScaLAPACK part. Exit the process grid.
    blacs_gridexit_( &ictxt );
    blacs_exit_( &i_zero );
    return 0;
}

You can also study the C examples for PBLAS; they have been available in MKL since the 10.3 beta release.
I hope my answer is helpful to you. Feel free to ask questions.

Best regards,

Ivan Latkin

hpc-matt
Beginner
Thanks again for the assistance, this example was a great help!

When I use functions like numroc and descinit as written above, the BLACS grid terminates. Is there anything different I must do in order to ensure these functions are linked correctly to their Fortran equivalents?

My linking line is:


MKLPATH:=$(CXXDIR)/mkl/lib/em64t
MPILIBS:=$(MKLPATH)/libmkl_scalapack_ilp64.a $(MKLPATH)/libmkl_solver_ilp64_sequential.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -lirc

mpicc -o MPI_DGEQRFDriver.o -L$(MPILIBS)

And the header files I am using are:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"


EDIT:
Also, when invoking the BLACS library, I must call blacs_get_ with (0, 0, &ictxt), not with -1 as the first argument. If I use -1, I receive an MPI_Comm_group: Invalid communicator error.

Thanks again for your further assistance!

Matt
Ivan_Latkin__Intel_
Hello, Matt!

Since you are linking the ILP64 libraries, you should declare your integer variables accordingly.
1. It's better to use the 'MKL_INT' type for integer parameters and data passed to ScaLAPACK and BLACS routines. Constants should also be declared that way: "const MKL_INT i_negone=-1, i_zero=0, i_one=1;"
2. You should include the "mkl_scalapack.h" header (it also refers to "mkl_types.h", where 'MKL_INT' is defined).
3. You should compile your source code with the -DMKL_ILP64 option when you are linking the ILP64 libraries.

Please don't mix up the two styles of BLACS routine invocation, Cblacs_get(-1, 0, &ictxt) and blacs_get_(&i_negone, &i_zero, &ictxt); I would recommend using the second one, because it works for sure.
A call like blacs_get_(-1, 0, &ictxt) is incorrect, because with this binding the parameters must be passed by pointer, not by value; see the short sketch below.
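A minimal sketch combining points 1-3 and the by-reference calling convention (assuming the BLACS prototypes are visible, for example via mkl_blacs.h or your own extern declarations; compile with -DMKL_ILP64):

#include "mkl_scalapack.h"                 /* pulls in mkl_types.h, which defines MKL_INT */

const MKL_INT i_negone = -1, i_zero = 0, i_one = 1;
MKL_INT ictxt, nprow = 2, npcol = 2, myrow, mycol;

blacs_get_( &i_negone, &i_zero, &ictxt );          /* correct: arguments passed by pointer */
/* blacs_get_( -1, 0, &ictxt );                       wrong: arguments passed by value     */
blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );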

I hope this will help to resolve the problem

Best regards,
Ivan Latkin
hpc-matt
Beginner
Thanks again for the help, I have made sure to adhere to your suggestions.

Is there a way to confirm that BLACS is working correctly? Whenever I try to call numroc, my grid errors out. I also initialize a grid using the MPI and BLACS calls exactly as above, but the grid info returned by blacs_gridinfo_ is not what I expect (nprow and npcol are not correct; they are always 0).

Also, when I call

blacs_get_( &i_negone, &i_zero, &ictxt );
my program crashes, saying that it is an invalid context. If I call:

blacs_get_( &i_zero, &i_zero, &ictxt );
my program still doesn't work, but it's a different error.

Any additional light that you can shed on these issues would be greatly appreciated.

Matt
Ivan_Latkin__Intel_
Matt,

It is difficult to say where the mistake is. Could you please provide your source file? I will try to reproduce the issue.

To test the BLACS routines you can use the Netlib tests (note that they can be used for LP64 only). You can download Netlib BLACS; it is supplied with the tests. Then you should edit the makefiles in the BLACS top directory appropriately, setting the paths to the BLACS library (it can be the path to the MKL BLACS library) and to MPI, and run 'make tester' to create the executables; they will be located in the TESTING/EXE folder. Finally, launch the executables via mpiexec (with at least 4 processes). If all tests pass you will see: "THERE WERE NO FAILURES IN THIS TEST RUN".


Best regards,
Ivan Latkin
hpc-matt
Beginner
Greets,

Here are the contents of my file:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#include "mkl_scalapack.h"

#define INDEX(i,j,lda) (i+j*lda)

static int MAX( int a, int b ){
if (a>b) return(a); else return(b);
}
int main(int argc, char** argv){
// Useful constants for ScaLapack
const MKL_INT i_one = 1, i_negone = -1, i_zero = 0;
double zero=0.0E+0, one=1.0E+0;

//used for MPI init
int iam=0, nprocs=0;
int myrank_mpi, nprocs_mpi;
//Used for BLACS grid
int nprow=2, npcol=2, myrow=-1, mycol=-1;
//dimension for global matrix, Row and col blocking params
MKL_INT M, N, mb, nb, LWORK, INFO=0, ictxt=0;
int descA_distr[9],descA[9];
int lld=0;
double *A;
//dimension for local matrix
int mp, nq, lld_distr;
double *A_distr;
double *TAU,*WORK;
//counters, seeds
int i=0, itemp, seed;

//call MPI init commands
MPI_Init( &argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);

// Input your parameters: m, n - matrix dimensions, mb, nb - blocking parameters,
// nprow, npcol - grid dimensions
//Want 2x2 grid, will change in future
nprow = 2;
npcol = 2;
//Matrix Size
M=16;
N=16;
//Matrix blocks
mb=2;
nb=2;
//for local work
LWORK=nb*M;

// Part with invoking of ScaLAPACK routines. Initialize process grid, first
/*
//Cblacs routines, do not work
Cblacs_pinfo( &iam, &nprocs ) ;
Cblacs_get( -1, 0, &ictxt );
Cblacs_gridinit( &ictxt, "Row", nprow, npcol );
Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

printf("Thread=%d, Cblacs_pinfo, iam=%d,nprocs=%d\n",myrank_mpi,iam,nprocs);
printf("Thread=%d, After info, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);
*/

//blacs Method, does not work
printf("Thread %d: Before init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);
//printf("before, ictxt=%d\n",ictxt);
blacs_get_( &i_zero, &i_zero, &ictxt );
blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );
printf("Thread %d: After init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);
//Generate Data
if ( myrow==0 && mycol==0 ){
A = malloc(M*N*sizeof(double));
//just generate some random samples for now,
for( i = 0; i < M*N; i++ ){
A[ i ] = rand()%1000000;
}
//FOR DEBUG
/*A[0]=1.0;
A[1]=2.0;
A[2]=3.0;
A[3]=4.0;
A[4]=4.0;
A[5]=3.0;
A[6]=2.0;
A[7]=1.0;
A[8]=1.0;
A[9]=2.0;
A[10]=3.0;
A[11]=4.0;
A[12]=4.0;
A[13]=3.0;
A[14]=2.0;
A[15]=1.0;*/
}else{
A = NULL;
//other processes don't contain parts of A
}

// Compute dimensions of local part of distributed matrix A_distr
/*
* DEBUG
printf("M=%d\n",M);
printf("mb=%d\n",mb);
printf("myrow=%d\n",myrow);
printf("izero=%d\n",i_zero);
printf("nprow=%d\n",nprow);
* */

mp = numroc_( &M, &mb, &myrow, &i_zero, &nprow);
nq = numroc_( &N, &nb, &mycol, &i_zero, &npcol );
printf("Thread %d: After mp=%d, np=%d\n",myrank_mpi,mp, nq);

A_distr = malloc( mp*nq*sizeof(double));
WORK = (double *)malloc(N*sizeof(double));
TAU = (double *)malloc(N*sizeof(double));

// Initialize discriptors (local matrix A is considered as distributed with blocking parameters
// m, n, i.e. there is only one block - whole matrix A - which is located on process (0,0) )
lld = MAX( numroc_( &N, &N, &myrow, &i_zero, &nprow ), 1 );
descinit_( descA, &M, &N, &M, &N, &i_zero, &i_zero, &ictxt, &lld, &INFO );
lld_distr = MAX( mp, 1 );
descinit_( descA_distr, &M, &N, &mb, &nb, &i_zero, &i_zero, &ictxt, &lld_distr, &INFO );

// Call pdgeadd_ to distribute matrix (i.e. copy A into A_distr)
pdgeadd_( "N", &M, &N, &one, A, &i_one, &i_one, descA, &zero, A_distr, &i_one, &i_one, descA_distr );

// Now call ScaLAPACK routines
pdgeqrf_( &M, &N, A_distr, &i_one, &i_one, descA_distr, TAU, WORK, &LWORK, &INFO);

// Copy result into local matrix
pdgeadd_( "N", &M, &N, &one, A_distr, &i_one, &i_one, descA_distr, &zero, A, &i_one, &i_one, descA );

free( A_distr );
if( myrow==0 && mycol==0 ){
free( A );
}

// End of ScaLAPACK part. Exit process grid.
blacs_gridexit_( &ictxt );
blacs_exit_( &i_zero );
//finalize MPI grid
// MPI_Finalize();
exit(0);
}
So I comment out all but the BLACS invocation. Using blacs_get_( &i_negone, &i_zero, &ictxt );, I receive the following error:

Fatal error in MPI_Comm_group: Invalid communicator, error stack:
MPI_Comm_group(150): MPI_Comm_group(comm=0x0, group=0x7fff3b3ad954) failed

If I still leave the ScaLAPACK calls commented out but use blacs_get_( &i_zero, &i_zero, &ictxt );, the program executes, giving me the following output:

Thread 0: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1
Thread 2: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1
Thread 1: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1
Thread 3: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1
Thread 2: After init, nprow=0, npcol=0, myrow=0, mycol=0
Thread 0: After init, nprow=0, npcol=0, myrow=0, mycol=0
Thread 3: After init, nprow=0, npcol=0, myrow=0, mycol=1
Thread 1: After init, nprow=0, npcol=0, myrow=0, mycol=1

As you can see, it's still wrong, as the myrow and mycol values should be different. This causes numroc to malfunction, and everything crashes.

Sorry for such a seemingly simple problem, but it has got me stumped.

Thanks again for your help,
Matt

P.S. My makefile is as follows:

CC = mpicc
CCFLAGS = -O3
CXXDIR = /opt/intel/Compiler/11.1/069
LIBDIR:= $(CXXDIR)/mkl/lib/em64t
LIBS:= $(LIBDIR)/libmkl_intel_lp64.a
LIBS += -Wl,--start-group -L$(LIBDIR) $(LIBDIR)/libmkl_intel_thread.a $(LIBDIR)/libmkl_core.a -Wl,--end-group -L$(LIBDIR) -liomp5 -lpthread

MKLPATH:=$(CXXDIR)/mkl/lib/em64t
MPILIBS:=$(MKLPATH)/libmkl_scalapack_ilp64.a $(MKLPATH)/libmkl_solver_ilp64_sequential.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -lirc


DGEQRF : $(OBJECTS) MPI_DGEQRF.o
$(CC) $(CCFLAGS) -o $@ $(OBJECTS) MPI_DGEQRF.o -L$(LIBDIR) $(MPILIBS)

Ivan_Latkin__Intel_
Matt,

I have studied your code and tried to reproduce the issue. I have several comments.

1) Your makefile should contain an explicit rule for compilation (you should pass the -DMKL_ILP64 option to the compiler), like:
%.o : %.c
$(CC) $(CCFLAGS) -DMKL_ILP64 -c -o $@ $<

2) You declared some integer variables used by the ScaLAPACK and BLACS routines as 'int':

int iam=0, nprocs=0;
int nprow=2, npcol=2, myrow=-1, mycol=-1;
int descA_distr[9],descA[9];
int lld=0;
int mp, nq, lld_distr;

but they should have the 'MKL_INT' type.

3) You don't have to call MPI routines directly; they are invoked by the BLACS routines (for instance, blacs_pinfo_ calls MPI_Comm_size and MPI_Comm_rank). So you should remove the MPI calls and leave only BLACS. Additional MPI calls can also cause an error, for example if you call MPI_Finalize after blacs_exit_. See the first sketch below.

4) The size of the work array for pdgeqrf_ is incorrect. You can determine the correct value by means of an additional call to pdgeqrf_ with LWORK=-1. Such a call is treated as a workspace query, and the correct value for LWORK is returned in WORK[ 0 ]. See the second sketch below.
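For point 3, a minimal sketch of the start-up and shutdown without explicit MPI calls (only routines already mentioned in this thread; variable names follow your code):

MKL_INT iam, nprocs;                                  /* process rank and process count       */
MKL_INT ictxt, nprow = 2, npcol = 2, myrow, mycol;

blacs_pinfo_( &iam, &nprocs );                        /* BLACS makes the MPI calls internally */
blacs_get_( &i_negone, &i_zero, &ictxt );             /* default system context               */
blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );
/* ... distribute data, call ScaLAPACK ... */
blacs_gridexit_( &ictxt );
blacs_exit_( &i_zero );                               /* no MPI_Finalize after this           */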
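And for point 4, a sketch of the workspace query (reusing the names from your code; the optimal size comes back as a double in the first element of the work argument and is cast to MKL_INT):

double work_query;                                    /* receives the optimal LWORK            */
LWORK = -1;                                           /* LWORK = -1 means "workspace query"    */
pdgeqrf_( &M, &N, A_distr, &i_one, &i_one, descA_distr, TAU, &work_query, &LWORK, &INFO );

LWORK = (MKL_INT)work_query;                          /* now allocate the real workspace       */
WORK  = (double *)malloc( LWORK*sizeof(double) );
pdgeqrf_( &M, &N, A_distr, &i_one, &i_one, descA_distr, TAU, WORK, &LWORK, &INFO );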

I hope this will help resolve the issue.

Best regards,
Ivan Latkin
hpc-matt
Beginner
I inserted the compile rule and changed all declarations to MKL_INT where appropriate (matching them against the mkl_scalapack.h file); however, I think my issues lie with BLACS. After a call to the BLACS init functions, nprow and npcol should come back as 2x2, correct, if I am invoking mpiexec -n 4 ./myexecutable?


When I run just the following lines:

printf("Thread %d: Before init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);
blacs_get_( &i_negone, &i_zero, &ictxt );
blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );
printf("Thread %d: After init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);

I receive

Fatal error in MPI_Comm_group: Invalid communicator, error stack:
MPI_Comm_group(150): MPI_Comm_group(comm=0x0, group=0x7fff3b097fd4) failed

when I invoke just BLACS (with everything else commented out).

I figure it must be a configuration issue, because after a lot of searching online it would seem that the code should work and not give me MPI errors.

The sequence for installing ScaLAPACK with MKL on my machine should be:

Install MKL
Install MPI
Install BLACS(?)

Thanks again for trying out the code, I appreciate the help.

Matt
hpc-matt
Beginner
Basically, over the weekend it turned out that what I need to figure out is how to install BLACS and get it working on Ubuntu 9.10.

I have messed around with their website, and it is frustrating to say the least. Has anyone gotten BLACS installed successfully on an Ubuntu 9.10 machine using MPICH2?

Thanks,
Matthew
Gennady_F_Intel
Moderator
Matthew, I guess you mean MPICH2. What MPICH2 version do you use?
--Gennady

hpc-matt
Beginner
I am using the latest stable version from:
http://www.mcs.anl.gov/research/projects/mpich2/

which is MPICH2 1.2.1p1. Is that the info you were asking about?

Matthew
Ivan_Latkin__Intel_
Matt,

Try to build your test with the GNU compiler. If that doesn't help, try using older versions of MPICH2 (1.2.1 or 1.0.8).

Best regards,
Ivan Latkin