Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- ScaLapack in C essentials

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-26-2010
04:20 AM

525 Views

ScaLapack in C essentials

Forgive my seemingly simple question, I am still new at using Scalapack, and I am using it in C. I have read through many forum posts, and have looked at how to use the correct sequence for invoking ScaLapack routines,

Initialize MPI

Initialize CBlacs grid

Construct Data <- My issues (sort of)

Execute function call

Cleanup

When writing a program to compute QR decomposition (pdgeqrf) in C, where can I find the C interfaces to the auxiliary routines in ScaLapack. for instance, numroc_ or descinit_? Also, when constructing my data, must I use the standard scatter gather interface when using 2d block cyclic formats across multiple machines, or are those details in the invocation of the scala[ack routine?

Thanks again for all your help and patience with us new users,

Matt

Link Copied

12 Replies

Ivan_Latkin__Intel_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-27-2010
04:23 AM

525 Views

Hello, Matt

Unfortunately, there is no C interface for ScaLAPACK or PBLAS.All parametersshould be passed into routines and functionsby reference, you can also define constants (i_one for 1, i_negone for -1, d_two for 2.0E+0 etc.) to pass into routines.Matrices should bestoredas 1d array(A[ i + lda*j ], not A

To invoke ScaLAPACK routines in your program, you should first initialize grid via BLACS routines (BLACS is enough).

Second, you should distribute your matrix over process grid (block cyclic 2d distribution). You can do this by means of pdgeadd_ PBLAS routine. This routine cumputes sum of two matrices A, B: B:=alpha*A+beta*B). Matrices can have different distribution,in particularmatrixA can be owned by only one process, thus, setting alpha=1, beta=0 you cansimply copy your non-distributed matrix A into distributed matrix B.

Third, call pdgeqrf_ for matrix B. In the end of ScaLAPACK part of code, you can collect results on one process (just copy distributed matrix into local one via pdgeadd_). Finally, close grid via blacs_gridexit_ and blacs_exit_.

After all, ScaLAPACK-using program should contain following:

void main(){

// Useful constants

const int i_one = 1, i_negone = -1, i_zero = 0;

const double zero=0.0E+0, one=1.0E+0;

...

// Input your parameters: m, n - matrix dimensions, mb, nb - blocking parameters,

// nprow, npcol - grid dimensions

...

// Part with invoking of ScaLAPACK routines. Initialize process grid, first

blacs_get_( &i_negone, &i_zero, &ictxt );

blacs_gridinit_( &ictxt, "R", &nprow, &npcol );

blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );

if ( myrow==0 && mycol==0 ){

A = malloc(m*n,sizeof(double));

//input matrix A here

...

}else{

A = NULL

//other processes don't contain parts of A

}

// Compute dimensions of local part of distributed matrix A_distr

mp = numroc_( &m, &mb, &myrow, &i_zero, &nprow );

nq = numroc_( &n, &nb, &mycol, &i_zero, &npcol );

A_distr = malloc( mp*nq, sizeof(double));

// Initialize discriptors (local matrix A is considered as distributed with blocking parameters

// m, n,i.e. there is only one block - whole matrix A - which is located on process (0,0) )

lld = MAX( numroc_( &n, &n, &myrow, &i_zero, &nprow ), 1 );

descinit_( descA, &m, &n, &m, &n, &i_zero, &i_zero, &ictxt, &lld, &info );

lld_distr = MAX( mp, 1 );

descinit_( descA_distr, &m, &n, &mb, &nb, &i_zero, &i_zero, &ictxt, &lld_distr, &info );

// Call pdgeadd_ to distribute matrix (i.e. copy A into A_distr)

pdgeadd_( "N", &m, &n, &one, A, &i_one, &i_one, descA, &zero, A_distr, &i_one, &i_one, descA_distr );

// Now call ScaLAPACK routines

pdgeqrf_(&m, &n, A_distr, &i_one, &i_one, descA_distr, tau, work, &lwork, &info);

// Copy result into local matrix

pdgeadd_( "N", &m, &n, &one, A_distr, &i_one, &i_one, descA_distr, &zero, A, &i_one, &i_one, descA );

free( A_distr );

if( myrow==0 && mycol==0 ){

free( A );

}

// End of ScaLAPACK part. Exit process grid.

blacs_gridexit_( &ictxt );

blacs_exit_( &i_zero );

}

You can also study C examples for PBLAS they are available in MKL since 10.3.b release.

I hope my answer will be helpful for you. Feel free to ask questions.

Best regards,

Ivan Latkin

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-31-2010
11:05 AM

525 Views

When using functions like numroc and descinit, as written above, the blacs grid terminates. Is there anything different I must do in order to ensure these functions are linked correctly to their fortran equivalent?

My linking line is:

MKLPATH:=$(CXXDIR)/mkl/lib/em64t

MPILIBS:=$(MKLPATH)/libmkl_scalapack_ilp64.a $(MKLPATH)/libmkl_solver_ilp64_sequential.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -lirc

mpicc -o MPI_DGEQRFDriver.o -L$(MPILIBS)

And the heder files I am using are:

#include

#include

#include

#include "mpi.h"

#include

EDIT:

Also, when invoking the blacs library, I must pass it blacs_gridinit_(0,0,&itext), not with -1. If this happens, I recieve a MPI_Comm_group: Invalid communicator error.

Thanks again for your further assistance!

Matt

Ivan_Latkin__Intel_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-01-2010
02:33 AM

525 Views

As far as you are linking ilp64 libraries, you should declare integer variables properly.

1. It's better to use 'MKL_INT' type for integerparameters and datapassed to ScaLAPACK and BLACS rutines. Constants also should be declared: "const MKL_INT i_negone=-1, i_zero=0, i_one=1;"

2. You shouldinclude "mkl_scalapack.h" header (it also refers to "mkl_types.h" where the 'MKL_INT' is defined).

3. Youshould compile your source codewith -DMKL_ILP64 option in case you're linking ilp64 libraries

Please, don't mix up two styles for BLACS routine invocation: Cblacs_get(-1, 0, &ictxt) and blacs_get_(&i_negone, &i_zero, &ictxt)-I would recommend using second one, because it works for sure.

Routine call like blacs_get_(-1, 0, &ictxt) is incorrect because for this case parameters should be passed by pointer, not by value

I hope this will help to resolve the problem

Best regards,

Ivan Latkin

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-08-2010
06:24 PM

525 Views

Is there a way to confirm that the BLACS stuff is working correctly? Whenever I try to call numroc, my grid errors out. I also initialize a grid using the MPI calls exactly as above, but the grid info returned from the blacs get info command is not the same (the nprow, npcol is not correct, always 0).

Also, when I call

blacs_get_( &i_negone, &i_zero, &ictxt );

My program crashes saying that that is an invalid context. if I call:

blacs_get_( &i_zero, &i_zero, &ictxt );

my program still doesnt work, but its a different error.

Any additional light that you can shed on these issues would be greatly appreciated.

Matt

Ivan_Latkin__Intel_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-09-2010
07:30 AM

525 Views

It is difficult to say where is a mistake. Could you please provide your source file? Iwould try to reproduce the issue.

To testBLACSroutines you can useNetlib tests(but they can be used for lp64 only). You can download Netlib BLACS- it is supplied with tests. Then you should to editmakefiles in the BLACS top directory in proper way, setpaths to BLACS (it can be path to MKL BLACS library) library and MPI, and print 'make tester' to create executables - they will be located in the TESTING/EXE folder. Finally, launch the executables via mpiexec (with 4 processes at least), if all tests pass you will see: " THERE WERE NO FAILURES IN THIS TEST RUN".

Best regards,

Ivan Latkin

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-09-2010
10:55 PM

525 Views

Here is the contents of my file:

#include

#include

#include

#include "mpi.h"

#include "mkl_scalapack.h"

#include

#define INDEX(i,j,lda) (i+j*lda)

static int MAX( int a, int b ){

if (a>b) return(a); else return(b);

}

int main(int argc, char** argv){

// Useful constants for ScaLapack

const MKL_INT i_one = 1, i_negone = -1, i_zero = 0;

double zero=0.0E+0, one=1.0E+0;

//used for MPI init

int iam=0, nprocs=0;

int myrank_mpi, nprocs_mpi;

//Used for BLACS grid

int nprow=2, npcol=2, myrow=-1, mycol=-1;

//dimension for global matrix, Row and col blocking params

MKL_INT M, N, mb, nb, LWORK, INFO=0, ictxt=0;

int descA_distr[9],descA[9];

int lld=0;

double *A;

//dimension for local matrix

int mp, nq, lld_distr;

double *A_distr;

double *TAU,*WORK;

//counters, seeds

int i=0, itemp, seed;

//call MPI init commands

MPI_Init( &argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);

MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);

// Input your parameters: m, n - matrix dimensions, mb, nb - blocking parameters,

// nprow, npcol - grid dimensions

//Want 2x2 grid, will change in future

nprow = 2;

npcol = 2;

//Matrix Size

M=16;

N=16;

//Matrix blocks

mb=2;

nb=2;

//for local work

LWORK=nb*M;

// Part with invoking of ScaLAPACK routines. Initialize process grid, first

/*

//Cblacs routines, do not work

Cblacs_pinfo( &iam, &nprocs ) ;

Cblacs_get( -1, 0, &ictxt );

Cblacs_gridinit( &ictxt, "Row", nprow, npcol );

Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

printf("Thread=%d, Cblacs_pinfo, iam=%d,nprocs=%d\n",myrank_mpi,iam,nprocs);

printf("Thread=%d, After info, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);

*/

//blacs Method, does not work

printf("Thread %d: Before init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);

//printf("before, ictxt=%d\n",ictxt);

blacs_get_( &i_zero, &i_zero, &ictxt );

blacs_gridinit_( &ictxt, "R", &nprow, &npcol );

blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );

printf("Thread %d: After init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);

//Generate Data

if ( myrow==0 && mycol==0 ){

A = malloc(M*N*sizeof(double));

//just generate some random samples for now,

for(i =0; i

A

}

//FOR DEBUG

/*A[0]=1.0;

A[1]=2.0;

A[2]=3.0;

A[3]=4.0;

A[4]=4.0;

A[5]=3.0;

A[6]=2.0;

A[7]=1.0;

A[8]=1.0;

A[9]=2.0;

A[10]=3.0;

A[11]=4.0;

A[12]=4.0;

A[13]=3.0;

A[14]=2.0;

A[15]=1.0;*/

}else{

A = NULL;

//other processes don't contain parts of A

}

// Compute dimensions of local part of distributed matrix A_distr

/*

* DEBUG

printf("M=%d\n",M);

printf("mb=%d\n",mb);

printf("myrow=%d\n",myrow);

printf("izero=%d\n",i_zero);

printf("nprow=%d\n",nprow);

* */

mp = numroc_( &M, &mb, &myrow, &i_zero, &nprow);

nq = numroc_( &N, &nb, &mycol, &i_zero, &npcol );

printf("Thread %d: After mp=%d, np=%d\n",myrank_mpi,mp, nq);

A_distr = malloc( mp*nq*sizeof(double));

WORK = (double *)malloc(N*sizeof(double));

TAU = (double *)malloc(N*sizeof(double));

// Initialize discriptors (local matrix A is considered as distributed with blocking parameters

// m, n, i.e. there is only one block - whole matrix A - which is located on process (0,0) )

lld = MAX( numroc_( &N, &N, &myrow, &i_zero, &nprow ), 1 );

descinit_( descA, &M, &N, &M, &N, &i_zero, &i_zero, &ictxt, &lld, &INFO );

lld_distr = MAX( mp, 1 );

descinit_( descA_distr, &M, &N, &mb, &nb, &i_zero, &i_zero, &ictxt, &lld_distr, &INFO );

// Call pdgeadd_ to distribute matrix (i.e. copy A into A_distr)

pdgeadd_( "N", &M, &N, &one, A, &i_one, &i_one, descA, &zero, A_distr, &i_one, &i_one, descA_distr );

// Now call ScaLAPACK routines

pdgeqrf_( &M, &N, A_distr, &i_one, &i_one, descA_distr, TAU, WORK, &LWORK, &INFO);

// Copy result into local matrix

pdgeadd_( "N", &M, &N, &one, A_distr, &i_one, &i_one, descA_distr, &zero, A, &i_one, &i_one, descA );

free( A_distr );

if( myrow==0 && mycol==0 ){

free( A );

}

// End of ScaLAPACK part. Exit process grid.

blacs_gridexit_( &ictxt );

blacs_exit_( &i_zero );

//finalize MPI grid

// MPI_Finalize();

exit(0);

}

So I comment out all but the blacs invocation, using blacs_get_( &i_negone, &i_zero, &ictxt ); , I recieve the following error:

Fatal error in MPI_Comm_group: Invalid communicator, error stack:

MPI_Comm_group(150): MPI_Comm_group(comm=0x0, group=0x7fff3b3ad954) failed

If I sitll leave Scalapack comented out, but use blacs_get_( &i_zero, &i_zero, &ictxt ); , the program will execute giving me the following output:

Thread 0: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1

Thread 2: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1

Thread 1: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1

Thread 3: Before init, nprow=2, npcol=2, myrow=-1, mycol=-1

Thread 2: After init, nprow=0, npcol=0, myrow=0, mycol=0

Thread 0: After init, nprow=0, npcol=0, myrow=0, mycol=0

Thread 3: After init, nprow=0, npcol=0, myrow=0, mycol=1

Thread 1: After init, nprow=0, npcol=0, myrow=0, mycol=1

As you can see it's still wrong, as the myrow and mycol variables should be different. This causes numroc to malfunction, and everything crashes.

Sorry for such a seemingly simple problem, but it has got me stumped.

Thanks again for your help,

Matt

P.S. My makefile is as follows:

CC = mpicc

CCFLAGS = -O3

CXXDIR = /opt/intel/Compiler/11.1/069

LIBDIR:= $(CXXDIR)/mkl/lib/em64t

LIBS:= $(LIBDIR)/libmkl_intel_lp64.a

LIBS += -Wl,--start-group -L$(LIBDIR) $(LIBDIR)/libmkl_intel_thread.a $(LIBDIR)/libmkl_core.a -Wl,--end-group -L$(LIBDIR) -liomp5 -lpthread

MKLPATH:=$(CXXDIR)/mkl/lib/em64t

MPILIBS:=$(MKLPATH)/libmkl_scalapack_ilp64.a $(MKLPATH)/libmkl_solver_ilp64_sequential.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -lirc

DGEQRF : $(OBJECTS) MPI_DGEQRF.o

$(CC) $(CCFLAGS) -o $@ $(OBJECTS) MPI_DGEQRF.o -L$(LIBDIR) $(MPILIBS)

Ivan_Latkin__Intel_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-10-2010
02:46 AM

525 Views

I have studied your code and tried to reproduce the issue. I have several comments.

1) Your makefile should contain explicit rule for compilation (you shouldpass -DMKL_ILP64 option to compiler), like:

%.o : %.c

$(CC) $(CCFLAGS) -DMKL_ILP64 -c -o $@ $<

2) You declared some integer variablesused by ScaLAPACK and BLACS routines as 'int':

int iam=0, nprocs=0;

int nprow=2, npcol=2, myrow=-1, mycol=-1;

int descA_distr[9],descA[9];

int lld=0;

int mp, nq, lld_distr;

but they shoud have 'MKL_INT' type

3) You don't have to call MPIstuff directly - it is invoked by BLACS routines (for instance, blacs_pinfo_ calls MPI_Comm_size, MPI_Comm_rank). So, you should removeMPI calls and leave only BLACS. Additional MPI calls can also cause an error: for example,if you call MPI_Finalize after blacs_exit_.

4) Size of work array for pdgeqrf_ is incorrect. But you can define correct valueby means of additional call of pdgeqrf_with LWORK=-1. This call will beconsidered as query of a workspaseandcorrect value for LWORK will be returned in WORK[ 0 ].

I hope this will help resolving the issue.

Best regards,

Ivan Latkin

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-11-2010
02:11 AM

525 Views

when I run just the following lines:

printf("Thread %d: Before init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);

blacs_get_( &i_negone, &i_zero, &ictxt );

blacs_gridinit_( &ictxt, "R", &nprow, &npcol );

blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );

printf("Thread %d: After init, nprow=%d, npcol=%d, myrow=%d, mycol=%d\n",myrank_mpi,nprow,npcol,myrow,mycol);

I recieve

Fatal error in MPI_Comm_group: Invalid communicator, error stack:

MPI_Comm_group(150): MPI_Comm_group(comm=0x0, group=0x7fff3b097fd4) failed

when I invoke just blacs (comment everything else out).

I figure it must be a configuration issue, for after allot of searching online it would seem that the code should work, and not give me mpi errors.

Sequence for installing Scalapack for MKL on my machine should be:

Install MKL

Install MPI

Install BLACS(?)

Thanks again for trying out the code, I appreciate the help.

Matt

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-13-2010
02:38 AM

525 Views

I have messed around with their website, and its frustrating at the very least. Is there anyone who has gotten BLACS installed on their Ubuntu 9.10 using mpiech2 machine successfully?

Thanks,

Matthew

Gennady_F_Intel

Moderator

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-13-2010
11:12 PM

525 Views

Matthew, Iguess you mean mpiech2. What MPICH2 version do you use?

--Gennady

hpc-matt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-14-2010
04:34 PM

525 Views

http://www.mcs.anl.gov/research/projects/mpich2/

which is MPICH2 1.2.1p1. Is that the info you were asking about?

Matthew

Ivan_Latkin__Intel_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-17-2010
10:48 PM

525 Views

Tryto build your test with GNUcompiler. If it doesn't help tryusingolder versions of mpich2 (1.2.1or 1.0.8).

Best regards,

Ivan Latkin

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.