Scalapack p*gemr2d return codes

John_Young · 08-21-2013

Hi,

I'm using the MKL Scalapack functions P*RGEM2D to distribute matrices between different contexts. Most MKL Scalapack functions take an 'info' argument that returns error information from the function. I cannot find any Intel documentation on the meaning of the values returned in 'info' for the P*RGEM2D functions. Any help with the meaning of the info values would be appreciated.

Thanks,

John

John_Young · 08-21-2013

Sorry, I meant P*GEMR2D and not P*RGEM2D

Zhang_Z_Intel · 08-21-2013

MKL doesn't seem to offer this routine, although it is found in Netlib ScaLAPACK. I'm trying to get an explanation from the MKL engineering team and will report back here.

John_Young · 08-21-2013

It's definitely in Intel's MKL library because we have been using it for several years. However, it is undocumented in the Intel MKL guides. It is documented in the Scalapack book on the netlib web site. It is an extremely useful Scalapack routine as it's the only one I know of that does an inter-context transfer.

Zhang_Z_Intel · 08-22-2013

It looks like MKL has a documentation gap for this routine. A bug report will be created to have the gap filled in future MKL releases.

Zhang_Z_Intel · 08-23-2013

Until the documentation is fixed, please use the description of the PDGEMR2D routine below. Note that it is a Fortran routine, but it can be called from C code as pdgemr2d_(...).

-- ScaLAPACK routine (version 1.7) --

Oak Ridge National Laboratory, Univ. of Tennessee, and Univ. of California, Berkeley.

October 31, 1994.

      SUBROUTINE PDGEMR2D( M, N,
     $                     A, IA, JA, ADESC,
     $                     B, IB, JB, BDESC,
     $                     CTXT)

------------------------------------------------------------------------

Purpose

=======

PDGEMR2D copies a submatrix of A onto a submatrix of B. A and B can have different distributions: they can be on different processor grids, they can have different block sizes, and the area to be copied can begin at different places in A and B.

The parameters can be confusing when the grids of A and B are partially or completely disjoint. If a processor calls this routine but is not in the A context or not in the B context, ADESC[CTXT] or BDESC[CTXT], respectively, must be equal to -1 so that the routine recognises this situation.

To summarize the rule:

- If a processor is in A context, all parameters related to A must be valid.

- If a processor is in B context, all parameters related to B must be valid.

- ADESC[CTXT] and BDESC[CTXT] must be either valid contexts or equal to -1.

- M and N must be valid for everyone.

- other parameters are not examined.
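The rule above can be sketched in C. This is only an illustration, not MKL or Netlib code: the helper name fill_desc_or_mark_absent and the in_grid flag are hypothetical, plain int stands in for MKL_INT (an LP64 assumption), and the 0-based positions follow the nine-entry descriptor layout described in the Notes. A process outside the matrix's grid gets CTXT set to -1; zeroing the remaining entries is an arbitrary choice here, since the description says they are not examined.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based positions of the nine descriptor entries, in the order
   DT_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_. */
enum { DT_ = 0, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_, DLEN_ };

/* Hypothetical helper: fill a descriptor for the calling process.
   in_grid is nonzero when the process belongs to the matrix's grid. */
void fill_desc_or_mark_absent(int *desc, int in_grid, int ctxt,
                              int m, int n, int mb, int nb,
                              int rsrc, int csrc, int lld)
{
    assert(desc != NULL);

    for (int i = 0; i < DLEN_; ++i)
        desc[i] = 0;

    if (!in_grid) {
        /* Not part of this matrix's context: only CTXT matters. */
        desc[CTXT_] = -1;
        return;
    }

    desc[DT_]   = 1;     /* descriptor type of a dense block-cyclic matrix */
    desc[CTXT_] = ctxt;
    desc[M_]    = m;
    desc[N_]    = n;
    desc[MB_]   = mb;
    desc[NB_]   = nb;
    desc[RSRC_] = rsrc;
    desc[CSRC_] = csrc;
    desc[LLD_]  = lld;
}
```

A process that holds no part of A would call this with in_grid set to 0 for ADESC and still pass valid M and N to the redistribution routine.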

Notes

=====

A description vector is associated with each 2D block-cyclically distributed matrix. This vector stores the information required to establish the mapping between a matrix entry and its corresponding process and memory location.

In the following comments, the character _ should be read as "of the distributed matrix". Let A be a generic term for any 2D block-cyclically distributed matrix. Its description vector is DESCA:

NOTATION         STORED IN       EXPLANATION
---------------  --------------  --------------------------------------
DT_A   (global)  DESCA( DT_ )    The descriptor type.
CTXT_A (global)  DESCA( CTXT_ )  The BLACS context handle, indicating
                                 the BLACS process grid A is distributed
                                 over. The context itself is global,
                                 but the handle (the integer value)
                                 may vary.
M_A    (global)  DESCA( M_ )     The number of rows in the distributed
                                 matrix A.
N_A    (global)  DESCA( N_ )     The number of columns in the
                                 distributed matrix A.
MB_A   (global)  DESCA( MB_ )    The blocking factor used to distribute
                                 the rows of A.
NB_A   (global)  DESCA( NB_ )    The blocking factor used to distribute
                                 the columns of A.
RSRC_A (global)  DESCA( RSRC_ )  The process row over which the first
                                 row of the matrix A is distributed.
CSRC_A (global)  DESCA( CSRC_ )  The process column over which the
                                 first column of A is distributed.
LLD_A  (local)   DESCA( LLD_ )   The leading dimension of the local
                                 array storing the local blocks of the
                                 distributed matrix A.
                                 LLD_A >= MAX(1,LOCp(M_A)).
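As a rough illustration of the LLD_A bound, the sketch below computes LOCp(M_A), i.e. the number of rows a given process row owns under a block-cyclic distribution, and from it the smallest legal leading dimension. It mirrors the arithmetic of ScaLAPACK's NUMROC tool routine, but the function names local_count and min_lld are hypothetical and plain int is assumed in place of MKL_INT.

```c
#include <assert.h>

/* Number of rows (or columns) of an n-long dimension, split into blocks
   of size nb, that land on process iproc when the first block sits on
   process isrcproc and there are nprocs processes in that dimension. */
int local_count(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    assert(nb > 0 && nprocs > 0);

    int mydist    = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks   = n / nb;                               /* whole blocks */
    int count     = (nblocks / nprocs) * nb;              /* blocks every process gets */
    int extrablks = nblocks % nprocs;                     /* leftover whole blocks */

    if (mydist < extrablks)
        count += nb;        /* one extra whole block */
    else if (mydist == extrablks)
        count += n % nb;    /* the trailing partial block, if any */

    return count;
}

/* Smallest LLD allowed by LLD_A >= MAX(1, LOCp(M_A)). */
int min_lld(int m, int mb, int myrow, int rsrc, int nprow)
{
    int loc = local_count(m, mb, myrow, rsrc, nprow);
    return loc > 1 ? loc : 1;
}
```

For example, 10 rows in blocks of 2 over 2 process rows give 6 local rows on the source process row and 4 on the other one.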

Important notice

================

The parameters of the routine changed in April 1996: there is a new last argument. It must be a context encompassing all processors involved in the initial and final distributions. Be aware that all processors included in this context must call the redistribution routine.

Parameters

==========

M (input) INTEGER.

On entry, M specifies the number of rows of the

submatrix to be copied. M must be at least zero.

Unchanged on exit.

N (input) INTEGER.

On entry, N specifies the number of columns of the

submatrix to be copied. N must be at least zero.

Unchanged on exit.

A (input) DOUBLE PRECISION

On entry, the source matrix.

Unchanged on exit.

IA,JA (input) INTEGER

On entry, the coordinates of the beginning of the submatrix

of A to copy.

1 <= IA <= M_A - M + 1, 1 <= JA <= N_A - N + 1.

Unchanged on exit.

ADESC (input) A description vector (see Notes above)

If the current processor is not part of the context of A,

ADESC[CTXT] must be equal to -1.

B (output) DOUBLE PRECISION

On entry, the destination matrix.

On exit, the portion corresponding to the defined submatrix is updated.

IB,JB (input) INTEGER

On entry, the coordinates of the beginning of the submatrix

of B that will be updated.

1 <= IB <= M_B - M + 1, 1 <= JB <= N_B - N + 1.

Unchanged on exit.

BDESC (input) B description vector (see Notes above)

For processors not part of the context of B

BDESC[CTXT] must be equal to -1.

CTXT (input) A context encompassing at least all processors included

in either the A context or the B context.
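The index constraints on IA, JA, IB and JB listed above can be checked before the call. The sketch below is a hypothetical pre-check, not something the library provides: the name submatrix_in_range is made up, plain int is assumed in place of MKL_INT, and the descriptor is read with 0-based positions (CTXT at 1, M at 2, N at 3). For a process whose descriptor context is -1 it only checks M and N, following the rule that the other parameters are not examined there.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if an m-by-n submatrix starting at (i, j), 1-based, fits in
   the matrix described by desc, and 0 otherwise. */
int submatrix_in_range(int m, int n, int i, int j, const int *desc)
{
    assert(desc != NULL);

    /* M and N must be valid for everyone. */
    if (m < 0 || n < 0)
        return 0;

    /* This process is outside the matrix's context: nothing else to check. */
    if (desc[1] == -1)
        return 1;

    int rows = desc[2];   /* M_A or M_B */
    int cols = desc[3];   /* N_A or N_B */

    return i >= 1 && i <= rows - m + 1 &&
           j >= 1 && j <= cols - n + 1;
}
```

Calling it once with (IA, JA, ADESC) and once with (IB, JB, BDESC) covers both sets of bounds.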

Memory requirement :

====================

For the processors belonging to grid 0, one buffer of size block 0; for the processors belonging to grid 1, one buffer of size block 1.

C interface:

void pdgemr2d_(MKL_INT *m, MKL_INT *n, double *A, MKL_INT *ia, MKL_INT *ja, MKL_INT *desc_A,
               double *B, MKL_INT *ib, MKL_INT *jb, MKL_INT *desc_B, MKL_INT *gcontext);