Hi,
I'm using the MKL ScaLAPACK functions P*RGEM2D to redistribute matrices between different contexts. Most MKL ScaLAPACK functions take an 'info' argument that returns error information from the call, but I cannot find any Intel documentation for the meaning of the values returned in 'info' by the P*RGEM2D functions. Any help on the meaning of these info values would be appreciated.
Thanks,
John
Sorry, I meant P*GEMR2D and not P*RGEM2D
MKL doesn't seem to offer this routine, although it is found in Netlib ScaLAPACK. I'm trying to get an explanation from the MKL engineering team and will report back here.
It's definitely in Intel's MKL library because we have been using it for several years. However, it is undocumented in the Intel MKL guides. It is documented in the ScaLAPACK book on the Netlib web site. It is an extremely useful ScaLAPACK routine, as it's the only one I know of that does an inter-context transfer.
It looks like MKL has a documentation gap for this routine. A bug report will be created to have the gap filled in future MKL releases.
Until the documentation is fixed, please use the information below as the description of the PDGEMR2D routine. Note that it is a Fortran routine, but it can be called from C code as pdgemr2d_(...).
-- ScaLAPACK routine (version 1.7) --
Oak Ridge National Laboratory, Univ. of Tennessee, and Univ. of
California, Berkeley.
October 31, 1994.
SUBROUTINE PDGEMR2D( M, N,
$ A, IA, JA, ADESC,
$ B, IB, JB, BDESC,
$ CTXT)
------------------------------------------------------------------------
Purpose
=======
PDGEMR2D copies a submatrix of A onto a submatrix of B.
A and B can have different distributions: they can be on different
processor grids, they can have different block sizes, and the beginning
of the area to be copied can be at different places in A and B.
The parameters can be confusing when the grids of A and B are
partially or completely disjoint. If a processor calls this routine
but is not in the A context or the B context, then ADESC[CTXT] or
BDESC[CTXT], respectively, must be equal to -1 so that the routine
recognises this situation.
To summarize the rules:
- If a processor is in A context, all parameters related to A must be valid.
- If a processor is in B context, all parameters related to B must be valid.
- ADESC[CTXT] and BDESC[CTXT] must be either valid contexts or equal to -1.
- M and N must be valid for everyone.
- other parameters are not examined.
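For a process outside one of the grids, these rules boil down to flagging the CTXT_ entry of that matrix's descriptor. Below is a minimal C sketch; the helper name is hypothetical, and it assumes the standard 9-integer descriptor whose CTXT_ entry is the second element (index 1 in C):

/* Minimal sketch of the rule above: a rank that is not part of a matrix's
   BLACS grid still calls PDGEMR2D, but flags the descriptor's CTXT_ entry
   (index 1 in the 0-based C view) as -1.  Helper name is hypothetical. */
#include <mkl_types.h>   /* MKL_INT */

void mark_descriptor_outside_grid(MKL_INT desc[9], int rank_is_in_grid)
{
    if (!rank_is_in_grid)
        desc[1] = -1;   /* PDGEMR2D then ignores the remaining entries of desc */
}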
Notes
=====
A description vector is associated with each 2D block-cyclically
distributed matrix. This vector stores the information required to
establish the mapping between a matrix entry and its corresponding
process and memory location.
In the following comments, the character _ should be read as
"of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its description vector is DESC_A:
NOTATION         STORED IN        EXPLANATION
DT_A   (global)  DESCA( DT_ )     The descriptor type.
CTXT_A (global)  DESCA( CTXT_ )   The BLACS context handle, indicating the BLACS process grid A is distributed over. The context itself is global, but the handle (the integer value) may vary.
M_A    (global)  DESCA( M_ )      The number of rows in the distributed matrix A.
N_A    (global)  DESCA( N_ )      The number of columns in the distributed matrix A.
MB_A   (global)  DESCA( MB_ )     The blocking factor used to distribute the rows of A.
NB_A   (global)  DESCA( NB_ )     The blocking factor used to distribute the columns of A.
RSRC_A (global)  DESCA( RSRC_ )   The process row over which the first row of the matrix A is distributed.
CSRC_A (global)  DESCA( CSRC_ )   The process column over which the first column of A is distributed.
LLD_A  (local)   DESCA( LLD_ )    The leading dimension of the local array storing the local blocks of the distributed matrix A. LLD_A >= MAX(1,LOCp(M_A)).
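For C callers, the description vector above is simply an array of nine integers. The index names in the following sketch are illustrative only (they mirror the table, using 0-based C indices) and are not part of MKL's headers:

/* Sketch: the 9-integer description vector as seen from C (0-based indices).
   Names mirror the table above and are illustrative only. */
#include <mkl_types.h>   /* MKL_INT */

enum desc_index {
    DTYPE_ = 0,   /* DT_A:   descriptor type                               */
    CTXT_  = 1,   /* CTXT_A: BLACS context handle of the grid              */
    M_     = 2,   /* M_A:    global number of rows                         */
    N_     = 3,   /* N_A:    global number of columns                      */
    MB_    = 4,   /* MB_A:   row blocking factor                           */
    NB_    = 5,   /* NB_A:   column blocking factor                        */
    RSRC_  = 6,   /* RSRC_A: process row holding the first row             */
    CSRC_  = 7,   /* CSRC_A: process column holding the first column       */
    LLD_   = 8    /* LLD_A:  local leading dimension, >= MAX(1,LOCp(M_A))  */
};

typedef MKL_INT scalapack_desc[9];   /* e.g. scalapack_desc desc_A; desc_A[CTXT_] = ...; */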
Important notice
================
The parameters of the routine changed in April 1996: there is a new
last argument. It must be a context encompassing all processors
involved in the initial and final distributions. Be aware that all
processors included in this context must call the redistribution
routine.
Parameters
==========
M (input) INTEGER.
On entry, M specifies the number of rows of the
submatrix to be copied. M must be at least zero.
Unchanged on exit.
N (input) INTEGER.
On entry, N specifies the number of columns of the submatrix
to be redistributed. N must be at least zero.
Unchanged on exit.
A (input) DOUBLE PRECISION
On entry, the source matrix.
Unchanged on exit.
IA,JA (input) INTEGER
On entry, the coordinates of the beginning of the submatrix
of A to copy.
1 <= IA <= M_A - M + 1, 1 <= JA <= N_A - N + 1.
Unchanged on exit.
ADESC (input) A description vector (see Notes above)
If the current processor is not part of the context of A,
then ADESC[CTXT] must be equal to -1.
B (output) DOUBLE PRECISION
On entry, the destination matrix.
The portion corresponding to the defined submatrix is updated.
IB,JB (input) INTEGER
On entry, the coordinates of the beginning of the submatrix
of B that will be updated.
1 <= IB <= M_B - M + 1, 1 <= JB <= N_B - N + 1.
Unchanged on exit.
BDESC (input) B description vector (see Notes above)
For processors not part of the context of B,
BDESC[CTXT] must be equal to -1.
CTXT (input) a context encompassing at least all processors included
in either the A context or the B context.
Memory requirement:
====================
For the processors belonging to grid 0, one buffer of size block 0;
for the processors belonging to grid 1, one buffer of size block 1.
C interface:
pdgemr2d_ (MKL_INT *m, MKL_INT *n, double *A, MKL_INT *ia, MKL_INT *ja, MKL_INT *desc_A,
           double *B, MKL_INT *ib, MKL_INT *jb, MKL_INT *desc_B, MKL_INT *gcontext);
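To make the calling sequence concrete, here is a hedged end-to-end sketch in C that copies an 8 x 8 double-precision matrix from a 2 x 2 source grid to a 1 x 4 destination grid (launch with at least 4 MPI ranks). The prototypes are written out as assumptions about MKL's Fortran-style BLACS/ScaLAPACK entry points, and the sizes, block factors, and grid shapes are purely illustrative rather than part of the routine's documentation:

/* Hedged usage sketch: copy an 8 x 8 double matrix from a 2 x 2 source grid
   to a 1 x 4 destination grid with pdgemr2d_.  Launch with at least 4 MPI
   ranks (e.g. mpirun -n 4).  The prototypes below are assumptions about
   MKL's Fortran-style BLACS/ScaLAPACK entry points. */
#include <stdlib.h>
#include <mkl_types.h>   /* MKL_INT */

extern void blacs_pinfo_(MKL_INT *mypnum, MKL_INT *nprocs);
extern void blacs_get_(MKL_INT *ictxt, MKL_INT *what, MKL_INT *val);
extern void blacs_gridinit_(MKL_INT *ictxt, const char *layout, MKL_INT *nprow, MKL_INT *npcol);
extern void blacs_gridinfo_(MKL_INT *ictxt, MKL_INT *nprow, MKL_INT *npcol, MKL_INT *myrow, MKL_INT *mycol);
extern void blacs_exit_(MKL_INT *cont);
extern MKL_INT numroc_(MKL_INT *n, MKL_INT *nb, MKL_INT *iproc, MKL_INT *isrcproc, MKL_INT *nprocs);
extern void descinit_(MKL_INT *desc, MKL_INT *m, MKL_INT *n, MKL_INT *mb, MKL_INT *nb,
                      MKL_INT *irsrc, MKL_INT *icsrc, MKL_INT *ictxt, MKL_INT *lld, MKL_INT *info);
extern void pdgemr2d_(MKL_INT *m, MKL_INT *n, double *A, MKL_INT *ia, MKL_INT *ja, MKL_INT *descA,
                      double *B, MKL_INT *ib, MKL_INT *jb, MKL_INT *descB, MKL_INT *gcontext);

/* Allocate the local piece of an M x N matrix on the given grid context and fill
   its descriptor; returns NULL (and sets desc[1] = -1) on ranks outside the grid. */
static double *setup_local(MKL_INT ctxt, MKL_INT M, MKL_INT N, MKL_INT MB, MKL_INT NB, MKL_INT desc[9])
{
    MKL_INT zero = 0, info, nprow, npcol, myrow, mycol;
    blacs_gridinfo_(&ctxt, &nprow, &npcol, &myrow, &mycol);
    if (myrow < 0) {          /* this rank is not in the grid */
        desc[1] = -1;         /* CTXT_ entry: tells PDGEMR2D to skip the other entries */
        return NULL;
    }
    MKL_INT locr = numroc_(&M, &MB, &myrow, &zero, &nprow);   /* local rows */
    MKL_INT locc = numroc_(&N, &NB, &mycol, &zero, &npcol);   /* local columns */
    MKL_INT lld = locr > 1 ? locr : 1;                        /* max(1, locr) */
    descinit_(desc, &M, &N, &MB, &NB, &zero, &zero, &ctxt, &lld, &info);
    return (double *)calloc((size_t)lld * (locc > 0 ? locc : 1), sizeof(double));
}

int main(void)
{
    MKL_INT iam, nprocs, zero = 0, one = 1, two = 2, four = 4;
    MKL_INT M = 8, N = 8, MB = 2, NB = 2;
    MKL_INT ctxt_all, ctxt_A, ctxt_B, desc_A[9], desc_B[9];

    blacs_pinfo_(&iam, &nprocs);

    /* Context covering ALL processes: required as the last pdgemr2d_ argument,
       and every process in it must make the call (see the notice above). */
    blacs_get_(&zero, &zero, &ctxt_all);
    blacs_gridinit_(&ctxt_all, "Row", &one, &nprocs);

    /* Source grid (2 x 2) and destination grid (1 x 4), each on the first 4 ranks. */
    blacs_get_(&zero, &zero, &ctxt_A);
    blacs_gridinit_(&ctxt_A, "Row", &two, &two);
    blacs_get_(&zero, &zero, &ctxt_B);
    blacs_gridinit_(&ctxt_B, "Row", &one, &four);

    double *A = setup_local(ctxt_A, M, N, MB, NB, desc_A);   /* source (would be filled here) */
    double *B = setup_local(ctxt_B, M, N, MB, NB, desc_B);   /* destination */

    /* Copy the whole matrix: the M x N submatrix starting at (1,1) of A into B. */
    pdgemr2d_(&M, &N, A, &one, &one, desc_A, B, &one, &one, desc_B, &ctxt_all);

    free(A);
    free(B);
    blacs_exit_(&zero);
    return 0;
}

Note how every rank in the encompassing context calls pdgemr2d_, even ranks holding no part of A or B, as the "Important notice" above requires.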
