I am using pdlacpy. The documentation says this function copies all or part of a distributed matrix (A) to another distributed matrix (B). It also says no communication is performed (i.e. the function performs a local copy). Without communication, only matrices that are distributed identically (same dimensions and same block size) can be used with this function. Is this right?
In addition, without communication, the submatrix A to be copied to matrix B must be in the same position (i.e. ia == ib && ja == jb).
Finally, it is possible to copy only the upper or lower triangular part of the global matrix.
The first two restrictions are not clear to me from the documentation (nor is it clear why you can specify ia, ja and ib, jb separately). When I ignore these restrictions, I get wrong results. This also makes me unsure whether the function is working properly.
ScaLAPACK uses the two-dimensional block-cyclic data distribution as a layout for dense matrix computations. For a global matrix, the blocking factors for its rows and columns are fixed after data distribution. Depending on how you distribute the data, some processes may get more blocks than other processes. It is not necessarily an equal distribution.
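To illustrate why the distribution is not necessarily equal, here is a minimal Python sketch of the block-cyclic mapping along one dimension. It is not ScaLAPACK code: the function names `owner` and `local_count` are made up for this example, indices are 0-based, and the first block is assumed to live on process coordinate 0.

```python
def owner(g, nb, nprocs, src=0):
    """Process coordinate that owns 0-based global index g (block size nb)."""
    return (src + g // nb) % nprocs

def local_count(n, nb, iproc, nprocs, src=0):
    """How many of the n global indices land on process coordinate iproc."""
    return sum(1 for g in range(n) if owner(g, nb, nprocs, src) == iproc)

# 5 rows, block size 2, 2 process rows: the row blocks are {0,1}, {2,3}, {4}
print([owner(g, 2, 2) for g in range(5)])            # [0, 0, 1, 1, 0]
print([local_count(5, 2, p, 2) for p in range(2)])   # [3, 2]
```

With 5 rows, process row 0 ends up with 3 rows and process row 1 with 2, so the two processes hold different amounts of data.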
pdlacpy performs local copying between submatrices of the global matrices A and B. But these submatrices do not have to be in the same position. For example, you can copy a submatrix of A into a bigger submatrix of B; and the starting location of the source can be different than the starting location of the destination. In this case, ia != ib && ja != jb.
When uplo is neither 'U' nor 'L' then this routine copies the entire submatrix.
I am not sure I understand.
Suppose you have two 4x4 matrices and a block size of 2x2. You also have 4 processes arranged in a 2x2 grid. Then the matrices will be distributed in 2x2 blocks and each process will have one block. For instance, for matrix A I will have the blocks:
A11 A12
A21 A22
In this simple case, the first block A11 will be in process (0,0), A12 in process (0,1), A21 in process (1,0) and A22 in process (1,1). B has the same distribution. I cannot copy submatrix A11 into submatrix B22 of B without communication. Am I right?
In addition, suppose now that B is distributed using a 1x1 block size. I will have the blocks:
B11 B12 B13 B14
B21 B22 B23 B24
B31 B32 B33 B34
B41 B42 B43 B44
Then process (0,0) will have the blocks:
B11 B13
B31 B33
I think (if I understand correctly) that in this situation I cannot copy block A11 of matrix A to B without communication. The 2x2 block A11 would have to be split into four 1x1 blocks and distributed among the four processes. Is this right?
Because p?lacpy performs only local copy, you cannot copy a block on one process to another block on a different process.
In your first example, A11 and B22 sit on different processes. You cannot use p?lacpy to copy between them.
In your second example, if the block size for A is 2x2 while the block size for B is 1x1, then you can copy submatrix A11 into the memory space where [B11, B13, B31, B33] is stored. But this might not be what you want. It would certainly be easier to manage if A and B are distributed using the same block size.
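The second example can be checked with a small Python sketch that lists which global entries a given process stores. This is only a model of the block-cyclic layout, not a call into ScaLAPACK; `local_entries` is a hypothetical helper, entries are reported as 1-based (row, column) pairs, and the first block is assumed to sit on process (0,0).

```python
def local_entries(n, nb, myrow, mycol, nprow=2, npcol=2):
    """1-based global (row, col) pairs of an n x n matrix stored on process (myrow, mycol)."""
    rows = [i for i in range(n) if (i // nb) % nprow == myrow]
    cols = [j for j in range(n) if (j // nb) % npcol == mycol]
    return [[(i + 1, j + 1) for j in cols] for i in rows]

# 4x4 matrix on a 2x2 process grid, seen from process (0,0)
print(local_entries(4, 2, 0, 0))  # A, 2x2 blocks: [[(1, 1), (1, 2)], [(2, 1), (2, 2)]]
print(local_entries(4, 1, 0, 0))  # B, 1x1 blocks: [[(1, 1), (1, 3)], [(3, 1), (3, 3)]]
```

On process (0,0) both local arrays are 2x2, so a purely local copy is possible, but the four values of A11 end up at global positions (1,1), (1,3), (3,1) and (3,3) of B.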
According to this (if I understand correctly), in many situations where ia != ib or ja != jb, a submatrix of A cannot be copied to a region of B without communication. If you try, you will not get the result you expect.
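One way to reason about which index combinations are safe is to test, entry by entry, whether the source and the destination live on the same process. The Python sketch below does that for the examples in this thread. It is a toy model under stated assumptions (1-based global indices, first block on process (0,0), made-up helper names `owner` and `copy_stays_local`), and it says nothing about what pdlacpy itself checks internally.

```python
def owner(i, j, mb, nb, nprow, npcol):
    """Grid coordinates of the process owning 1-based global entry (i, j)."""
    return (((i - 1) // mb) % nprow, ((j - 1) // nb) % npcol)

def copy_stays_local(m, n, ia, ja, ib, jb, blk_a, blk_b, grid):
    """True if every entry of the m x n block of A at (ia, ja) has the
    same owner as the entry of B it would be copied to at (ib, jb)."""
    return all(
        owner(ia + i, ja + j, *blk_a, *grid) == owner(ib + i, jb + j, *blk_b, *grid)
        for i in range(m) for j in range(n)
    )

grid = (2, 2)
# A11 -> B11, both with 2x2 blocks
print(copy_stays_local(2, 2, 1, 1, 1, 1, (2, 2), (2, 2), grid))  # True
# A11 -> B22 (ib = jb = 3), both with 2x2 blocks
print(copy_stays_local(2, 2, 1, 1, 3, 3, (2, 2), (2, 2), grid))  # False
# A11 -> same position in B, but B uses 1x1 blocks
print(copy_stays_local(2, 2, 1, 1, 1, 1, (2, 2), (1, 1), grid))  # False
# Shift of one full cycle (block size 2 times 2 processes = 4) in a larger matrix
print(copy_stays_local(2, 2, 1, 1, 5, 5, (2, 2), (2, 2), grid))  # True
```

In this model, ia != ib or ja != jb is harmless only when the shift keeps every entry on its original process, for example a shift by a multiple of the block size times the number of processes in that dimension.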
I think I understand what the function does. But (for me) the documentation is a little confusing, and it is not clear which restrictions must be respected to avoid unexpected results.