I am currently linking a custom version of DGEMM to my program, but I would still like to use PBLAS's PDGEMM routine to split the work up across processes. Unfortunately, I've noticed that even with one process, PDGEMM will split up the matrix and make multiple calls to my custom DGEMM function. This splitting of the matrix is done without regard to my chosen block sizes.
Is there any way to prevent this from happening? It's wasteful to do this kind of splitting with my custom DGEMM routine.
Thanks,
William
Sridevi,
I am calling pdgemm from a C++ program. One example of me calling it is with the following syntax:
int ione = 1;
pdgemm(&transa, &transb, &m, &n, &k, &alpha,
       a.val, &ione, &ione, a.desc_,
       b.val, &ione, &ione, b.desc_,
       &beta, val, &ione, &ione, desc_);
Here val is the starting address of the matrix C, and desc_ contains the MDESC array descriptor for C. I believe I have filled in the descriptor correctly, including the block sizes at a.desc_[4], a.desc_[5], etc. As I said before, changing the block sizes does not seem to change how PDGEMM splits up the work. For example, in my code I multiply a 156196 x 256 matrix by a 256 x 156196 matrix. PDGEMM splits this into 8 calls, dividing the 256 dimension into 8 groups of 32.
First of all, the matrix should be distributed among the BLACS processes before calling PDGEMM. Also, PDGEMM determines the M, N, K for its internal GEMM calls based on the block sizes, but M, N, K are not equal to the block sizes. You can try to figure out how this happens by reading the PDGEMM sources (available via Netlib).
--andrew
