I am currently linking a custom version of DGEMM into my program, but I would still like to use PBLAS's PDGEMM routine to split the work up across processes. Unfortunately, I've noticed that even with one process, PDGEMM will split up the matrix and make multiple calls to my custom DGEMM function. This splitting is done without regard to my chosen block sizes.
Is there any way to prevent this from happening? It's wasteful to do this kind of splitting with my custom DGEMM routine.
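(For context, the kind of link-time override involved looks roughly like the sketch below, assuming the Fortran dgemm_ interface; real_dgemm_ is an illustrative stand-in for however the underlying implementation is reached. Logging m, n, k this way is how the extra block-sized calls show up.)

#include <cstdio>

extern "C" void real_dgemm_(const char*, const char*,
                            const int*, const int*, const int*,
                            const double*, const double*, const int*,
                            const double*, const int*,
                            const double*, double*, const int*);

// Link-time replacement for the Fortran dgemm_ symbol. Logging m, n, k
// makes the block-sized sub-calls visible before forwarding to the
// underlying implementation.
extern "C" void dgemm_(const char* transa, const char* transb,
                       const int* m, const int* n, const int* k,
                       const double* alpha, const double* a, const int* lda,
                       const double* b, const int* ldb,
                       const double* beta, double* c, const int* ldc) {
    std::fprintf(stderr, "dgemm: m=%d n=%d k=%d\n", *m, *n, *k);
    real_dgemm_(transa, transb, m, n, k, alpha,
                a, lda, b, ldb, beta, c, ldc);
}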
Thanks,
William
Hello,
Can you please give me the syntax you are using to call pdgemm?
Thanks,
Sridevi
Sridevi,
I am calling pdgemm from a C++ program. One example of how I call it is the following:

int ione = 1;
pdgemm(&transa, &transb, &m, &n, &k, &alpha,
       a.val, &ione, &ione, a.desc_,
       b.val, &ione, &ione, b.desc_,
       &beta, val, &ione, &ione, desc_);
Here val is the starting address of the matrix C, and desc_ contains the array descriptor (MDESC) for C. I believe I have filled in the descriptors correctly, including the block sizes at a.desc_[4], a.desc_[5], and so on. As I said before, changing the block sizes does not seem to change how PDGEMM splits up the work. For example, my code multiplies a 156196 x 256 matrix by a 256 x 156196 matrix, and PDGEMM splits this into 8 calls, dividing the 256 dimension into 8 groups of 32.
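For reference, this is roughly how the descriptors are built; a minimal sketch assuming the Fortran-style descinit_ and numroc_ bindings from Netlib ScaLAPACK (make_descriptor and its argument names are illustrative):

extern "C" {
    void descinit_(int* desc, const int* m, const int* n,
                   const int* mb, const int* nb,
                   const int* irsrc, const int* icsrc,
                   const int* ictxt, const int* lld, int* info);
    int numroc_(const int* n, const int* nb, const int* iproc,
                const int* isrcproc, const int* nprocs);
}

// Fill a 9-element ScaLAPACK descriptor for an m x n matrix distributed
// in mb x nb blocks over the grid behind ictxt. myrow/nprow come from
// Cblacs_gridinfo. After the call, desc[4] == mb and desc[5] == nb.
void make_descriptor(int desc[9], int m, int n, int mb, int nb,
                     int ictxt, int myrow, int nprow) {
    int izero = 0, info = 0;
    int lld = numroc_(&m, &mb, &myrow, &izero, &nprow);  // local row count
    if (lld < 1) lld = 1;                                // LLD must be >= 1
    descinit_(desc, &m, &n, &mb, &nb, &izero, &izero,
              &ictxt, &lld, &info);
}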
First of all, the matrix should be distributed among the BLACS processes before the call to PDGEMM. Also, PDGEMM defines the M, N, K for its internal gemm calls based on the block sizes, but M, N, K are not equal to the block sizes. You can try to figure out how this happens by reading the PDGEMM sources (available via Netlib).
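A serial sketch of the effect, assuming one process and column-major storage: PDGEMM walks the K dimension in panels of a logical block size of its own choosing (PILAENV in the reference sources, which need not equal the distribution block size), so each backend gemm sees K equal to the panel width, e.g. 8 calls with K = 32 when k = 256. The loop below only illustrates the decomposition; it is not PDGEMM's actual code.

#include <algorithm>
#include <cstddef>

extern "C" void dgemm_(const char* transa, const char* transb,
                       const int* m, const int* n, const int* k,
                       const double* alpha, const double* a, const int* lda,
                       const double* b, const int* ldb,
                       const double* beta, double* c, const int* ldc);

void panel_gemm(int m, int n, int k, int kb,
                const double* a, int lda,   // A is m x k
                const double* b, int ldb,   // B is k x n
                double* c, int ldc) {       // C is m x n
    const double one = 1.0;
    double beta = 0.0;                      // first panel overwrites C
    for (int p = 0; p < k; p += kb) {
        int kk = std::min(kb, k - p);       // width of this K panel
        dgemm_("N", "N", &m, &n, &kk, &one,
               a + (std::size_t)lda * p, &lda,  // A(:, p:p+kk)
               b + p, &ldb,                     // B(p:p+kk, :)
               &beta, c, &ldc);
        beta = 1.0;                         // later panels accumulate into C
    }
}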
--andrew
Andrew,
Thanks for the response. I will investigate the NETLIB source and see if I can change the behavior.
William Dawson