I am trying to use MPI to solve a problem of the form
C=A*B
where A=A(M,N), B=B(N,O)
Before calculating C, I need to create A and B using MPI-parallelized code. However, for some reason, A can only be parallelized using M as the distribution index, i.e. A is distributed as AM(1:N) across the CPUs. On the other hand, B can only be distributed as BO(1:N). Since both A and B are very large, neither a gather nor a broadcast is affordable in terms of memory, so I am thinking of keeping A and B distributed as they are. When I calculate C, I use the BN distribution as the CPU index, and when I need the information from A (i.e. AM), I go to the responsible CPU to get the AM, like this:
do i=1,M
  do j=1,O
    ! do l=1,N   ! parallelized
    mpi_send(AM, i, ..., o,..)
    mpi_recv(AM,o, ..., i,...)
    CMO= sum(AM*BO)
    ! enddo
  enddo
enddo
Of course, this will not work, because the send and the receive are both issued from a single thread.
So I am here to ask for help. Is there a better idea? Thanks.
I would suggest you look at breaking up the AM*BO into tiles (stripes), doing a partial sum() on the partial product matrices, and then summing the partial sums. In this way you can queue up the production of the AM stripes and queue up the production of the BM stripe(s) (assuming BM is stripeable). Break the code up into two loops: one to queue up the sends (mpi_send...), and another to produce the partial product when the data becomes available.
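One way to read the partial-sum idea, written out as a formula. This assumes the stripes partition the summation index l = 1..N into K contiguous ranges; the names K and S_k are notation introduced here for illustration and do not appear in the thread.

```latex
C_{mo} \;=\; \sum_{l=1}^{N} A_{ml}\, B_{lo}
       \;=\; \sum_{k=1}^{K} \Bigl( \sum_{l \in S_k} A_{ml}\, B_{lo} \Bigr),
\qquad
S_1 \cup S_2 \cup \dots \cup S_K = \{1,\dots,N\}, \quad S_k \cap S_{k'} = \emptyset \ (k \neq k').
```

Each inner sum needs only stripe S_k of AM and of BO, so a partial product can be computed as soon as that stripe has arrived, and the outer sum adds the partial results together.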
Start with
do i=1,M
  do j=1,O
    ! do l=1,N   ! parallelized
    mpi_send(AM, i, ..., o,..)
    ! enddo
  enddo
  do j=1,O
    ! do l=1,N   ! parallelized
    mpi_recv(AM,o, ..., i,...)
    CMO= sum(AM*BO)
    ! enddo
  enddo
enddo
Then break down the AM into stripes and produce the partial sums of the partial products.
Jim Dempsey
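For readers who want to try the data movement without an MPI installation, here is a minimal single-process sketch in Python that simulates the ranks with plain lists. It is not the scheme from the reply above: it uses a ring rotation, in which every rank keeps its own columns of B and the row blocks of A are passed from neighbour to neighbour, so no rank ever holds more than one extra block of A. The function names `split` and `ring_matmul` and the parameter `nranks` are hypothetical. In a real MPI code the list shift would presumably be a call such as `MPI_Sendrecv_replace` between neighbouring ranks.

```python
def split(n, parts):
    """Split range(n) into `parts` contiguous index lists of near-equal size."""
    base, extra = divmod(n, parts)
    out, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out


def ring_matmul(A, B, nranks):
    """Simulate C = A*B with A distributed by rows and B distributed by columns."""
    M, N, O = len(A), len(B), len(B[0])
    row_owner = split(M, nranks)  # row indices of A held by each simulated rank
    col_owner = split(O, nranks)  # column indices of B held by each simulated rank

    # Each rank starts with its own rows of A in a travelling buffer ...
    travelling = [[A[i] for i in row_owner[r]] for r in range(nranks)]
    origin = list(range(nranks))  # whose rows each buffer currently contains
    # ... and keeps its own columns of B (each stored as a length-N list) fixed.
    local_B = [[[B[l][j] for l in range(N)] for j in col_owner[r]]
               for r in range(nranks)]

    C = [[0] * O for _ in range(M)]
    for _step in range(nranks):
        for r in range(nranks):
            rows = row_owner[origin[r]]
            for a_row, i in zip(travelling[r], rows):
                for b_col, j in zip(local_B[r], col_owner[r]):
                    # Dot product of one row of A with one column of B.
                    C[i][j] = sum(x * y for x, y in zip(a_row, b_col))
        # Ring shift: rank r hands its buffer to rank r+1 and receives from r-1.
        travelling = [travelling[(r - 1) % nranks] for r in range(nranks)]
        origin = [origin[(r - 1) % nranks] for r in range(nranks)]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]   # M=3, N=2
    B = [[1, 0, 2], [0, 1, 3]]     # N=2, O=3
    print(ring_matmul(A, B, nranks=2))
```

After `nranks` steps every rank has seen every row block of A exactly once, so in this sketch each rank ends up owning the columns of C that match its columns of B. The inner dot product could additionally be split into stripes over N and accumulated as partial sums, as suggested in the reply.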