In our code, we allocate many allocatable arrays before the parallel do section.
To remove the data dependence, our code currently looks like this:
allocate(A(M1,N1,number_of_threads))
allocate(B(M2,N2,L2,number_of_threads))
!$OMP parallel do
DO I = 1, II
....
ENDDO
Since our arrays are very large, we would like to reduce memory use. Would the following be
more efficient in terms of both memory usage and performance?
!$OMP parallel do
DO I =1, II
Allocate(A(M1,N1))
Allocate(B(M2,N2,L2))
.....
deallocate(A,B)
ENDDO
Thanks
1 Reply
The following will yield better performance:
! A and B must be private so that each thread allocates its own copy
!$OMP parallel private(A,B)
Allocate(A(M1,N1))
Allocate(B(M2,N2,L2))
!$OMP do
DO I =1, II
.....
ENDDO
!$OMP end do
deallocate(A,B)
!$OMP end parallel
The total storage allocated is the same as in your original example (one copy of A and B per thread).
The number of allocations is much lower than in your second example: one per thread rather than one per loop iteration.
*** However
The allocations are made in smaller chunks (i.e. not multiplied by number_of_threads).
A smaller allocation is more likely to be satisfied when memory is fragmented.
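As a minimal, self-contained sketch of the per-thread allocation pattern described above: the array sizes, the loop body, and the `total` reduction variable are placeholders invented for illustration, and only A is shown. It assumes A is listed in a `private` clause so that each thread gets its own unallocated copy on entry to the parallel region.

```fortran
program per_thread_alloc
  implicit none
  ! Hypothetical sizes, chosen only so the example runs quickly
  integer, parameter :: M1 = 8, N1 = 8, II = 100
  real, allocatable :: A(:,:)
  real :: total
  integer :: i

  total = 0.0
  ! private(A): each thread owns its own copy of A, unallocated on entry
  !$omp parallel private(A)
  allocate(A(M1, N1))            ! executed once per thread, not once per iteration
  !$omp do reduction(+:total)
  do i = 1, II
    A = real(i)                  ! placeholder work on the thread-local scratch array
    total = total + sum(A)
  end do
  !$omp end do
  deallocate(A)                  ! each thread frees its own copy
  !$omp end parallel

  print *, 'total = ', total
end program per_thread_alloc
```

Compiled with OpenMP enabled (for example `-fopenmp` with gfortran or `-qopenmp` with ifort), this performs one allocate/deallocate pair per thread regardless of II.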
If A and B are temporaries, and I is one of the indexes of both, and the I'th elements of both arrays are completely independent of the (I+x)'th elements, and II is relatively large, then consider breaking the iteration space into smaller pieces and processing each piece in parallel.
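One possible reading of this suggestion, sketched under stated assumptions: the temporary is indexed by I in its last dimension, iterations are independent, and the scratch array is sized for one chunk of iterations rather than for all II. The names `CHUNK`, `istart`, `iend`, and `total`, the sizes, and the loop body are hypothetical and not from the original code.

```fortran
program chunked_temps
  implicit none
  ! Hypothetical sizes; CHUNK bounds the scratch storage instead of II
  integer, parameter :: M1 = 4, II = 1000, CHUNK = 100
  real, allocatable :: A(:,:)
  real :: total
  integer :: i, istart, iend

  allocate(A(M1, CHUNK))         ! one shared scratch array, sized for a single chunk
  total = 0.0

  ! Serial loop over pieces of the iteration space
  do istart = 1, II, CHUNK
    iend = min(istart + CHUNK - 1, II)
    ! Each piece is processed in parallel; iteration i only touches
    ! its own column of A, so there is no data dependence between threads
    !$omp parallel do reduction(+:total)
    do i = istart, iend
      A(:, i - istart + 1) = real(i)             ! placeholder work
      total = total + sum(A(:, i - istart + 1))
    end do
    !$omp end parallel do
  end do

  deallocate(A)
  print *, 'total = ', total
end program chunked_temps
```

The trade-off in this sketch is one parallel region per chunk, so a larger CHUNK lowers the threading overhead while a smaller one lowers peak memory.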
Jim Dempsey