Hi,
One way to assign values to all elements of an array is:
    Do i=1, N
      A(i)=2*B(i)
      C(1,i)=2*B(i)
    End do
Another possibility is to use array syntax:
    A(:)=2*B(:)
    C(1,:)=2*B(:)
Which is the best way to set the values of an entire array, both in serial code and with parallel computing?
Thank you.
Although the compiler can auto-parallelize loops (when that option is enabled), it is often much better to parallelize explicitly with OpenMP (which also has to be enabled with a compiler option).
    !$OMP PARALLEL DO
    Do i=1, N
      A(i)=2*B(i)
      C(1,i)=2*B(i)
    End do
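Note that C(1,i)= above accesses memory with a stride equal to the extent of C's first dimension (N if C is N by N). Strided access like this is inefficient. It would be more efficient to swap the indices (in the declaration or allocation and everywhere else C is used) so that the innermost (or only) loop index is the leftmost subscript. Fortran arrays are column-major, which is the reverse of C/C++.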
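Fortran's column-major ordering can be seen by listing a small array's elements in array element order, which for a contiguous array is also its memory order. This is a minimal sketch; the module and function names (`layout_demo`, `storage_order`) are hypothetical, chosen only for illustration.

```fortran
module layout_demo
  implicit none
contains
  ! Hypothetical helper: return the elements of a 2-D array in array element
  ! order (leftmost index varies fastest). For a contiguous array this is the
  ! order in which the elements sit in memory.
  function storage_order(c) result(flat)
    integer, intent(in) :: c(:,:)
    integer :: flat(size(c))
    ! RESHAPE to rank 1 walks the source in array element order
    flat = reshape(c, [size(c)])
  end function storage_order
end module layout_demo
```

For a 2-by-3 integer array filled with C(i,j) = 10*i + j, `storage_order` returns 11, 21, 12, 22, 13, 23: the leftmost index varies fastest, so consecutive values of the second index are a whole column apart.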
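Note 2: once the index order is changed, the C(i,1)=... store accesses contiguous memory and can be vectorized without scatter instructions. With the original C(1,i) order, vectorizing the store would require scatter instructions, and only some CPUs support them.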
    !$OMP PARALLEL DO SIMD
    Do i=1, N
      A(i)=2*B(i)
      C(i,1)=2*B(i) ! requires change in index order
    End do
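A minimal sketch of the reordered loop as a self-contained subroutine, assuming default REAL arrays and assuming C has been redeclared with its dimensions swapped so that the loop index is the leftmost subscript. The module and subroutine names (`scale_mod`, `scale_arrays`) are hypothetical. When OpenMP is not enabled, the `!$omp` lines are treated as comments and the loop runs serially with the same result.

```fortran
module scale_mod
  implicit none
contains
  ! Hypothetical routine: A(i) = 2*B(i) and C(i,1) = 2*B(i).
  ! Assumes size(a) >= size(b) and size(c,1) >= size(b), i.e. C is declared
  ! with the loop running over its FIRST (contiguous) dimension.
  subroutine scale_arrays(b, a, c)
    real, intent(in)    :: b(:)
    real, intent(out)   :: a(:)
    real, intent(inout) :: c(:,:)   ! inout: only column 1 is written
    integer :: i
    !$omp parallel do simd
    do i = 1, size(b)
      a(i)   = 2.0*b(i)
      c(i,1) = 2.0*b(i)   ! unit stride: i is the leftmost subscript
    end do
    !$omp end parallel do simd
  end subroutine scale_arrays
end module scale_mod
```

A caller would declare C with the loop extent first, for example `real :: c(n, m)`, so that the elements written by one column are adjacent in memory.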
An additional note on the above: the loop does very little computation relative to its memory loads and stores, so its speed is limited by memory bandwidth. For loops like this, you may find it more efficient to restrict the number of threads to a fairly small number. The most effective number varies from system to system and will likely depend on the number of memory channels available.
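One way to cap the thread count for just this loop is the OpenMP `num_threads` clause. In this sketch `nthreads` is a hypothetical tuning parameter (it must be positive); the best value is system specific and has to be found by timing. Setting the `OMP_NUM_THREADS` environment variable instead changes the default for every parallel region in the program. The module and subroutine names are also hypothetical.

```fortran
module scale_limited_mod
  implicit none
contains
  ! Hypothetical routine: the same memory-bound loop with the thread count
  ! capped by the caller. nthreads is a tuning knob, e.g. chosen with the
  ! number of memory channels in mind, and is ignored without OpenMP.
  subroutine scale_limited(b, a, nthreads)
    real,    intent(in)  :: b(:)
    real,    intent(out) :: a(:)
    integer, intent(in)  :: nthreads
    integer :: i
    !$omp parallel do simd num_threads(nthreads)
    do i = 1, size(b)
      a(i) = 2.0*b(i)
    end do
    !$omp end parallel do simd
  end subroutine scale_limited
end module scale_limited_mod
```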
Jim Dempsey