- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
the code below aggregates values from vector b into vector a using vector p as a position index for a:
Program Test real, allocatable :: a(:), b(:) integer, allocatable :: p(:), f(:) integer :: i,j allocate(b(5),p(5)) p=(/1,2,0,2,1/) b=(/0.5,0.1,0.5,0.3,0.1/) !!option 1 allocate(a(1:maxval(p)),source=0.0) Do i=1,size(b,1) if(p(i)==0) cycle a(p(i))=a(p(i))+b(i) End Do write(*,*) a !!end option 1 deallocate(a) !!option 2 allocate(a(0:maxval(p)),source=0.0) Do i=1,size(b,1) a(p(i))=a(p(i))+b(i) End Do a(0)=0.0 write(*,*) a(1:ubound(a,1)) !!end option 2 deallocate(a) !!option 3 allocate(a(maxval(p)),source=0.0) allocate(f(count(p>0))) f=(/1,2,4,5/) Do i=1,size(f,1) a(p(f(i)))=a(p(f(i)))+b(f(i)) End Do write(*,*) a !!end option 3 end Program Test
The value of zero in index vector p indicates that the related value in b should be skipped.
To my knowledge there are three options to achieve this: a) an if branch, b) starting vector "a" at position zero and c) an additional position vector "f". The first has a drawback of possibly difficult to predict branching. The second has the problem that the position zero of "a" has to be loaded into the cpu every time a zero occures and subsequent operations on that array are more tricky to optimize due to the "weird" array boundaries. The third option poses an additional overhead in terms of memory due to having to allocate memory for "f". In addition there might be some low down due to double indexing.
There are some boundary conditions to this problem:
1. arrays "a", "b" and "p" can be huge (e.g. 100,000,000)
2. all arrays can be two dimensional which would require a zero lower boundary for the first dimension if option 2 is consideredaa
3. none of the arrays can be reallocated before entering the loop
4. the number and distribution of zeros in "p" is determined by the data at run time and their order cannot be changed. There might be no zeros at all.
5. non-zero values in "p" may or may not cover the full length array "a"
Any suggestions for another option are appreciated.
Thanhs
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't know whether it is "smart", but the WHERE statement/construct appear to be just what the doctor would have ordered. For example:
Program Test real, allocatable :: a(:), b(:) integer, allocatable :: p(:), f(:) allocate(b(5),p(5)) p=(/1,2,0,2,1/) b=(/0.5,0.1,0.5,0.3,0.1/) allocate(a(1:maxval(p)),source=0.0) where(p /= 0) a(p)=a(p)+b write(*,*) a deallocate(a) end Program Test
The code is certainly more readable as a result of using WHERE.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
from my perspective "where" is just a short-hand for a loop with an if branch. I suspect the performance isnt any better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I suggest you take a look at the disassembly code of mecej4's suggestion (use larger arrays). The syntax he is using will likely enable the compiler to generate vectorized code using gather, test, masked add, masked scatter store.
If the density (frequency) of stores exceeds one per cache line, the code should run faster, if average of 1 store per cache line then same speed. If less than 1 store per cache line it still may be a tad faster.
Generate the code, run VTune, then expand the section using the Disassembly view.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
mecej4 wrote:I don't know whether it is "smart", but the WHERE statement/construct appear to be just what the doctor would have ordered. For example:
Program Test real, allocatable :: a(:), b(:) integer, allocatable :: p(:), f(:) allocate(b(5),p(5)) p=(/1,2,0,2,1/) b=(/0.5,0.1,0.5,0.3,0.1/) allocate(a(1:maxval(p)),source=0.0) where(p /= 0) a(p)=a(p)+b write(*,*) a deallocate(a) end Program TestThe code is certainly more readable as a result of using WHERE.
Good suggestion but note the Fortran standard states, "In each where-assignment-stmt, the mask-expr and the variable being defined shall be arrays of the same shape" which is not met per OP's shape declarations.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page