Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28334 Discussions

## a smart(er) way to deal with zero positions in vector subscripts .... any suggestions

Beginner
360 Views

Hi,

the code below aggregates values from vector b into vector a using vector p as a position index for a:

```Program Test
real, allocatable :: a(:), b(:)
integer, allocatable :: p(:), f(:)
integer :: i,j
allocate(b(5),p(5))
p=(/1,2,0,2,1/)
b=(/0.5,0.1,0.5,0.3,0.1/)
!!option 1
allocate(a(1:maxval(p)),source=0.0)
Do i=1,size(b,1)
if(p(i)==0) cycle
a(p(i))=a(p(i))+b(i)
End Do
write(*,*) a
!!end option 1
deallocate(a)
!!option 2
allocate(a(0:maxval(p)),source=0.0)
Do i=1,size(b,1)
a(p(i))=a(p(i))+b(i)
End Do
a(0)=0.0
write(*,*) a(1:ubound(a,1))
!!end option 2
deallocate(a)
!!option 3
allocate(a(maxval(p)),source=0.0)
allocate(f(count(p>0)))
f=(/1,2,4,5/)
Do i=1,size(f,1)
a(p(f(i)))=a(p(f(i)))+b(f(i))
End Do
write(*,*) a
!!end option 3
end Program Test```

The value of zero in index vector p indicates that the related value in b should be skipped.

To my knowledge there are three options to achieve this: a) an if branch, b) starting vector "a" at position zero and c) an additional position vector "f". The first has a drawback of possibly difficult to predict branching. The second has the problem that the position zero of "a" has to be loaded into the cpu every time a zero occures and subsequent operations on that array are more tricky to optimize due to the "weird" array boundaries. The third option poses an additional overhead in terms of memory due to having to allocate memory for "f". In addition there might be some low down due to double indexing.

There are some boundary conditions to this problem:

1. arrays "a", "b" and "p" can be huge (e.g. 100,000,000)

2. all arrays can be two dimensional which would require a zero lower boundary for the first dimension if option 2 is consideredaa

3. none of the arrays can be reallocated before entering the loop

4. the number and distribution of zeros in "p" is determined by the data at run time and their order cannot be changed. There might be no zeros at all.

5. non-zero values in "p" may or may not cover the full length array "a"

Any suggestions for another option are appreciated.

Thanhs

4 Replies
Black Belt
360 Views

I don't know whether it is "smart", but the WHERE statement/construct appear to be just what the doctor would have ordered. For example:

```Program Test
real, allocatable :: a(:), b(:)
integer, allocatable :: p(:), f(:)
allocate(b(5),p(5))
p=(/1,2,0,2,1/)
b=(/0.5,0.1,0.5,0.3,0.1/)
allocate(a(1:maxval(p)),source=0.0)
where(p /= 0) a(p)=a(p)+b
write(*,*) a
deallocate(a)
end Program Test```

The code is certainly more readable as a result of using WHERE.

Beginner
360 Views

Hi,

from my perspective "where" is just a short-hand for a loop with an if branch. I suspect the performance isnt any better.

Black Belt
360 Views

I suggest you take a look at the disassembly code of mecej4's suggestion (use larger arrays). The syntax he is using will likely enable the compiler to generate vectorized code using gather, test, masked add, masked scatter store.

If the density (frequency) of stores exceeds one per cache line, the code should run faster, if average of 1 store per cache line then same speed. If less than 1 store per cache line it still may be a tad faster.

Generate the code, run VTune, then expand the section using the Disassembly view.

Jim Dempsey

Honored Contributor II
360 Views

mecej4 wrote:

I don't know whether it is "smart", but the WHERE statement/construct appear to be just what the doctor would have ordered. For example:

```Program Test
real, allocatable :: a(:), b(:)
integer, allocatable :: p(:), f(:)
allocate(b(5),p(5))
p=(/1,2,0,2,1/)
b=(/0.5,0.1,0.5,0.3,0.1/)
allocate(a(1:maxval(p)),source=0.0)
where(p /= 0) a(p)=a(p)+b
write(*,*) a
deallocate(a)
end Program Test```

The code is certainly more readable as a result of using WHERE.

Good suggestion but note the Fortran standard states, "In each where-assignment-stmt, the mask-expr and the variable being defined shall be arrays of the same shape" which is not met per OP's shape declarations.