- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is a question of programming style and effectiveness (speed). I have the option of creating two possible array accessors:
MODULE Stuff INTERFACE Get_Val MODULE PROCEDURE Get_Val_One, Get_Val_Three END INTERFACE Get_Val INTEGER, PRIVATE :: II, JJ, KK, IJK INTEGER, PARAMETER :: F = SELECTED_REAL_KIND(6, 30) REAL (KIND = F), DIMENSION(:), ALLOCATABLE, PRIVATE :: ValArray CONTAINS REAL (KIND = F) FUNCTION Get_Val_One(IDX) INTEGER, INTENT(IN) :: IDX Get_Val_One = ValArray(IDX) END FUNCTION Get_Val_One REAL (KIND = F) FUNCTION Get_Val_Three(I, J, K) INTEGER, INTENT(IN) :: I, J, K INTEGER :: IDX IDX = ((K - 1) * JJ + J - 1) * II + I Get_Val_Three = ValArray(IDX) END FUNCTION Get_Val_Three ! Other functions setting values of the ValArray and one- and three-dimensional array bounds END MODULE Stuff
The other array accessor would be the following:
MODULE OtherStuff INTEGER, PRIVATE :: II, JJ, KK, IJK INTEGER, PARAMETER :: F = SELECTED_REAL_KIND(6, 30) REAL (KIND = F), DIMENSION(:), ALLOCATABLE, PRIVATE :: ValArray CONTAINS REAL (KIND = F) FUNCTION Get_Val(I, J, K) INTEGER, INTENT(IN) :: I INTEGER, INTENT(IN), OPTIONAL :: J, K INTEGER :: IDX IF (PRESENT(J) .AND. PRESENT(K)) THEN IDX = ((K - 1) * JJ + J - 1) * II + I ELSE IDX = I END IF Get_Val = ValArray(IDX) END FUNCTION Get_Val ! Other routines setting ValArray values and one- and three-dimensional array bounds END MODULE OtherStuff
Is there any difference speed-wise and memory-footprint wise between the two options. Which option might be preferable?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In the first case, the compiler resolves the test for number of indexes. In the second case, when the code is not inlined, the number of argument tests incurs a test and branch (some overhead). When inlined, you would expect the compiler to elide the dead code and be as efficient as the first method (but there is no guarantee that this would be the case). I would use the first method.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you use the first version, how you do the selection between calling with one argument and calling with three arguments is what will impact efficiency most.
Presumably, there are some nested i-j-k loops from within which the functions will be called. The test for selection should be in the outermost loop possible. If, in fact, the test can be made completely outside the loops, that would be even better.
The second version is easier to understand, but it requires the test to be made every time. It will probably not be efficient. However, efficiency would be an issue only if the functions are expected to be called millions of times or more.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The methods also beg the question as to whether to define the array as one- or three-dimensional. The one-dimensional form seemed natural. The code I have is using the INTERFACE form, with the multiplicative index replaced by an actual INTEGER array value from another module. Again I am uncertain if this is helping, but the numbers are on the order of 10^6 calls for a simulation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Mecej4,
The forst method is by way of generic interface. Both the following are valid with the first method:
X = Get_Val(I)
Y = Get_Val(I,J,K)
The compiler choses the appropriate function (Get_Val_One or Get_Val_Three) based on arguments.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jim, I understood that. What I meant was: the efficiency depends on whether the calls are coded as
1. Efficient version:
IF (condition) THEN
do i=1,N
X(i) = Get_Val(I)
end do
ELSE
do i=1,N1; do j=1,N2; do k=1,N3
Y(i,j,k) = Get_Val(i,j,k)
end do; end do; end do
OR
2. Less efficient:
do i=1,N1; do j=1,N2; do k=1,N3
IF(cond)THEN
X(i)=Get_Val(i)
ELSE
Y(i,j,k) = Get_Val(i,j,k)
ENDIF
end do; end do; end do
With trivial code such as this, the optimizer may be able to make the "less efficient" version equally efficient as the other one, but with more complex code with hundred of lines can it do the same?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Any sane programmer would not program al la 2. You could potentially have N2*N3 number of
X(i) = Get_Val(i)
for a given i (or multiple i's for that matter NotNotThatItMatters)
While the compiler may be able to rearrange the code, especially with loop invariant code, why rely on the compiler to fix your mistakes in programming?
Back to the original post.
I have a program that I am consulting on for optimization that has a collection of 4D arrays, at least 7 such arrays, all allocated to the same dimensions. A significant portion of time is spent in the subscript overhead of extracting from each array descriptor the same values to compute the same relative offset from the allocation. By use of a similar technique, the 1D equivalent index can be calculated once, then use (re used) 6 more times.
Steve (if you are following),
In thinking about this project that I am working on, it seems to me that it is not unusual for applications to have multiple multi-dimension arrays of the same rank and extents and for the program to contain a series of expressions containing the same indices. In situations like this, it may be beneficial to have a compiler directive to aid in optimization:
!*** HYPOTHETICAL ***
!DIR$ ASSUME_SAME_RANK_AND_SIZE :: FEE, FI, FO, FUM
X = FEE(I,J,K,L) + FI (I,J,K,L)+ FO(I,J,K,L) + FUM(I,J,K,L)
Then the above could compute the relative index once, and re-use it 3 additional times. In the above example, the array subscripting overhead for each array might be ~4x that of the overhead to use the value (assuming in cache). This feature, for the above statement, might yield a 2x improvement in performance (for that statement). Weights 5+5+5+5 verses 5+2+2+2
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Interesting idea, Jim, but I'm uncertain how that would work with vectorization and parallelization.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I do not see any issue with vectorization nor parallelization. In both cases (vector lanes and/or array slices) use the same indicies. With a little re-work of the above statement we have
DO I=1,N
X(I) = FEE(I,J,K,L) + FI (I,J,K,L)+ FO(I,J,K,L) + FUM(I,J,K,L)
ENDDO
Loosely vectorized to
DO I=1,N,8
X(I:I+7) = FEE(I:I+7,J,K,L) + FI (I:I+7,J,K,L)+ FO(I:I+7,J,K,L) + FUM(I:I+7,J,K,L)
ENDDO
And in parallel form
!$OMP PARALLEL DO
DO I=1,N
! Thread 0 has I as 1:N/nThreads
! Thread 1 has I as 1+N/nThreads:2*N/nThreads
! ...
X(I) = FEE(I,J,K,L) + FI (I,J,K,L)+ FO(I,J,K,L) + FUM(I,J,K,L)
ENDDO
All array references using the same relative offsets. The code would only need to reach into one of the array descriptors to calculate the relative offset, and do this once, then for each reference, add this relative offset to the base address (also held in the array descriptor). This assumes the availability of the directive discussed above. As you do with ASSUME_ALIGNED, it is the programmer's responsibility to not mislead the compiler by giving a false directive. The only thing keeping the compiler from doing this now, is it (usually) does not know the UBOUNDs and LBOUNDs are the same. When these arrays are declared statically (fixed dimensions), I venture to assume the compiler does collapse the relative subscript calculation to occur but once (for the above scenario). i.e., the only thing missing is a means for the compiler to know these arrays are dimensioned the same... ergo the proffered !DIR$ --- eh.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I must point out Jim's comment is the case in my code as well. There are times I need to access multiple arrays at the same relative index simultaneously. The old school method was to turn the 3-D arrays into 1-D arrays and step linearly through them, thereby maintaining the relative index. Without the compiler directive, that would appear to be the only way in.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page