Hello!
I have a question. The optimization manual
(https://www.umr-cnrm.fr/gmapdoc/IMG/pdf/general_fortran_optimization_guide_manual.pdf)
says the following (page 8):
Let us now reconsider the same kind of loop where the arrays are dummy arguments :
REAL, INTENT(OUT) :: A(N)
REAL, INTENT(IN) :: B(N)
REAL, INTENT(IN) :: C(N)
DO J=1,N
A(J)=B(J)*C(J)
ENDDO
That loop has a good spacial locality because the explicit dimensionning of the dummy arrays instructs the compiler that the data in each array are contiguous in memory.
However the Fortran language offer the possibility to write the same code with implicit declarations, which can make the code more robust against bugs :
REAL, INTENT(OUT) :: A(:)
REAL, INTENT(IN) :: B(:)
REAL, INTENT(IN) :: C(:)
A(:)=B(:)*C(:)
Unfortunately that implicit shaped declarations also instruct the compiler that these dummy arrays are not necessarily contiguous in memory. The spacial locality of data being unknown to the compiler optimizer, it may prefer to disable any data prefetching rather to risk cache misses. Consequently that loop will run slower than the former one.
!!!!!Therefore, Implicit shaped declarations should not be used!!!!!!
---------------------------------------------------------------------------
Can you tell me if that last recommendation is correct?
I assumed that both examples would perform the same.
Thanks.
@Andrew_ ,
Firstly, I suggest you check the compiled object code generated by the compiler and evaluate it for performance. In addition to Intel tools, you may also find Godbolt useful in case you have not used it: https://godbolt.org/
Then I suggest you consider the CONTIGUOUS attribute in the language and whether the compiler is able to take advantage of it:
real, contiguous, intent(out) :: a(:)
real, contiguous, intent(in) :: b(:)
real, contiguous, intent(in) :: c(:)
a = b*c
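To compare the variants yourself, a minimal sketch along these lines could be pasted into Compiler Explorer or timed locally. The module and routine names (`multiply_kernels`, `mul_explicit`, `mul_assumed`, `mul_contiguous`), the array size and the repeat count are all assumptions made for illustration. An optimizer may inline or hoist parts of the timing loops, so treat the timings as a rough indication only and inspect the generated code as well.

```fortran
module multiply_kernels
  implicit none
contains
  ! Explicit-shape dummies, as in the first example from the manual.
  subroutine mul_explicit(n, a, b, c)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    real, intent(in)  :: b(n), c(n)
    integer :: j
    do j = 1, n
      a(j) = b(j)*c(j)
    end do
  end subroutine mul_explicit

  ! Assumed-shape dummies: strides are taken from the array descriptor.
  subroutine mul_assumed(a, b, c)
    real, intent(out) :: a(:)
    real, intent(in)  :: b(:), c(:)
    a = b*c
  end subroutine mul_assumed

  ! Assumed-shape dummies with the CONTIGUOUS attribute.
  subroutine mul_contiguous(a, b, c)
    real, contiguous, intent(out) :: a(:)
    real, contiguous, intent(in)  :: b(:), c(:)
    a = b*c
  end subroutine mul_contiguous
end module multiply_kernels

program compare_kernels
  use, intrinsic :: iso_fortran_env, only: int64
  use multiply_kernels
  implicit none
  integer, parameter :: n = 1000000, repeats = 200   ! assumed problem size
  real, allocatable :: a(:), b(:), c(:)
  integer(int64) :: t0, t1, rate
  integer :: k

  allocate(a(n), b(n), c(n))
  call random_number(b)
  call random_number(c)
  call system_clock(count_rate=rate)

  call system_clock(t0)
  do k = 1, repeats
    call mul_explicit(n, a, b, c)
  end do
  call system_clock(t1)
  ! Printing a checksum keeps the result live so the work is not removed.
  print *, 'explicit-shape: ', real(t1 - t0)/real(rate), 's, checksum', sum(a)

  call system_clock(t0)
  do k = 1, repeats
    call mul_assumed(a, b, c)
  end do
  call system_clock(t1)
  print *, 'assumed-shape:  ', real(t1 - t0)/real(rate), 's, checksum', sum(a)

  call system_clock(t0)
  do k = 1, repeats
    call mul_contiguous(a, b, c)
  end do
  call system_clock(t1)
  print *, 'contiguous:     ', real(t1 - t0)/real(rate), 's, checksum', sum(a)
end program compare_kernels
```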
Yes, I want to try it that way. I develop numerical methods and publish articles in journals, so the performance of the code is critical. Declaring array bounds implicitly (assumed-shape arrays) is very convenient and reduces the number of errors in a program. However, if it rules out optimization, it is unacceptable for numerical methods.
I will further add that an assumed-shape array is passed with an array descriptor. The format and contents of that descriptor are compiler (processor) dependent. However, the descriptor holds all sorts of useful information, such as the strides of the dimensions, and in the case of Intel there is a flag for whether the array is contiguous. The author is saying "don't use A = B*C" because the compiler might be dumb in some instances. This might have been true of some compilers at some point in history, but things change continuously. I am not convinced that unrolling implied loops manually is worth my time; I leave that to the compiler, and if my code is slow I focus only on the hot spots that a profiler identifies.
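The descriptor itself is not portably accessible, but some of what it carries can be observed from inside the callee. The sketch below is only an illustration: the names `descriptor_demo` and `describe` are hypothetical, and the byte stride is obtained by converting `c_loc` addresses to integers with `transfer`, which is a common idiom rather than something the standard guarantees to be meaningful. It assumes every array passed in has at least two elements.

```fortran
module descriptor_demo
  use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_intptr_t
  implicit none
contains
  ! Report size, distance in bytes between the first two elements,
  ! and contiguity of an assumed-shape dummy (needs size(array) >= 2).
  subroutine describe(label, array)
    character(*), intent(in) :: label
    real, intent(in), target :: array(:)
    integer(c_intptr_t) :: first, second

    first  = transfer(c_loc(array(1)), first)
    second = transfer(c_loc(array(2)), second)
    print '(a, a, i0, a, i0, a, l1)', label, ': size = ', size(array), &
          ', byte stride = ', second - first, &
          ', contiguous = ', is_contiguous(array)
  end subroutine describe
end module descriptor_demo

program show_descriptor
  use descriptor_demo
  implicit none
  real :: x(12)

  x = 0.0
  call describe('whole array x    ', x)
  call describe('section x(4:9)   ', x(4:9))
  call describe('section x(1:12:2)', x(1:12:2))
end program show_descriptor
```

With a compiler that passes the strided section by descriptor, the last call should report a byte stride twice the size of a default real and a contiguity flag of F, while the first two calls should report contiguous storage.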
The question is as follows. If the array being passed is not contiguous in memory, what will the compiler do? Create a contiguous temporary copy of the array? If it does, that's fine. In that case, you can assume that a contiguous array is always passed to the procedure, and the compiler can optimize the code as efficiently as possible.
Normally, creating temporary copies has a large overhead (allocating memory and copying to and from it), so it is often not fine. If there is no explicit interface and you are doing Fortran 77-style array passing, then you are in big trouble if the array is not contiguous in memory, so I don't think that is a reasonable comparison to make. If you are using POD (plain old data) types and not passing non-contiguous slices, I would trust the compiler to make sensible decisions and not constrain the code based on presumptions about compiler behaviour. If hot-spot analysis shows some bottlenecks, then focus on understanding those.
Even if assumed-shape declarations cause slower code than explicit shapes (do you have actual measurements of that?), then you can still use them, as, like you say, they tend to make the interface simpler and less error-prone:
subroutine do_something( array )
real, dimension(:) :: array
call do_something_private( array, size(array) )
...
end subroutine do_something
subroutine do_something_private( array, n )
real, dimension(n) :: array
...
end subroutine do_something_private
By only exposing the simplified interface and doing the actual work in a private routine, you ought to get the best of both worlds. Of course there will be a slight overhead for the extra call, but if that is of concern, the workload will be very small indeed.
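As a complete, compilable sketch of this pattern (the module name `scaling`, the routine names and the doubling workload are hypothetical stand-ins for the elided parts above):

```fortran
module scaling
  implicit none
  private
  public :: scale_by_two          ! only the assumed-shape wrapper is exposed
contains
  ! Public interface: callers pass any rank-1 real array or section.
  subroutine scale_by_two(array)
    real, dimension(:), intent(inout) :: array
    call scale_by_two_private(array, size(array))
  end subroutine scale_by_two

  ! Private worker with an explicit-shape dummy, so the data it sees
  ! is contiguous.
  subroutine scale_by_two_private(array, n)
    integer, intent(in) :: n
    real, dimension(n), intent(inout) :: array
    integer :: j
    do j = 1, n
      array(j) = 2.0*array(j)
    end do
  end subroutine scale_by_two_private
end module scaling

program use_scaling
  use scaling
  implicit none
  real :: x(6)

  x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
  call scale_by_two(x)            ! contiguous actual argument
  print *, x
  call scale_by_two(x(1:6:2))     ! strided section: only elements 1, 3, 5 change
  print *, x
end program use_scaling
```

Note that when the wrapper receives a non-contiguous section, the compiler typically has to copy it in and out around the inner call, so the cost of the temporary does not disappear; it moves into the wrapper.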
Yes, this is an interesting way to overcome the problem. But it doesn't look good, because debugging numerical procedures is very difficult. Also, programs usually include MPI and OpenMP, and that kind of code redundancy is a nightmare. If I had one procedure, I would use the above method, but there are dozens of such procedures, and this approach adds bugs.
I think one can try passing an array section X(1:N:2) to the subroutine, and then use the compiler's run-time checks to see whether a temporary array is created.
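A small test program for this experiment might look like the sketch below; the names `temp_demo` and `fill_ones` are hypothetical. It passes a strided section to an explicit-shape dummy, which normally forces the compiler to copy through a contiguous temporary. With Intel Fortran, building with `-check arg_temp_created` should print a run-time message when such a temporary is made; with gfortran, `-Warray-temporaries` (compile time) or `-fcheck=array-temps` (run time) serve a similar purpose.

```fortran
module temp_demo
  implicit none
contains
  ! Explicit-shape dummy: the routine expects n contiguous elements.
  subroutine fill_ones(n, a)
    integer, intent(in) :: n
    real, intent(out) :: a(n)
    a = 1.0
  end subroutine fill_ones
end module temp_demo

program check_temporary
  use temp_demo
  implicit none
  integer, parameter :: n = 10
  real :: x(n)

  x = 0.0
  ! x(1:n:2) has n/2 elements with stride 2, so it is not contiguous.
  call fill_ones(n/2, x(1:n:2))
  print '(10f4.1)', x

  ! The odd elements must have been written back, the even ones untouched.
  if (any(x(1:n:2) /= 1.0) .or. any(x(2:n:2) /= 0.0)) then
    error stop 'unexpected result'
  end if
end program check_temporary
```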
Look at the optimisation reports.
I found the answer.
There are two cases.
CASE 1----------------------------------
subroutine do_something( array )
real, dimension(:),contiguous :: array
print *,is_contiguous(array)
end subroutine do_something
CASE 2----------------------------------
subroutine do_something( array )
real, dimension(:) :: array
print *,is_contiguous(array)
end subroutine do_something
In the first case (with CONTIGUOUS), the dummy array is always contiguous: if the actual argument is non-contiguous, a temporary array is created.
In the second case (without CONTIGUOUS), a temporary array is never created when the argument is passed. However, for a non-contiguous actual argument,
print *,is_contiguous(array) will print FALSE.
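For anyone who wants to reproduce this, here is a minimal sketch that puts both cases in one module and calls them with the same strided section. The names `contiguity_cases`, `with_contiguous` and `without_contiguous` are hypothetical, and the result is returned through an extra `flag` argument instead of being printed inside the routine.

```fortran
module contiguity_cases
  implicit none
contains
  ! Case 1: CONTIGUOUS dummy. A non-contiguous actual argument is
  ! made contiguous by the compiler (copy-in/copy-out).
  subroutine with_contiguous(array, flag)
    real, dimension(:), contiguous, intent(in) :: array
    logical, intent(out) :: flag
    flag = is_contiguous(array)
  end subroutine with_contiguous

  ! Case 2: plain assumed-shape dummy. The section is passed as is.
  subroutine without_contiguous(array, flag)
    real, dimension(:), intent(in) :: array
    logical, intent(out) :: flag
    flag = is_contiguous(array)
  end subroutine without_contiguous
end module contiguity_cases

program compare_cases
  use contiguity_cases
  implicit none
  real :: x(10)
  logical :: flag

  x = 1.0
  call with_contiguous(x(1:10:2), flag)
  print *, 'case 1, strided actual argument: ', flag
  call without_contiguous(x(1:10:2), flag)
  print *, 'case 2, strided actual argument: ', flag
  call without_contiguous(x, flag)
  print *, 'case 2, whole array:             ', flag
end program compare_cases
```

The first and third calls should report a contiguous dummy, and the second should not. The cost of case 1 is the hidden copy, so it pays off mainly when callers almost always pass contiguous data.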