Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Implicit shaped declarations

Andrew_
Beginner
759 Views

Hello!

I have a question. The optimization manual

(https://www.umr-cnrm.fr/gmapdoc/IMG/pdf/general_fortran_optimization_guide_manual.pdf)

says the following (page 8):

Let us now reconsider the same kind of loop where the arrays are dummy arguments :


REAL, INTENT(OUT) :: A(N)
REAL, INTENT(IN) :: B(N)
REAL, INTENT(IN) :: C(N)
DO J=1,N
A(J)=B(J)*C(J)
ENDDO


That loop has good spatial locality because the explicit dimensioning of the dummy arrays tells the compiler that the data in each array are contiguous in memory.
However, the Fortran language offers the possibility to write the same code with implicit declarations, which can make the code more robust against bugs:


REAL, INTENT(OUT) :: A(:)
REAL, INTENT(IN) :: B(:)
REAL, INTENT(IN) :: C(:)
A(:)=B(:)*C(:)


Unfortunately, these implicit shaped declarations also tell the compiler that the dummy arrays are not necessarily contiguous in memory. Since the spatial locality of the data is unknown to the optimizer, it may prefer to disable data prefetching rather than risk cache misses. Consequently, that loop will run slower than the former one.

!!!!!Therefore, Implicit shaped declarations should not be used!!!!!!

---------------------------------------------------------------------------

Can you tell me if that last recommendation is correct?

I assumed that the performance of both examples would be the same.

Thanks.

 

 

11 Replies
andrew_4619
Honored Contributor II
740 Views
Any code where decisions have to be made at run time rather than at compile time might involve a little more work at run time. That said, in this case I would not worry, as any good compiler will optimise both forms, and compilers get better all the time. Focus first on robust code, IMO.
FortranFan
Honored Contributor II
717 Views

@Andrew_ ,

Firstly, I suggest you check the compiled object code generated by the compiler and evaluate it for performance.  In addition to Intel tools, you may also find Godbolt useful in case you have not used it: https://godbolt.org/

Then I suggest you consider the CONTIGUOUS attribute in the language and whether the compiler is able to take advantage of it:

      real, contiguous, intent(out) :: a(:)
      real, contiguous, intent(in)  :: b(:)
      real, contiguous, intent(in)  :: c(:)
      a = b*c
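
For anyone who wants to paste this into Godbolt or build it locally, here is a minimal self-contained sketch of the same idea. The module name `elementwise_mod`, the routine name `multiply`, and the test values are placeholders chosen for illustration, not anything from the manual. Whether the generated code actually differs from the plain assumed-shape version depends on the compiler and its options, so check the assembly or the optimisation report.

```fortran
module elementwise_mod
   implicit none
   private
   public :: multiply

contains

   ! Assumed-shape dummies with the CONTIGUOUS attribute: inside the
   ! routine the compiler may assume unit stride for a, b and c.
   subroutine multiply(a, b, c)
      real, contiguous, intent(out) :: a(:)
      real, contiguous, intent(in)  :: b(:)
      real, contiguous, intent(in)  :: c(:)

      a = b*c
   end subroutine multiply

end module elementwise_mod

program demo_contiguous
   use elementwise_mod, only: multiply
   implicit none
   integer, parameter :: n = 8
   real :: a(n), b(n), c(n)
   integer :: i

   b = [(real(i), i = 1, n)]   ! 1.0, 2.0, ..., 8.0
   c = 2.0
   call multiply(a, b, c)      ! whole arrays: contiguous actual arguments
   print *, a
end program demo_contiguous
```

Because the routine lives in a module, callers get the explicit interface that assumed-shape dummies require.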
Andrew_
Beginner
703 Views

Yes, I want to try it that way. I develop numerical methods and publish articles in journals. Consequently, computational performance is critical. Declaring arrays with assumed shape is very convenient and reduces the number of errors in a program. However, if it rules out optimization, it is unacceptable for numerical methods.

andrew_4619
Honored Contributor II
708 Views

I will further add that an assumed-shape array is passed with an array descriptor. The format and contents of that descriptor are compiler (processor) dependent. However, the descriptor holds all sorts of useful information, such as the strides of the dimensions, and in the case of Intel there is a flag for contiguous or not. The author is saying don't use A = B*C because the compiler might be dumb in some instances. This might have been true of some compilers at some point in history, but things change continuously. I am not convinced about spending my time unrolling implied loops manually; I will leave that to the compiler to decide, and if my code is slow I will focus only on the hot spots reported by profilers.

Andrew_
Beginner
655 Views

The question is as follows. If the array being passed is not contiguous in memory, what will the compiler do? Create a contiguous temporary copy of the array? If the compiler creates a contiguous temporary copy, that's fine. In that case, you can assume that a contiguous array is always passed into the procedure, and the compiler can optimize the code as efficiently as possible.



andrew_4619
Honored Contributor II
646 Views

Normally creating temporary copies has a large overhead cost (allocate some memory and copy to and from), so it is often not fine. If there is no explicit interface and you are doing Fortran 77 style array passing, then you are in big, big trouble if the array is not contiguous in memory, so I don't think that is a reasonable comparison to make. If you are using POD (plain old data) types and not passing non-contiguous slices, I would trust the compiler to make sensible decisions and not constrain the code based on presumptions about compiler behaviour. If hot spot analysis shows some bottlenecks, then focus on understanding those.
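
To make the copy-in/copy-out point concrete, here is a small sketch, assuming a hypothetical routine `fill` with an explicit-shape dummy. A strided section is passed to it. The routine sees `n` consecutive elements, so a compiler will typically copy the section into a temporary before the call and copy it back afterwards. The program only demonstrates the visible semantics (every second element is updated); it does not prove that a copy was made.

```fortran
module legacy_mod
   implicit none
contains

   ! Explicit-shape dummy: the routine works on n consecutive elements.
   subroutine fill(v, n)
      integer, intent(in)    :: n
      real,    intent(inout) :: v(n)

      v = -1.0
   end subroutine fill

end module legacy_mod

program demo_copy
   use legacy_mod, only: fill
   implicit none
   integer, parameter :: n = 10
   real :: x(n)

   x = 0.0
   ! x(1:n:2) is a non-contiguous section with n/2 elements.
   ! A compiler typically passes a packed temporary here and
   ! copies the result back into x after the call.
   call fill(x(1:n:2), n/2)
   print *, x
end program demo_copy
```

As far as I know, Intel Fortran has a run-time check, `-check arg_temp_created` (`/check:arg_temp_created` on Windows), that reports when such a temporary is created; treat the exact spelling as something to confirm in the compiler documentation for your version.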

Arjen_Markus
Honored Contributor I
681 Views

Even if assumed-shape declarations cause slower code than explicit shapes (do you have actual measurements of that?), then you can still use them, as, like you say, they tend to make the interface simpler and less error-prone:

subroutine do_something( array )
    real, dimension(:) :: array
    call do_something_private( array, size(array) )
    ...
end subroutine do_something
subroutine do_something_private( array, n )
    real, dimension(n) :: array
    ...
end subroutine do_something_private

By only exposing the simplified interface and doing the actual work in a private routine, you ought to get the best of both worlds. Of course there will be a slight overhead for the extra call, but if that is of concern, the workload will be very small indeed.
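
A filled-in sketch of that pattern, with the worker hidden by `private`, could look like the following. The names `scale_mod`, `scale_array` and `scale_array_worker` and the scaling operation itself are made up for illustration.

```fortran
module scale_mod
   implicit none
   private
   public :: scale_array          ! only the assumed-shape wrapper is visible

contains

   ! Public interface: no separate size argument to get wrong.
   subroutine scale_array(array, factor)
      real, intent(inout) :: array(:)
      real, intent(in)    :: factor

      call scale_array_worker(array, size(array), factor)
   end subroutine scale_array

   ! Private worker: explicit-shape dummy, so the loop runs over
   ! n consecutive elements.
   subroutine scale_array_worker(array, n, factor)
      integer, intent(in)    :: n
      real,    intent(inout) :: array(n)
      real,    intent(in)    :: factor
      integer :: j

      do j = 1, n
         array(j) = factor*array(j)
      end do
   end subroutine scale_array_worker

end module scale_mod

program demo_wrapper
   use scale_mod, only: scale_array
   implicit none
   real :: x(6)

   x = 1.0
   call scale_array(x, 3.0)
   print *, x
end program demo_wrapper
```

One thing to keep in mind: if a caller passes a non-contiguous section to `scale_array`, the inner call to the explicit-shape worker is where a compiler would typically create a temporary copy.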

Andrew_
Beginner
654 Views

Yes, this is an interesting way to overcome the problem. But it doesn't look good, because debugging numerical procedures is very difficult. Also, programs usually include MPI and OpenMP, and such code redundancy is a nightmare. If I had one procedure, I would use the above method, but there are dozens of such procedures, and this approach invites bugs.

Andrew_
Beginner
654 Views

I think one can try passing an array section X(1:N:2) to the subroutine.

Then check at run time whether a temporary array is created.

andrew_4619
Honored Contributor II
646 Views

Look at the optimisation reports.

Andrew_
Beginner
591 Views

I found the answer.

There are two cases.

CASE 1----------------------------------

subroutine do_something( array )
    real, dimension(:), contiguous :: array
    print *, is_contiguous(array)
end subroutine do_something

CASE 2----------------------------------

subroutine do_something( array )
    real, dimension(:) :: array
    print *, is_contiguous(array)
end subroutine do_something

In the first case, the dummy array is always contiguous inside the routine. However, if the actual argument is non-contiguous, a temporary array is created.

In the second case, a temporary array is never created when the argument is passed. However, for a non-contiguous actual argument, print *, is_contiguous(array) prints F (false).
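
For anyone who wants to repeat the experiment, here is a self-contained sketch that puts both variants side by side. The module and function names are placeholders. The first variant must report true, because a dummy with the CONTIGUOUS attribute is contiguous by definition. What the second variant reports for the strided section is what I observed as false, but check it with your own compiler.

```fortran
module contig_cases
   implicit none
contains

   ! Variant 1: assumed shape with CONTIGUOUS.
   logical function seen_contiguous_case1(array) result(flag)
      real, dimension(:), contiguous, intent(in) :: array
      flag = is_contiguous(array)
   end function seen_contiguous_case1

   ! Variant 2: plain assumed shape, no CONTIGUOUS attribute.
   logical function seen_contiguous_case2(array) result(flag)
      real, dimension(:), intent(in) :: array
      flag = is_contiguous(array)
   end function seen_contiguous_case2

end module contig_cases

program demo_cases
   use contig_cases
   implicit none
   real :: x(10)

   x = 0.0
   ! Strided (non-contiguous) section passed to both variants.
   print *, 'case 1, x(1:10:2): ', seen_contiguous_case1(x(1:10:2))
   print *, 'case 2, x(1:10:2): ', seen_contiguous_case2(x(1:10:2))
   ! Contiguous section passed to the plain assumed-shape variant.
   print *, 'case 2, x(1:5):    ', seen_contiguous_case2(x(1:5))
end program demo_cases
```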
