eoseret · ‎01-15-2014

Hello,
When I compile the following loop with ifort 14.0.1, using an allocatable instead of a generic pointer for the val field of i1_t gives a ~15-20% speedup. The culprits are extra address-calculation instructions in the pointer version (2x LEA+IMUL, the loop being unrolled twice). Is ifort too conservative here, or is this a "performance bug"?
Thank you in advance
type i1_t
  integer, pointer :: val(:) ! speedup when replacing pointer with allocatable
  (...)
end type i1_t

do i = 1, size_outer
  elt => elts(i)%p ! type(elt) = i1_t

  do j = 1, size_inner
    array (elt%val(j)) = 0.
  end do
end do

In the full application (with many i1_t or similar structures in hot routines), replacing the pointer keyword with allocatable gave a global 30-40% performance improvement. Using -O3 instead of -O2 with ifort does not help. A similar speedup is measured with gfortran 4.8.2 (the only difference being that the loop is not unrolled by default).
All required files are enclosed: kernel.f90 and driver.f90

$ ifort -fpp -g -O2 -c kernel.f90 -o kernel_pointer.o
$ ifort -fpp -D ALLOC -g -O2 -c kernel.f90 -o kernel_allocatable.o
$ ifort driver.f90 kernel_pointer.o -o pointer
$ ifort driver.f90 kernel_allocatable.o -o alloc
$ time ./pointer
user    0m0.422s
$ time ./alloc
user    0m0.346s

Sergio · ‎01-15-2014

The gurus here will be able to give you a much better answer, but the reason for both allocatable arrays and pointers existing in Fortran is performance. Using allocatable arrays avoids pointer aliasing, which limits parallelism.
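
To make the aliasing point concrete, here is a minimal sketch. The names x, p, q, a and b are made up for illustration and do not come from the original program. It only shows that two pointers may legally view overlapping storage, whereas two distinct allocatable variables cannot; it does not measure anything.

```fortran
program alias_sketch
  implicit none
  real, target      :: x(5)
  real, pointer     :: p(:), q(:)
  real, allocatable :: a(:), b(:)

  ! Two pointers may legally view overlapping storage.
  x = 1.0
  p => x(1:4)
  q => x(2:5)
  q(1) = 2.0          ! q(1) and p(2) are both x(2)
  if (p(2) /= 2.0) error stop "store through q should be visible through p"

  ! Two distinct allocatable variables own distinct storage.
  allocate(a(4), b(4))
  a = 1.0
  b = 1.0
  b(1) = 2.0          ! cannot affect a
  if (any(a /= 1.0)) error stop "allocatables should not overlap"

  print *, "p(2) after store through q:", p(2)
  print *, "a after store through b:   ", a
end program alias_sketch
```

Because a store through q can change what p sees, the compiler has to be more careful about reordering or vectorizing loops that mix pointer accesses, unless it can prove the pointers do not overlap.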

eoseret · ‎01-15-2014

Sergio wrote:

The gurus here will be able to give you a much better answer, but the reason for both allocatable arrays and pointers existing in Fortran is performance. Using allocatable arrays avoids pointer aliasing, which limits parallelism.

Thanks, understood: with pointers it is harder for the compiler to generate optimal code.

My new questions are then:

Is elt%val really treated by the compiler as a pointer to an array? If so, val(j+1) is contiguous with val(j), and no extra address-computation instructions (LEA, IMUL...) should be needed in the inner loop (just base and index registers in memory operands, as with allocatable).
In other words, must any compiler that optimizes safely assume aliasing in the inner loop of my example? elt is constant, so elt%val is constant; only j varies, from 1 to size_inner (stride 1).

IanH · ‎01-15-2014

I'm not clear whether your new questions assume an allocatable or pointer component.

As already mentioned, it is easier for the compiler to determine that an allocatable component isn't aliased with something else in the statement. From your code example I'm not so sure that's relevant here though.

An allocatable component is also always contiguous. A pointer component without the CONTIGUOUS attribute may not be: it could have a stride other than one. I think this is the issue here.

Note that you can declare a pointer component to be CONTIGUOUS (F2008), in which case it is always ... contiguous!
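
A small sketch of the stride point, assuming a compiler that supports the F2008 CONTIGUOUS attribute and the is_contiguous intrinsic. The names base, p and q are illustrative only:

```fortran
program stride_sketch
  implicit none
  integer, target              :: base(10)
  integer, pointer             :: p(:)
  integer, pointer, contiguous :: q(:)
  integer :: i

  base = [(i, i = 1, 10)]

  p => base(1:10:2)   ! allowed: p views every second element of base
  q => base(3:7)      ! q may only be associated with a contiguous target

  if (is_contiguous(p))       error stop "p should be strided"
  if (.not. is_contiguous(q)) error stop "q should be contiguous"

  print *, "p contiguous? ", is_contiguous(p), " elements: ", p
  print *, "q contiguous? ", is_contiguous(q), " elements: ", q
end program stride_sketch
```

For a plain pointer like p, the distance between p(j) and p(j+1) is only known at run time, so indexing it generally involves a stride multiplication. With CONTIGUOUS, the programmer promises unit stride, and associating q with a strided section such as base(1:10:2) would be invalid.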

Separate to performance, allocatable components also have "code safety" benefits, in terms of automatic management of their lifetime.

As a general rule, if you don't need to point at something else, then don't use pointers.
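
The lifetime benefit can be sketched as follows. The module, type and function names (lifetime_sketch, alloc_t, ptr_t, sum_with_allocatable, sum_with_pointer) are hypothetical and chosen only for this illustration:

```fortran
module lifetime_sketch
  implicit none

  type :: alloc_t
     integer, allocatable :: val(:)
  end type alloc_t

  type :: ptr_t
     integer, pointer :: val(:) => null()
  end type ptr_t

contains

  integer function sum_with_allocatable(n) result(total)
    integer, intent(in) :: n
    type(alloc_t) :: tmp
    allocate(tmp%val(n))
    tmp%val = 1
    total = sum(tmp%val)
    ! No deallocate needed: tmp%val is released automatically on return.
  end function sum_with_allocatable

  integer function sum_with_pointer(n) result(total)
    integer, intent(in) :: n
    type(ptr_t) :: tmp
    allocate(tmp%val(n))
    tmp%val = 1
    total = sum(tmp%val)
    deallocate(tmp%val)   ! Required: without it the memory leaks on return.
  end function sum_with_pointer

end module lifetime_sketch

program lifetime_demo
  use lifetime_sketch
  implicit none
  if (sum_with_allocatable(100) /= 100) error stop "allocatable variant failed"
  if (sum_with_pointer(100) /= 100)     error stop "pointer variant failed"
  print *, "both variants returned", sum_with_allocatable(100)
end program lifetime_demo
```

Both functions compute the same result. The difference is that the pointer version depends on the programmer remembering the deallocate, while the allocatable version cannot leak.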

jimdempseyatthecove · ‎01-16-2014

IanH,

I ran a test variant of eoseret's program in which the val pointer was given the CONTIGUOUS attribute. This made only a very slight change in performance.

SINGLE_INDIR   USE_CONTIGUOUS   time (s)
undef          undef            3.395
def            undef            3.392
undef          def              3.392
def            def              3.398

[fortran]

! PtrVsArray.f90
!
! FUNCTIONS:
! PtrVsArray - Entry point of console application.
!

!****************************************************************************
!
! PROGRAM: PtrVsArray
!
! PURPOSE: Entry point for the console application.
!
!****************************************************************************
module mymodule

type i1_t
#ifdef USE_CONTIGUOUS
    integer, CONTIGUOUS, pointer :: val(:)
#else
    integer, pointer :: val(:)
#endif
    type(i1_t), pointer :: foo
end type i1_t

type elt_t
    integer :: bar
    type (i1_t), pointer :: p
end type elt_t

contains

subroutine myroutine(size_outer, size_inner, elts, array)

implicit none

    integer :: size_outer, size_inner, i, j
#ifdef USE_CONTIGUOUS
    type(elt_t), pointer :: elt
    type(elt_t), CONTIGUOUS, pointer :: elts(:)
#else
    type(elt_t), pointer :: elt, elts(:)
#endif
    real, pointer :: array(:)
#ifdef SINGLE_INDIR
    type (i1_t), pointer :: elt_p
#endif

    do i = 1, size_outer
       elt => elts(i)
#ifdef SINGLE_INDIR
       elt_p => elt%p
#endif

       do j = 1, size_inner
#ifdef SINGLE_INDIR
          array (elt_p%val(j)) = 0.
#else
          array (elt%p%val(j)) = 0.
#endif
       end do
    end do

end subroutine myroutine

end module mymodule

program PtrVsArray
    use omp_lib   ! for omp_get_wtime; requires building with OpenMP enabled
    use mymodule

    implicit none
    real(8) :: t0
    integer :: i
    integer, target :: ind (1000)
    type(i1_t), target :: i1 (1000)
    type(elt_t), target :: elts (1000)
    real, target :: array (1000)
    type(elt_t), pointer :: elts_ptr(:)
    real, pointer :: array_ptr(:)

    elts_ptr => elts
    array_ptr => array

    ! Every i1(i)%val points at the same identity index array ind
    do i = 1, 1000
        ind(i) = i
        i1(i)%val => ind
        elts(i)%p => i1(i)
    end do

    ! Time 5000 calls of the kernel (wall-clock seconds)
    t0 = omp_get_wtime()
    do i = 1, 5000
        call myroutine (1000, 1000, elts_ptr, array_ptr)
    end do
    t0 = omp_get_wtime() - t0
    write(*,*) t0
end program PtrVsArray
[/fortran]

Jim Dempsey

Slowdown when using pointer instead of allocatable