Hello,

When I compile the following loop with ifort 14.0.1, using an allocatable instead of a generic pointer for the val field of i1_t gives a ~15-20% speedup. The culprits are extra address-calculation instructions (2x LEA+IMUL, the loop being unrolled twice). Is ifort too conservative here, or is this a "performance bug"?

Thank you in advance

    type i1_t
      integer, pointer :: val(:) ! speedup when replacing pointer with allocatable
      (...)
    end type i1_t

    do i = 1, size_outer
      elt => elts(i)%p ! type(elt) = i1_t
      do j = 1, size_inner
        array (elt%val(j)) = 0.
      end do
    end do

In the full application (with many i1_t or similar structures in hot routines), a global 30-40% performance improvement was measured by replacing the pointer keyword with allocatable. Using -O3 instead of -O2 with ifort does not help. A similar speedup is measured with gfortran 4.8.2 (the only difference being that the loop is not unrolled by default). All required files are enclosed: kernel.f90 and driver.f90

    $ ifort -fpp -g -O2 -c kernel.f90 -o kernel_pointer.o
    $ ifort -fpp -D ALLOC -g -O2 -c kernel.f90 -o kernel_allocatable.o
    $ ifort driver.f90 kernel_pointer.o -o pointer
    $ ifort driver.f90 kernel_allocatable.o -o alloc
    $ time ./pointer
    user 0m0.422s
    $ time ./alloc
    user 0m0.346s
The gurus here will be able to give you a much better answer, but the reason for both allocatable arrays and pointers existing in Fortran is performance. Using allocatable arrays avoids pointer aliasing, which limits parallelism.
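As a small illustration of the aliasing point above, here is a minimal sketch (not taken from the enclosed kernel.f90; the names storage, p, q, a and b are made up for the example). Two pointers can legally refer to overlapping memory, so a store through one may change what the other sees, whereas two allocatables always own separate storage.

```fortran
program alias_demo
  implicit none
  integer, target      :: storage(4)
  integer, pointer     :: p(:), q(:)
  integer, allocatable :: a(:), b(:)

  storage = [1, 2, 3, 4]

  ! Two pointers may overlap: p(2) and q(1) both name storage(2).
  p => storage(1:3)
  q => storage(2:4)
  p(2) = 99
  if (q(1) /= 99) error stop "expected q(1) to alias p(2)"

  ! Two allocatables each own their storage, so they cannot overlap.
  allocate(a(size(storage)), source=storage)   ! a is a copy of storage
  allocate(b(size(a)), source=a)               ! b is a copy of a
  a(2) = -1                                    ! does not touch b
  if (b(2) /= 99) error stop "expected b to be an independent copy"

  print *, "q(1) =", q(1), " b(2) =", b(2)
end program alias_demo
```

Because the compiler generally cannot rule out the first situation for pointers, it may have to be more conservative when reordering or vectorizing loads and stores through them.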
Sergio wrote:
The gurus here will be able to give you a much better answer, but the reason for both allocatable arrays and pointers existing in Fortran is performance. Using allocatable arrays avoids pointer aliasing, which limits parallelism.
Thanks, understood: it will be harder for the compiler to generate optimal code.
My new questions are then:
- Is elt%val really treated by the compiler as a pointer to an array? If so, val(j+1) is contiguous with val(j) and no extra address-computation instructions (LEA, IMUL...) should be required in the inner loop (just base and index registers in memory operands, as with allocatable).
- In other words, for any compiler that must optimize safely, should aliasing be assumed in the inner loop of my example? elt is constant, so elt%val is constant; only j varies, from 1 to size_inner (stride 1).
I'm not clear whether your new questions assume an allocatable or pointer component.
As already mentioned, it is easier for the compiler to determine that an allocatable component isn't aliased with something else in the statement. From your code example I'm not so sure that's relevant here though.
An allocatable component is also always contiguous. A pointer component without the CONTIGUOUS attribute may not be: it could have a stride other than one. I think this is the issue here.
Note that you can declare a pointer component to be CONTIGUOUS (F2008), in which case it is always ... contiguous!
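To make the stride point concrete, here is a minimal sketch (the names storage, strided and dense are invented for the illustration, and it assumes a compiler that provides the F2008 is_contiguous intrinsic). A plain pointer may be associated with a strided section, so indexing through it needs a stride multiply; a CONTIGUOUS pointer may only be associated with a contiguous target.

```fortran
program contiguous_demo
  implicit none
  integer, target              :: storage(10)
  integer, pointer             :: strided(:)
  integer, pointer, contiguous :: dense(:)
  integer :: i

  storage = [(i, i = 1, 10)]

  ! Legal for a plain pointer: every second element of storage.
  strided => storage(1:10:2)

  ! A CONTIGUOUS pointer must be given a contiguous target.
  dense => storage(1:5)

  if (is_contiguous(strided)) error stop "strided section reported as contiguous"
  if (.not. is_contiguous(dense)) error stop "dense section reported as non-contiguous"

  ! strided(3) is storage(5); dense(3) is storage(3).
  if (strided(3) /= 5 .or. dense(3) /= 3) error stop "unexpected element mapping"

  print *, "strided:", strided
  print *, "dense:  ", dense
end program contiguous_demo
```

Since the compiler usually cannot know at compile time which kind of target a plain pointer has, it has to generate code that honours the stride stored in the descriptor.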
Separate to performance, allocatable components also have "code safety" benefits, in terms of automatic management of their lifetime.
As a general rule, if you don't need to point at something else, then don't use pointers.
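The lifetime benefit mentioned above can be sketched as follows (a made-up example; box_t, fill and scratch are hypothetical names). An allocatable component of an unsaved local variable is deallocated automatically when the procedure returns, so no explicit deallocate is needed and nothing is left over on the next call. With a pointer component, the same code would leak the memory unless it were deallocated by hand.

```fortran
program lifetime_demo
  implicit none

  type box_t
    integer, allocatable :: val(:)
  end type box_t

  call fill(1000)
  call fill(2000)

contains

  subroutine fill(n)
    integer, intent(in) :: n
    type(box_t) :: scratch      ! unsaved local variable

    ! On every entry the component starts out unallocated, because the
    ! previous call's allocation was released automatically on return.
    if (allocated(scratch%val)) error stop "component survived the previous call"

    allocate(scratch%val(n))
    scratch%val = 0
    print *, "allocated", size(scratch%val), "elements"
    ! No deallocate here: it happens automatically at end subroutine.
  end subroutine fill

end program lifetime_demo
```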
IanH,
I ran a test variant of eoseret's program in which the val pointer was given the CONTIGUOUS attribute. This made only a very slight change in performance.
SINGLE_INDIR  USE_CONTIGUOUS  time (s)
undef         undef           3.395
def           undef           3.392
undef         def             3.392
def           def             3.398
[fortran]
! PtrVsArray.f90
!
! FUNCTIONS:
! PtrVsArray - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: PtrVsArray
!
! PURPOSE: Entry point for the console application.
!
!****************************************************************************
module mymodule
  type i1_t
#ifdef USE_CONTIGUOUS
    integer, CONTIGUOUS, pointer :: val(:)
#else
    integer, pointer :: val(:)
#endif
    type(i1_t), pointer :: foo
  end type i1_t
  type elt_t
    integer :: bar
    type (i1_t), pointer :: p
  end type elt_t
contains
  subroutine myroutine(size_outer, size_inner, elts, array)
    implicit none
    integer :: size_outer, size_inner, i, j
#ifdef USE_CONTIGUOUS
    type(elt_t), pointer :: elt
    type(elt_t), CONTIGUOUS, pointer :: elts(:)
#else
    type(elt_t), pointer :: elt, elts(:)
#endif
    real, pointer :: array(:)
#ifdef SINGLE_INDIR
    type (i1_t), pointer :: elt_p
#endif
    do i = 1, size_outer
      elt => elts(i)
#ifdef SINGLE_INDIR
      elt_p => elt%p
#endif
      do j = 1, size_inner
#ifdef SINGLE_INDIR
        array (elt_p%val(j)) = 0.
#else
        array (elt%p%val(j)) = 0.
#endif
      end do
    end do
  end subroutine myroutine
end module mymodule
program PtrVsArray
  use omp_lib
  use mymodule
  implicit none
  real(8) :: t0
  integer i, j
  integer, target :: ind (1000)
  type(i1_t), target :: i1 (1000)
  type(elt_t), target :: elts (1000)
  real, target :: array (1000)
  type(elt_t), pointer :: elts_ptr(:)
  real, pointer :: array_ptr(:)
  elts_ptr => elts
  array_ptr => array
  do i = 1, 1000
    ind(i) = i
    i1(i)%val => ind
    elts(i)%p => i1(i)
  end do
  t0 = omp_get_wtime()
  do i = 1, 5000
    call myroutine (1000, 1000, elts_ptr, array_ptr)
  end do
  t0 = omp_get_wtime() - t0
  write(*,*) t0
end program PtrVsArray
[/fortran]
Jim Dempsey