- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The following code works very slow when the arrays are declared allocatable
> real , dimension (:,:), allocatable :: p,pp
> allocate ( p(x,y) , pp(x,y) )
- allocatable test
ifx tttt.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:tttt.exe
-subsystem:console
tttt.obj
>tttt
Steps done = 1000
Steps done = 2000
Steps done = 3000
Steps done = 4000
Steps done = 5000
Steps done = 6000
Steps done = 7000
Steps done = 8000
Steps done = 9000
Steps done = 10000
Code time is = 89.00400 seconds
The code
module testing_pure
implicit none
contains
pure function pure_function ( r,i, j, pi, pj, mi, mj,x,y )
implicit none
integer , intent ( in ) :: x,y
real , dimension ( x, y), intent ( in ) :: r
real , dimension ( x, y ) :: pure_function
integer ,intent ( in ) :: i, j, pj, mj, pi,mi
pure_function(i,j) = r(pi,j) + r(mi,j) + r(i,mj) + r(i,pj) - 4.0*r(i,j)
end function pure_function
end module testing_pure
program test_pure_function
use testing_pure
implicit none
integer , parameter :: x = 100
integer , parameter :: y = 100
integer :: counter,i,j,pi,mi,pj,mj
integer :: rate, stop_count, start_count
real :: computed_time
!real, dimension (x,y) :: p,pp
real , dimension (:,:), allocatable :: p,pp
allocate ( p(x,y) , pp(x,y) )
call random_number (p)
! ==================
call system_clock (count=start_count , count_rate=rate)
do counter = 1, 10000
do j = 1, y
do i =1, x
pj = j + 1
mj = j - 1
pi = i + 1
mi = i - 1
if ( mi == 0 ) mi = x
if ( pi == ( x + 1 ) ) pi = 1
if ( mj == 0 ) mj = y
if ( pj == ( y + 1 ) ) pj = 1
!some computation
pp = pure_function ( p , i , j, pi, pj, mi, mj, x, y )
end do
end do
if ( mod( counter, 1000 ) .eq. 0 ) print *, ' Steps done = ', counter
end do
call system_clock (count=stop_count)
computed_time = real(max(stop_count - start_count , 1_8 )) /real(rate)
!=========
print*, ' Code time is = ', computed_time ,' seconds'
end program test_pure_function
- without allocatable
> real, dimension (x,y) :: p,pp
it runs very fast
ifx tttt.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:tttt.exe
-subsystem:console
tttt.obj
>tttt
Steps done = 1000
Steps done = 2000
Steps done = 3000
Steps done = 4000
Steps done = 5000
Steps done = 6000
Steps done = 7000
Steps done = 8000
Steps done = 9000
Steps done = 10000
Code time is = 6.1999999E-02 seconds
with gfortran compiler
there is no change in performance i.e.
- with allocatable
>gfortran tttt.f90 -o t
>t
Steps done = 1000
Steps done = 2000
Steps done = 3000
Steps done = 4000
Steps done = 5000
Steps done = 6000
Steps done = 7000
Steps done = 8000
Steps done = 9000
Steps done = 10000
Code time is = 0.968999982 seconds
- without allocatable
>gfortran tttt.f90 -o t
>t
Steps done = 1000
Steps done = 2000
Steps done = 3000
Steps done = 4000
Steps done = 5000
Steps done = 6000
Steps done = 7000
Steps done = 8000
Steps done = 9000
Steps done = 10000
Code time is = 1.07799995 seconds
So what makes it slow with allocatable arrays ?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Are the allocates withing the timed loop? Repeat allocate/deallocate will always take some time. The load times may be different but you are not timing that. The saving with allocate is that you can grab the memory you need not your best case guess of max memory that you think might be needed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The arrays are already allocated before going to the time loop. I do not know if the allocatable attribute remains active for the whole code duration.
Also does that mean intel compiler (ifx) handles allocatable arrays differently than the gfortran ? I do not see any performance issues with gfortran. If that is the case how to make a portable code with allocatables ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Do you use heap arrays option? Allocates are always on the heap. Stack is usually faster but you are limited by stack size.
Something clearly goes wrong with the first test case! However I will note that having now looked at you code a smart optimiser could ditch your inner test loop as it does nothing other than set counter to the loop exit value if I have read that correctly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a newer version of the previous code.
Modification
I added pure subroutine and it was called from the time loop too.
The code (version 2)
! ifx allocatable array test
!
!
! version 2 :
! pure subroutine is used here.
module testing_pure
implicit none
contains
function pure_function ( r,i, j, pi, pj, mi, mj,x,y )
implicit none
integer , intent ( in ) :: x,y
real , dimension ( x, y), intent ( in ) :: r
real , dimension ( x, y ) :: pure_function
integer ,intent ( in ) :: i, j, pj, mj, pi,mi
pure_function(i,j) = r(pi,j) + r(mi,j) + r(i,mj) + r(i,pj) - 4.0*r(i,j)
end function pure_function
pure subroutine pure_test ( rr, r,i, j, pi, pj, mi, mj,x,y )
implicit none
integer , intent ( in ) :: x,y
real , dimension ( x, y), intent ( in ) :: r
real , dimension ( x, y ), intent (out) :: rr
integer ,intent ( in ) :: i, j, pj, mj, pi,mi
rr(i,j) = r(pi,j) + r(mi,j) + r(i,mj) + r(i,pj) - 4.0*r(i,j)
end subroutine pure_test
end module testing_pure
program test_pure_function
use testing_pure
implicit none
integer , parameter :: x = 100
integer , parameter :: y = 100
integer :: counter,i,j,pi,mi,pj,mj
integer :: rate, stop_count, start_count
real :: computed_time
! real, dimension (x,y) :: p,pp
real , dimension (:,:), allocatable :: p,pp
allocate ( p(x,y) , pp(x,y) )
call random_number (p)
! ==================
call system_clock (count=start_count , count_rate=rate)
do counter = 1, 10000
do j = 1, y
do i =1, x
pj = j + 1
mj = j - 1
pi = i + 1
mi = i - 1
if ( mi == 0 ) mi = x
if ( pi == ( x + 1 ) ) pi = 1
if ( mj == 0 ) mj = y
if ( pj == ( y + 1 ) ) pj = 1
!some computation
! pp = pure_function ( p , i , j, pi, pj, mi, mj, x, y )
call pure_test ( pp, p,i, j, pi, pj, mi, mj,x,y )
end do
end do
if ( mod( counter, 1000 ) .eq. 0 ) print *, ' Steps done = ', counter
end do
call system_clock (count=stop_count)
computed_time = real(max(stop_count - start_count , 1_8 )) /real(rate)
!=========
print*, ' Code time is = ', computed_time ,' seconds'
end program test_pure_function
test 1
I run the test as before
>ifx tttt1.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:tttt1.exe
-subsystem:console
tttt1.obj
>tttt1
Steps done = 1000
Steps done = 2000
Steps done = 3000
Steps done = 4000
Steps done = 5000
Steps done = 6000
Steps done = 7000
Steps done = 8000
Steps done = 9000
Steps done = 10000
Code time is = 0.1090000 seconds
It seems like the calling of pure routine has no performance loss with the same allocatable arrays. So it seems like it has something to do with function.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
OK I did your second test and indeed the function and subroutine versions have run times that are 1000x different!!
Just to be sure I added a couple of lines at the end (outside the timer) to print a random element of the pp array and that made no difference so there is not some code elimination optimisation anomaly.
Dear compiler team there is indeed something very BAD happening!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I used VTune on the allocatable version:
Does look like massive amounts of unnecessary memory copying. Its worrying that an allocatable array can cause this much performance loss. I expect many of us will be watching for a resolution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
hmmm that last post made me look harder at the example. The function is returning a 100x100 real array but only making assignment to a single element within it, that doesn't seem to be a good thing to do. None the less it should be making 10000 assignments on function return. The loc of the array in the function is not the same loc as the pp array in the caller as one would expect.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The pure function coding style, while compiling the function, the compiler has no way of determining if the result of the function is possibly an alias of an input argument. Therefore, it copies the/an array. In the call coding style, the programmer's requirement is to not use argument aliases. I do not know if the Fortran Specification states anything about function output aliasing an input argument and therefore presumes it may be a possibility, thus requiring the copying of the array (use of array temporary).
This is presumption on my part.
Note, in your first code example (using function), add ", intent(out)"
real , dimension ( x, y ), intent (out) :: pure_function
See if this makes a difference.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried and received error message
intel
ifx tttt.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
tttt.f90(21): error #6413: This global name is invalid in this context. [PURE_FUNCTION]
real , dimension ( x, y ), intent ( out) :: pure_function
----------------------------------------------------^
compilation aborted for tttt.f90 (code 1)
gfortran
gfortran tttt.f90
tttt.f90:16:4:
16 | function pure_function ( r,i, j, pi, pj, mi, mj,x,y )
| 1
Error: Symbol at (1) is not a DUMMY variable
tttt.f90:38:9:
38 | use testing_pure
| 1
Fatal Error: Cannot open module file 'testing_pure.mod' for reading at (1): No such file or directory
compilation terminated.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page