- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
from what I understand ifort 19.04 supports types having allocatable type components of its own. This would allow to create linked list which can be easily copied via "move_alloc" and having deep deallocation. An example:
module testt
!!list element
type :: lle
integer(kind=8) :: i
type(lle), allocatable :: next
end type lle
!!list container
type llc
integer(kind=8) :: n=0
type(lle), allocatable :: start
type(lle), pointer :: last=>null()
contains
procedure :: addend=>subaddend
procedure :: addstart=>subaddstart
end type llc
contains
subroutine subaddend(this,i)
class(llc), intent(inout), target :: this
integer(kind=8), intent(in) :: i
if(.not.allocated(this%start)) Then
allocate(this%start)
this%last=>this%start
else
allocate(this%last%next)
this%last=>this%last%next
end if
this%last%i=i
this%n=this%n+1
end subroutine subaddend
subroutine subaddstart(this,i)
class(llc), intent(inout), target :: this
integer(kind=8), intent(in) :: i
type(lle), allocatable :: x
if(.not.allocated(this%start)) Then
allocate(this%start)
else
call move_alloc(this%start,x)
allocate(this%start)
call move_alloc(x,this%start%next)
end if
this%start%i=i
this%n=this%n+1
end subroutine subaddstart
end module testt
program test
use testt
implicit none
type(llc), allocatable :: x
integer(kind=8) :: k,i,j=100000000
Do k=1,5
write(*,*) k
allocate(x)
do i=1,j
call x%addstart(i)
!call x%addend(i)
end do
deallocate(x)
end Do
end program test
I run this program on a computer with 32GB of RAM.
subroutine "addend" adds to the end of the list via an end pointer, subroutine "addstart" adds to the beginning of the list without using any pointer.
Compiled with gfortran 9.1 the memory usage builds up in round 1 to ~40% in round 1 and then remains there over subsequent iterations implying that deep deallocation works. This happens invariably of whether I use "addstart" or "addend".
Compiled with ifort 19.04 when using "addend" the deallocate statement did not seem have an effect as the program quickly flooded all available RAM.
Using "addstart" it still used ~80% of the memory which builds up over the first two rounds, then it remains at ~80%.
My suspicion is that this is a bug in ifort, but maybe somebody can confirm or point to what I did wrong before I launch the bug report.
Thanks
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I could at least get ifort to work by adding a finalizer and a deleter
module testt
!!the list element
type :: lle
integer(kind=8) :: i
type(lle), allocatable :: next
end type lle
!!the container
type llc
integer(kind=8) :: n=0
type(lle), allocatable :: start
type(lle), pointer :: last=>null()
contains
procedure :: addend=>subaddend
procedure :: addstart=>subaddstart
procedure :: del => subdel
Final :: subfinal
end type llc
contains
subroutine subaddend(this,i)
class(llc), intent(inout), target :: this
integer(kind=8), intent(in) :: i
if(.not.allocated(this%start)) Then
allocate(this%start)
this%last=>this%start
else
allocate(this%last%next)
this%last=>this%last%next
end if
this%last%i=i
this%n=this%n+1
end subroutine subaddend
subroutine subaddstart(this,i)
class(llc), intent(inout), target :: this
integer(kind=8), intent(in) :: i
type(lle), allocatable :: x
if(.not.allocated(this%start)) Then
allocate(this%start)
else
call move_alloc(this%start,x)
allocate(this%start)
call move_alloc(x,this%start%next)
end if
this%start%i=i
this%n=this%n+1
end subroutine subaddstart
Subroutine subdel(this)
class(llc), intent(inout), target :: this
integer :: i
type(lle), allocatable :: x
do
if(allocated(this%start%next)) Then
call move_alloc(this%start%next,x)
deallocate(this%start)
call move_alloc(x,this%start)
else
deallocate(this%start)
exit
end if
end do
this%n=0
nullify(this%last)
end Subroutine subdel
Subroutine Subfinal(this)
type(llc), intent(inout) :: this
call this%del()
end Subroutine Subfinal
end module testt
program test
use testt
implicit none
type(llc), allocatable :: x,y
integer(kind=8) :: k,i,j=100000000
real(kind=8) :: t0,t1
Do k=1,5
write(*,*) k
allocate(x)
call cpu_time(t0)
do i=1,j
call x%addend(i)
end do
call cpu_time(t1)
write(*,*) "add ",t1-t0
call cpu_time(t0)
deallocate(x)
call cpu_time(t1)
write(*,*) "delete ",t1-t0
end Do
end program test
The additional subdel was necessary to circumvent a segfault in gfortran which occured when the subdel code was put into the finalizer directly.
However, what struck me most was the performance difference. The program was compiled with -03 only. gfortran needed ~10% of the RAM and had this timing:
add 2.6331760000000002
delete 0.74638599999999977
2
add 0.95624600000000015
delete 0.87373999999999974
3
add 1.1742239999999997
delete 0.74350200000000033
4
add 1.4223770000000009
delete 0.83331300000000041
5
add 0.87081300000000006
delete 0.73836900000000050
ifort 19.04 needed 20% of the RAM and had this timing:
1
add 4.32092700000000
delete 3.47127200000000
2
add 2.19527500000000
delete 4.37747900000000
3
add 2.19691300000000
delete 3.48525300000000
4
add 2.17025300000000
delete 4.34131900000000
5
add 2.15258800000000
delete 3.50221600000000
This is a performance difference of almost factor 3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
may.ka wrote:.. My suspicion is that this is a bug in ifort, but maybe somebody can confirm or point to what I did wrong before I launch the bug report. ..
I fail to notice any bug in Intel Fortran with the code in the original post.
The standard for Fortran leaves a lot of flexibility to the processors and for the shown code, it seems to me Intel Fortran does its "thing" i.e., whatever it deems appropriate with the high frequency allocations and move_alloc's but which a coder may not "like" per the code design being followed. The user may be able to take it up with the vendor not as a bug report but as a feature request.
By the way, the code in the original post fails with gfortran 10.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FortranFan wrote:I fail to notice any bug in Intel Fortran with the code in the original post.
Does that mean you run the code or you looked at it???
Why should the program go to 19% RAM usage during add, and then use another +60% (so in total +80%) during deallocation??
I cannot imagine that this is the default behavior intended by Intel.
Cheers
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>Why should the program go to 19% RAM usage during add, and then use another +60% (so in total +80%) during deallocation??
This will depend on the heap manager .AND. the method you use to determine how much memory is used. If you are using the Virtual Memory footprint, this is the total memory touched since process start and not the amount of free heap (available virtual memory addresses touched or not).
The increase on deletion (~3x that of initial allocation) would seem to indicate that the method use is (unintentionally) performing allocations during deallocation. Possibly by performing an object copy operation during deallocation. Note, the total number of visible and unseen allocations equal the total number of visible and unseen deallocations, but due to the technique use in the heap manager, the effect is to walk virtual memory. IOW an object freed, while returning to the heap, is intentionally (by design) not immediately available for reallocation. This is either done for performance reasons on allocations and/or security reasons (to hide stale data). If on Windows, you have an option to choose to use LFH (Low Fragmentation Heap). I do not recall how to do this on Linux without googling it.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When run under windows, runtime memory diagnostics conducted at the level of the underlying heap manager show no evidence of a leak. As others have noticed, heap managers can still suffer from this like fragmentation even if all objects have been released, and typically a heap manager does not return memory to an operating system, so at a lower level you may still see significant memory use being reported against the program even if the Fortran runtime is doing the right thing.
On the Windows platform, the deallocation of the container structure is done using recursive calls. For a deeply nested structure, this series of calls may result in a stack overflow. I suspect some of your observations relate to this use of recursion.
Use of recursion to deallocate a recursive structure seems reasonable to me. On the other hand, a recursive structure with several million levels of recursion is extreme. I don't consider this a compiler bug.
Use of the addend procedure would be non-conforming in the code as otherwise shown - the actual argument to the call does not have the target attribute, so pointers referencing components in the actual argument become undefined when the procedure completes.
Be careful defining non-default-kind integers with large values that do not have the appropriate kind suffix.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
thanks for the comments. As an additional information I tested the program on Linux, kernel 5.3. I monitored the memory usage with "top". Now one may argue that that does not give the full/true picture, but lets just stick at an even more "superficial" level.
Below are two programs for the linked list, both doing exactly the same, one with pointer, one with allocatable.
With "superficial" I mean that when running with pointers, RAM usages peaks at ~30% (of 32GB) and the computer is still usable. When running with allocatable, the RAM usage is so bad that kswapd sets in and the whole computer freezes. I am happy that you guys telling me that everything is fine ........... but the above observation renders the allocatable implementation useless. Thus it cannot do what it is meant to do .......... replacing the pointer implementation and with that allowing for deep copying and deep deallocation.
Here the allocatable version (pretty much the same as in the op, but kind=8 replaced by iso_fortran and the number of list elements increased to 150,000,000)
module testt
use, intrinsic :: iso_fortran_env, only: int64
implicit none
!!the list element
type :: lle
integer(int64) :: i
type(lle), allocatable :: next
end type lle
!!the container
type llc
integer(int64) :: n=0
type(lle), allocatable :: start
type(lle), pointer :: last=>null()
contains
procedure :: addend=>subaddend
procedure :: addstart=>subaddstart
end type llc
contains
subroutine subaddend(this,i)
class(llc), intent(inout), target :: this
integer(int64), intent(in) :: i
if(.not.allocated(this%start)) Then
allocate(this%start)
this%last=>this%start
else
allocate(this%last%next)
this%last=>this%last%next
end if
this%last%i=i;this%n=this%n+1
end subroutine subaddend
subroutine subaddstart(this,i)
class(llc), intent(inout), target :: this
integer(int64), intent(in) :: i
type(lle), allocatable :: x
if(.not.allocated(this%start)) Then
allocate(this%start)
else
call move_alloc(this%start,x)
allocate(this%start)
call move_alloc(x,this%start%next)
end if
this%start%i=i;this%n=this%n+1
end subroutine subaddstart
end module testt
program test
use testt
implicit none
type(llc), allocatable :: x,y
integer(int64) :: k,i,j=150000000
real(real64) :: t0,t1
Do k=1,5
write(*,*) k
allocate(x)
call cpu_time(t0)
do i=1,j
call x%addend(i)
end do
call cpu_time(t1)
write(*,*) "add ",t1-t0
call cpu_time(t0)
deallocate(x)
call cpu_time(t1)
write(*,*) "delete ",t1-t0
end Do
end program test
The pointer version. Accounting for the pointers it has a finalizer, but the rest is pretty much the same as the allocatable version:
module testt
use, intrinsic :: iso_fortran_env, only: int64, real64
implicit none
!!the list element
type :: lle
integer(int64) :: i
type(lle), pointer :: next=>null()
end type lle
!!the container
type llc
integer(int64) :: n=0
type(lle), pointer :: start=>null(), last=>null()
contains
procedure :: addend=>subaddend
procedure :: addstart=>subaddstart
procedure :: del => subdel
Final :: subfinal
end type llc
contains
subroutine subaddend(this,i)
class(llc), intent(inout), target :: this
integer(int64), intent(in) :: i
if(.not.associated(this%start)) Then
allocate(this%start)
this%last=>this%start
else
allocate(this%last%next)
this%last=>this%last%next
end if
this%last%i=i;this%n=this%n+1
end subroutine subaddend
subroutine subaddstart(this,i)
class(llc), intent(inout), target :: this
integer(int64), intent(in) :: i
type(lle), pointer :: x=>null()
if(.not.associated(this%start)) Then
allocate(this%start)
else
x=>this%start;nullify(this%start)
allocate(this%start);this%start%next=>x
end if
this%start%i=i;this%n=this%n+1
end subroutine subaddstart
Subroutine subdel(this)
class(llc), intent(inout), target :: this
type(lle), pointer :: x=>null()
x=>this%start
do
if(associated(this%start%next)) Then
this%start=>this%start%next
deallocate(x);x=>this%start
else
deallocate(this%start);exit
end if
end do
nullify(this%start,this%last)
end Subroutine subdel
Subroutine Subfinal(this)
type(llc), intent(inout) :: this
call this%del()
end Subroutine Subfinal
end module testt
program test
use testt
implicit none
type(llc), allocatable :: x,y
integer(int64) :: k,i,j=150000000
real(real64) :: t0,t1
Do k=1,5
write(*,*) k
allocate(x)
call cpu_time(t0)
do i=1,j
call x%addend(i)
end do
call cpu_time(t1)
write(*,*) "add ",t1-t0
call cpu_time(t0)
deallocate(x)
call cpu_time(t1)
write(*,*) "delete ",t1-t0
end Do
end program test
BTW, 150,000,000 elements in a linked list is standard in my applications and given that fortran is meant to be THE hpc language my assumption would be that it can easily cope with it (and the pointer version indeed does).
Thanks again for the comments.
cheers
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Conceptually, you don't have a linked list if you are using an allocatable component, you have a (very) deeply nested, very large object of derived type. The value of the container object is the entire structure. Concepts do not always map through to implementation, but it does not surprise me that you are running into issues here.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think this may be an exemplar of CS courses instruction that recursion should be favored over iteration. IOW ease of programming is to be made over costs in resource consumption. It is clear to see from the two methods listed in post #7 that the iterative process is less resource intensive, and for this type of usage, should be implemented as opposed to the recursive technique.
I will leave it as an exercise for may.ka to modify the allocatable implementation to use an iterative deallocate, and post back here the results.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
thanks for the comments
jimdempseyatthecove (Blackbelt) wrote:I will leave it as an exercise for may.ka to modify the allocatable implementation to use an iterative deallocate, and post back here the results.
Jim Dempsey
have a look at post #2.
Cheers
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>have a look at post #2
Why did you use move_alloc???
Why not iteratively:
1) locate last node and next to last node
2) delete last node by way of next to last node
3) condition former next to last node now into last node
Note, if you maintain a pointer to prior node, you can avoid multiple step 1)'s
While you could recursively search for last node, then delete on pop of recursion. this still presents a problem of excessive use of stack.
For linked lists, using pointer's alone is much less resource intensive than having recursive allocatables.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim.
I used "move_alloc" to have an implementation which is almost pointer-free (except last).
My interpretation of "move_alloc" is that "move_alloc" is the same as moving a pointer except that the pointee can be owned by one pointer only. In that sense an allocatable entity is an unbreakable combination of a pointer and a resource. The only option to assign another pointer includes nullifing the initial pointer (e.g. "move_alloc"). If that interpretation holds, the complexity of "move_alloc" should be the same as assigning a pointer.
Anyway, in the #2 implementation I delete the list from the start and iterate towards the end. Since it is still a single linked list, though via allocation, I cannot see how one could delete the list from the end without recursion. If you don't mind you can change the example #2 to make it more clear what you mean.
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
while list not empty
1) locate last node and next to last node
2) delete last node by way of next to last node
3) condition former next to last node now into last node
end while
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
may.ka wrote:Quote:
FortranFan wrote:
I fail to notice any bug in Intel Fortran with the code in the original post.
Does that mean you run the code or you looked at it???
Why should the program go to 19% RAM usage during add, and then use another +60% (so in total +80%) during deallocation??
I cannot imagine that this is the default behavior intended by Intel.
Cheers
My point is I do not think what someone, say OP, might think is a less "efficient" compiler implementation of the Fortran 2008 feature of "allocatable components of recursive type" relative to their expectation or with memory/CPU usage of another compiler to be a bug. Sure someone, like OP, can contact the vendor such as Intel and say, hey code built using that vendor's development toolchain is not competitive in terms of whatever performance measurements they have relative to whatever benchmark they can agree upon mutually and work out the next steps. But that ain't a bug per se.
Looking at the code in the original post, I don't see Intel Fortran compiler having a bug or causing a memory leak, etc. One might view Intel Fortran's treatment of this facility - allocatable components of recursive type - as being way more demanding than it needs to be in terms of system memory availability and utilization, say, gfortran on some OS platforms resulting in slower performance but as I indicate above, it's a different argument. I do not notice a memory leak.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page