- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
from what I understand ifort 19.04 supports types having allocatable type components of its own. This would allow to create linked list which can be easily copied via "move_alloc" and having deep deallocation. An example:
module testt !!list element type :: lle integer(kind=8) :: i type(lle), allocatable :: next end type lle !!list container type llc integer(kind=8) :: n=0 type(lle), allocatable :: start type(lle), pointer :: last=>null() contains procedure :: addend=>subaddend procedure :: addstart=>subaddstart end type llc contains subroutine subaddend(this,i) class(llc), intent(inout), target :: this integer(kind=8), intent(in) :: i if(.not.allocated(this%start)) Then allocate(this%start) this%last=>this%start else allocate(this%last%next) this%last=>this%last%next end if this%last%i=i this%n=this%n+1 end subroutine subaddend subroutine subaddstart(this,i) class(llc), intent(inout), target :: this integer(kind=8), intent(in) :: i type(lle), allocatable :: x if(.not.allocated(this%start)) Then allocate(this%start) else call move_alloc(this%start,x) allocate(this%start) call move_alloc(x,this%start%next) end if this%start%i=i this%n=this%n+1 end subroutine subaddstart end module testt program test use testt implicit none type(llc), allocatable :: x integer(kind=8) :: k,i,j=100000000 Do k=1,5 write(*,*) k allocate(x) do i=1,j call x%addstart(i) !call x%addend(i) end do deallocate(x) end Do end program test
I run this program on a computer with 32GB of RAM.
subroutine "addend" adds to the end of the list via an end pointer, subroutine "addstart" adds to the beginning of the list without using any pointer.
Compiled with gfortran 9.1 the memory usage builds up in round 1 to ~40% in round 1 and then remains there over subsequent iterations implying that deep deallocation works. This happens invariably of whether I use "addstart" or "addend".
Compiled with ifort 19.04 when using "addend" the deallocate statement did not seem have an effect as the program quickly flooded all available RAM.
Using "addstart" it still used ~80% of the memory which builds up over the first two rounds, then it remains at ~80%.
My suspicion is that this is a bug in ifort, but maybe somebody can confirm or point to what I did wrong before I launch the bug report.
Thanks
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I could at least get ifort to work by adding a finalizer and a deleter
module testt !!the list element type :: lle integer(kind=8) :: i type(lle), allocatable :: next end type lle !!the container type llc integer(kind=8) :: n=0 type(lle), allocatable :: start type(lle), pointer :: last=>null() contains procedure :: addend=>subaddend procedure :: addstart=>subaddstart procedure :: del => subdel Final :: subfinal end type llc contains subroutine subaddend(this,i) class(llc), intent(inout), target :: this integer(kind=8), intent(in) :: i if(.not.allocated(this%start)) Then allocate(this%start) this%last=>this%start else allocate(this%last%next) this%last=>this%last%next end if this%last%i=i this%n=this%n+1 end subroutine subaddend subroutine subaddstart(this,i) class(llc), intent(inout), target :: this integer(kind=8), intent(in) :: i type(lle), allocatable :: x if(.not.allocated(this%start)) Then allocate(this%start) else call move_alloc(this%start,x) allocate(this%start) call move_alloc(x,this%start%next) end if this%start%i=i this%n=this%n+1 end subroutine subaddstart Subroutine subdel(this) class(llc), intent(inout), target :: this integer :: i type(lle), allocatable :: x do if(allocated(this%start%next)) Then call move_alloc(this%start%next,x) deallocate(this%start) call move_alloc(x,this%start) else deallocate(this%start) exit end if end do this%n=0 nullify(this%last) end Subroutine subdel Subroutine Subfinal(this) type(llc), intent(inout) :: this call this%del() end Subroutine Subfinal end module testt program test use testt implicit none type(llc), allocatable :: x,y integer(kind=8) :: k,i,j=100000000 real(kind=8) :: t0,t1 Do k=1,5 write(*,*) k allocate(x) call cpu_time(t0) do i=1,j call x%addend(i) end do call cpu_time(t1) write(*,*) "add ",t1-t0 call cpu_time(t0) deallocate(x) call cpu_time(t1) write(*,*) "delete ",t1-t0 end Do end program test
The additional subdel was necessary to circumvent a segfault in gfortran which occured when the subdel code was put into the finalizer directly.
However, what struck me most was the performance difference. The program was compiled with -03 only. gfortran needed ~10% of the RAM and had this timing:
add 2.6331760000000002 delete 0.74638599999999977 2 add 0.95624600000000015 delete 0.87373999999999974 3 add 1.1742239999999997 delete 0.74350200000000033 4 add 1.4223770000000009 delete 0.83331300000000041 5 add 0.87081300000000006 delete 0.73836900000000050
ifort 19.04 needed 20% of the RAM and had this timing:
1 add 4.32092700000000 delete 3.47127200000000 2 add 2.19527500000000 delete 4.37747900000000 3 add 2.19691300000000 delete 3.48525300000000 4 add 2.17025300000000 delete 4.34131900000000 5 add 2.15258800000000 delete 3.50221600000000
This is a performance difference of almost factor 3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
may.ka wrote:.. My suspicion is that this is a bug in ifort, but maybe somebody can confirm or point to what I did wrong before I launch the bug report. ..
I fail to notice any bug in Intel Fortran with the code in the original post.
The standard for Fortran leaves a lot of flexibility to the processors and for the shown code, it seems to me Intel Fortran does its "thing" i.e., whatever it deems appropriate with the high frequency allocations and move_alloc's but which a coder may not "like" per the code design being followed. The user may be able to take it up with the vendor not as a bug report but as a feature request.
By the way, the code in the original post fails with gfortran 10.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FortranFan wrote:I fail to notice any bug in Intel Fortran with the code in the original post.
Does that mean you run the code or you looked at it???
Why should the program go to 19% RAM usage during add, and then use another +60% (so in total +80%) during deallocation??
I cannot imagine that this is the default behavior intended by Intel.
Cheers
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>Why should the program go to 19% RAM usage during add, and then use another +60% (so in total +80%) during deallocation??
This will depend on the heap manager .AND. the method you use to determine how much memory is used. If you are using the Virtual Memory footprint, this is the total memory touched since process start and not the amount of free heap (available virtual memory addresses touched or not).
The increase on deletion (~3x that of initial allocation) would seem to indicate that the method use is (unintentionally) performing allocations during deallocation. Possibly by performing an object copy operation during deallocation. Note, the total number of visible and unseen allocations equal the total number of visible and unseen deallocations, but due to the technique use in the heap manager, the effect is to walk virtual memory. IOW an object freed, while returning to the heap, is intentionally (by design) not immediately available for reallocation. This is either done for performance reasons on allocations and/or security reasons (to hide stale data). If on Windows, you have an option to choose to use LFH (Low Fragmentation Heap). I do not recall how to do this on Linux without googling it.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When run under windows, runtime memory diagnostics conducted at the level of the underlying heap manager show no evidence of a leak. As others have noticed, heap managers can still suffer from this like fragmentation even if all objects have been released, and typically a heap manager does not return memory to an operating system, so at a lower level you may still see significant memory use being reported against the program even if the Fortran runtime is doing the right thing.
On the Windows platform, the deallocation of the container structure is done using recursive calls. For a deeply nested structure, this series of calls may result in a stack overflow. I suspect some of your observations relate to this use of recursion.
Use of recursion to deallocate a recursive structure seems reasonable to me. On the other hand, a recursive structure with several million levels of recursion is extreme. I don't consider this a compiler bug.
Use of the addend procedure would be non-conforming in the code as otherwise shown - the actual argument to the call does not have the target attribute, so pointers referencing components in the actual argument become undefined when the procedure completes.
Be careful defining non-default-kind integers with large values that do not have the appropriate kind suffix.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
thanks for the comments. As an additional information I tested the program on Linux, kernel 5.3. I monitored the memory usage with "top". Now one may argue that that does not give the full/true picture, but lets just stick at an even more "superficial" level.
Below are two programs for the linked list, both doing exactly the same, one with pointer, one with allocatable.
With "superficial" I mean that when running with pointers, RAM usages peaks at ~30% (of 32GB) and the computer is still usable. When running with allocatable, the RAM usage is so bad that kswapd sets in and the whole computer freezes. I am happy that you guys telling me that everything is fine ........... but the above observation renders the allocatable implementation useless. Thus it cannot do what it is meant to do .......... replacing the pointer implementation and with that allowing for deep copying and deep deallocation.
Here the allocatable version (pretty much the same as in the op, but kind=8 replaced by iso_fortran and the number of list elements increased to 150,000,000)
module testt use, intrinsic :: iso_fortran_env, only: int64 implicit none !!the list element type :: lle integer(int64) :: i type(lle), allocatable :: next end type lle !!the container type llc integer(int64) :: n=0 type(lle), allocatable :: start type(lle), pointer :: last=>null() contains procedure :: addend=>subaddend procedure :: addstart=>subaddstart end type llc contains subroutine subaddend(this,i) class(llc), intent(inout), target :: this integer(int64), intent(in) :: i if(.not.allocated(this%start)) Then allocate(this%start) this%last=>this%start else allocate(this%last%next) this%last=>this%last%next end if this%last%i=i;this%n=this%n+1 end subroutine subaddend subroutine subaddstart(this,i) class(llc), intent(inout), target :: this integer(int64), intent(in) :: i type(lle), allocatable :: x if(.not.allocated(this%start)) Then allocate(this%start) else call move_alloc(this%start,x) allocate(this%start) call move_alloc(x,this%start%next) end if this%start%i=i;this%n=this%n+1 end subroutine subaddstart end module testt program test use testt implicit none type(llc), allocatable :: x,y integer(int64) :: k,i,j=150000000 real(real64) :: t0,t1 Do k=1,5 write(*,*) k allocate(x) call cpu_time(t0) do i=1,j call x%addend(i) end do call cpu_time(t1) write(*,*) "add ",t1-t0 call cpu_time(t0) deallocate(x) call cpu_time(t1) write(*,*) "delete ",t1-t0 end Do end program test
The pointer version. Accounting for the pointers it has a finalizer, but the rest is pretty much the same as the allocatable version:
module testt use, intrinsic :: iso_fortran_env, only: int64, real64 implicit none !!the list element type :: lle integer(int64) :: i type(lle), pointer :: next=>null() end type lle !!the container type llc integer(int64) :: n=0 type(lle), pointer :: start=>null(), last=>null() contains procedure :: addend=>subaddend procedure :: addstart=>subaddstart procedure :: del => subdel Final :: subfinal end type llc contains subroutine subaddend(this,i) class(llc), intent(inout), target :: this integer(int64), intent(in) :: i if(.not.associated(this%start)) Then allocate(this%start) this%last=>this%start else allocate(this%last%next) this%last=>this%last%next end if this%last%i=i;this%n=this%n+1 end subroutine subaddend subroutine subaddstart(this,i) class(llc), intent(inout), target :: this integer(int64), intent(in) :: i type(lle), pointer :: x=>null() if(.not.associated(this%start)) Then allocate(this%start) else x=>this%start;nullify(this%start) allocate(this%start);this%start%next=>x end if this%start%i=i;this%n=this%n+1 end subroutine subaddstart Subroutine subdel(this) class(llc), intent(inout), target :: this type(lle), pointer :: x=>null() x=>this%start do if(associated(this%start%next)) Then this%start=>this%start%next deallocate(x);x=>this%start else deallocate(this%start);exit end if end do nullify(this%start,this%last) end Subroutine subdel Subroutine Subfinal(this) type(llc), intent(inout) :: this call this%del() end Subroutine Subfinal end module testt program test use testt implicit none type(llc), allocatable :: x,y integer(int64) :: k,i,j=150000000 real(real64) :: t0,t1 Do k=1,5 write(*,*) k allocate(x) call cpu_time(t0) do i=1,j call x%addend(i) end do call cpu_time(t1) write(*,*) "add ",t1-t0 call cpu_time(t0) deallocate(x) call cpu_time(t1) write(*,*) "delete ",t1-t0 end Do end program test
BTW, 150,000,000 elements in a linked list is standard in my applications and given that fortran is meant to be THE hpc language my assumption would be that it can easily cope with it (and the pointer version indeed does).
Thanks again for the comments.
cheers
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Conceptually, you don't have a linked list if you are using an allocatable component, you have a (very) deeply nested, very large object of derived type. The value of the container object is the entire structure. Concepts do not always map through to implementation, but it does not surprise me that you are running into issues here.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think this may be an exemplar of CS courses instruction that recursion should be favored over iteration. IOW ease of programming is to be made over costs in resource consumption. It is clear to see from the two methods listed in post #7 that the iterative process is less resource intensive, and for this type of usage, should be implemented as opposed to the recursive technique.
I will leave it as an exercise for may.ka to modify the allocatable implementation to use an iterative deallocate, and post back here the results.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
thanks for the comments
jimdempseyatthecove (Blackbelt) wrote:I will leave it as an exercise for may.ka to modify the allocatable implementation to use an iterative deallocate, and post back here the results.
Jim Dempsey
have a look at post #2.
Cheers
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>have a look at post #2
Why did you use move_alloc???
Why not iteratively:
1) locate last node and next to last node
2) delete last node by way of next to last node
3) condition former next to last node now into last node
Note, if you maintain a pointer to prior node, you can avoid multiple step 1)'s
While you could recursively search for last node, then delete on pop of recursion. this still presents a problem of excessive use of stack.
For linked lists, using pointer's alone is much less resource intensive than having recursive allocatables.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim.
I used "move_alloc" to have an implementation which is almost pointer-free (except last).
My interpretation of "move_alloc" is that "move_alloc" is the same as moving a pointer except that the pointee can be owned by one pointer only. In that sense an allocatable entity is an unbreakable combination of a pointer and a resource. The only option to assign another pointer includes nullifing the initial pointer (e.g. "move_alloc"). If that interpretation holds, the complexity of "move_alloc" should be the same as assigning a pointer.
Anyway, in the #2 implementation I delete the list from the start and iterate towards the end. Since it is still a single linked list, though via allocation, I cannot see how one could delete the list from the end without recursion. If you don't mind you can change the example #2 to make it more clear what you mean.
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
while list not empty
1) locate last node and next to last node
2) delete last node by way of next to last node
3) condition former next to last node now into last node
end while
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
may.ka wrote:Quote:
FortranFan wrote:
I fail to notice any bug in Intel Fortran with the code in the original post.
Does that mean you run the code or you looked at it???
Why should the program go to 19% RAM usage during add, and then use another +60% (so in total +80%) during deallocation??
I cannot imagine that this is the default behavior intended by Intel.
Cheers
My point is I do not think what someone, say OP, might think is a less "efficient" compiler implementation of the Fortran 2008 feature of "allocatable components of recursive type" relative to their expectation or with memory/CPU usage of another compiler to be a bug. Sure someone, like OP, can contact the vendor such as Intel and say, hey code built using that vendor's development toolchain is not competitive in terms of whatever performance measurements they have relative to whatever benchmark they can agree upon mutually and work out the next steps. But that ain't a bug per se.
Looking at the code in the original post, I don't see Intel Fortran compiler having a bug or causing a memory leak, etc. One might view Intel Fortran's treatment of this facility - allocatable components of recursive type - as being way more demanding than it needs to be in terms of system memory availability and utilization, say, gfortran on some OS platforms resulting in slower performance but as I indicate above, it's a different argument. I do not notice a memory leak.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page