Hi there,
I noticed that the program below, which implements a linked list, differs in its RAM requirement depending on whether the compiler flag "-parallel" is used.
Module mod_ll
  Type :: llele
    integer*8 :: a, b
    type(llele), pointer :: next => null()
  end type llele

  Type :: container
    integer*8 :: length = 0   ! default-initialize; otherwise the first add reads an undefined value
    Type(llele), pointer :: start => null(), end => null()
  contains
    procedure, pass :: add => SubAdd
  end type container

contains

  Subroutine SubAdd(this, val1, val2)
    Implicit none
    class(container), intent(inout) :: this
    Integer*8, intent(in) :: val1, val2
    if (.not. associated(this%start)) then
      allocate(this%start)
      this%end => this%start
      this%start%a = val1
      this%start%b = val2
    else
      allocate(this%end%next)
      this%end => this%end%next
      this%end%a = val1
      this%end%b = val2
    End if
    this%length = this%length + 1
  end Subroutine SubAdd

End Module mod_ll

Program Test
  use mod_ll, only: container
  Type(container), allocatable :: xx
  integer*8 :: i
  allocate(xx)
  Do i = 1, 50000000
    call xx%add(i, i)
  end Do
  read(*, *)   ! keep the process alive so RAM usage can be inspected
end Program Test
Compiling with
ifort -O3 -o Test Test.f90
will more than double the ram demand compared to compiling with
ifort -O3 -parallel -o Test Test.f90
Tested on Linux kernel 4.14 with ifort 17.05: the first requires about 3.7 GB of RAM, whereas the latter needs about 1.5 GB. I measured the RAM usage with "top".
Any reasons for that!?
Thanks
If your code is indeed run in parallel, it is not thread-safe. To be thread-safe in parallel, the body of the add subroutine (as written) would have to be inside a critical section.
As to whether this is causing the excess memory consumption, I cannot say. An additional cause could be the stack requirements of the extra threads: how many threads are created, and what is the stack requirement of each?
Jim Dempsey
Hi Jim,
thanks for the comment.
However, your reply implies you understood that the version compiled with "-parallel" shows the excessive memory usage. It is exactly the opposite: the one compiled WITHOUT "-parallel" has the excessive memory usage.
I am aware that the code cannot run in parallel. When executed, regardless of whether "-parallel" was set, no multiple threads were used (I assume the compiler found nothing to parallelize, which was also not intended).
Cheers
Type llele is 24 bytes (assuming a 64-bit build).
Each node is a separate allocation, with a heap-management overhead of at least two size_t values (16 bytes), so the minimum load per node is 40 bytes; with a typical heap granularity of 16 bytes, the actual load per node is likely 48 bytes.
50 million allocations then require 2,400 million bytes, i.e. ~2.4 GB.
Therefore the run showing 1.5 GB of RAM in use must be in error.
Reversing the logic: 1.5 GB / 50 M nodes = 30 bytes/node (assuming 0 bytes for program and stack).
Even if the allocation load per node were only the node data plus one (hidden) size_t (i.e. no link overhead), the 50 M allocations would exceed 1.5 GB.
Something is amiss in the top figure.
Jim Dempsey