I realized that the program below, which implements a linked list, differs in its RAM requirement depending on whether the compiler flag "-parallel" is used.
Module mod_ll

  Type :: llele
    integer*8 :: a, b
    type(llele), pointer :: next => null()
  end type llele

  Type :: container
    integer*8 :: length = 0   ! default-initialize the counter
    Type(llele), pointer :: start => null(), end => null()
  contains
    procedure, pass :: add => SubAdd
  end type container

contains

  Subroutine SubAdd(this, val1, val2)
    Implicit none
    class(container), intent(inout) :: this
    Integer*8, intent(in) :: val1, val2
    if (.not. associated(this%start)) then
      allocate(this%start)
      this%end => this%start
      this%start%a = val1
      this%start%b = val2
    else
      allocate(this%end%next)
      this%end => this%end%next
      this%end%a = val1
      this%end%b = val2
    End if
    this%length = this%length + 1
  end Subroutine SubAdd

End Module mod_ll

Program Test
  use mod_ll, only: container
  Type(container), allocatable :: xx
  integer*8 :: i
  allocate(xx)
  Do i = 1, 50000000
    call xx%add(i, i)
  end Do
  read(*,*)
end Program Test
ifort -O3 -o Test Test.f90
more than doubles the RAM demand compared to compiling with
ifort -O3 -parallel -o Test Test.f90
Tested on Linux kernel 4.14 with ifort 17.05, the first requires about 3.7GB of RAM, whereas the latter requires about 1.5GB. I measured the RAM usage with "top".
Any reasons for that?
If your code is indeed run in parallel, it is not thread-safe as written. To be thread-safe in parallel execution, the body of the add subroutine would have to be inside a critical section.
As to whether this is causing the excess memory consumption, I cannot say. An additional cause could be the stack requirements for the extra threads: how many threads are created, and what are the stack requirements for each thread?
Thanks for the comment. However, it implies that you understood the version compiled with "-parallel" to have the excessive memory usage, but it is exactly the opposite: the one compiled WITHOUT "-parallel" has the excessive memory usage.
I am aware that the code cannot be run in parallel. When executed, regardless of whether "-parallel" was set, no multi-thread usage occurred (I assume the compiler did not find anything to parallelize, which was also not intended).
Type llele is 24 bytes (assuming a 64-bit build): two 8-byte integers plus an 8-byte pointer.
Each node is allocated individually, and a heap allocator adds a per-allocation overhead of at least two size_t values (16 bytes).
Minimally the node load is therefore 40 bytes, but heap granularity is typically 16 bytes, so the effective node load is likely 48 bytes.
50 million allocations then require 2,400 million bytes, ~2.4GB.
Therefore the run showing 1.5GB of RAM in use must be in error.
Reversing the logic:
1.5GB / 50M nodes = 30 bytes/node (assuming 0 bytes for the program and stack).
Even if the allocation load were only the node data plus a single hidden size_t (in other words, an allocator header with no link), that would be 32 bytes per node, and the 50M allocations would still exceed 1.5GB.
Something is amiss in the figure reported by "top".