Re: Do reused global arrays need to be deallocated?

j0e · ‎05-03-2009

Hi,

To solve stack overflow problems when using OpenMP, I have basically rewritten old serial code so that work and output arrays are dynamically allocated. Since the output arrays need to live outside of the procedure, they are placed in a module with a threadprivate directive. When I first coded this, the arrays were deallocated at the end of each inner iteration; however, because the arrays are large (often > 1 GB), there seems to be significant overhead associated with reallocating them over and over again in the outer loop.

Since the arrays never change size, I removed the deallocation step, and only check for array allocation, and allocate an array if a thread needs it. This has removed the overhead associated with allocating/deallocating, because even when a thread goes idle, it retains the large arrays.
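A minimal sketch of the allocate-once pattern described above. The module, array, and routine names (work_arrays, outbuf, ensure_allocated) and the array size are hypothetical, not taken from the actual code. It assumes the number of threads stays the same and dynamic thread adjustment is off between parallel regions, which is the condition under which OpenMP guarantees that threadprivate data persists.

```fortran
module work_arrays
  implicit none
  ! Hypothetical output array: each thread gets its own copy.
  real, allocatable, save :: outbuf(:)
  !$omp threadprivate(outbuf)
contains
  ! Allocate the calling thread's copy only if it does not exist yet.
  subroutine ensure_allocated(n)
    integer, intent(in) :: n
    integer :: ierr
    if (.not. allocated(outbuf)) then
      allocate(outbuf(n), stat=ierr)
      if (ierr /= 0) stop 'allocation of outbuf failed'
    end if
  end subroutine ensure_allocated
end module work_arrays

program reuse_demo
  use work_arrays
  implicit none
  integer :: iter
  do iter = 1, 3                    ! outer loop
    !$omp parallel
    call ensure_allocated(1000)     ! allocates only on a thread's first visit
    outbuf = real(iter)             ! later iterations reuse the same storage
    !$omp end parallel
  end do
end program reuse_demo
```

Because the array is never deallocated, each thread holds exactly one block for the whole run; the block is reused, not leaked, and the system reclaims it when the program ends.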

My question is: can this result in memory leaks? I have noticed some creep in memory size during a run, but it is acceptable, at least for short runs that last only a few hours. Also, I'm not sure yet if the increase in memory size with runtime is associated with not deallocating the arrays of interest here. My intuition tells me that since the arrays are just being reused, there shouldn't be a problem, but that intuition counts for little.

cheers,
-joe

TimP · ‎05-03-2009

Explicit deallocation was required in f90, while f95 automatically deallocates local, unsaved allocatable arrays upon exit from the subroutine where the allocation is made. So, as long as the allocations are checked (by STAT and ERRMSG), it seems you should be OK.

j0e · ‎05-03-2009

Since the arrays are in a module, they are global and not deallocated automatically when leaving a procedure. Where things become less clear to me is what occurs within parallel regions of code where each thread allocates similar arrays, but within the thread's own scope (i.e., via the threadprivate directive). When the parallel region is exited, all the threads go idle (except master) until the parallel region is reentered. It appears that the idle threads still have the global arrays allocated, so they don't end up reallocating space when the parallel region is reentered. Since I don't know how threads handle private variables, I don't know if additional memory gets sucked up.

TimP · ‎05-03-2009

OpenMP has a rule about persistence of data in threadprivate between parallel regions, which implies those don't get reallocated.

jimdempseyatthecove · ‎05-03-2009

Joe,

Your thread private allocations should be immune from memory leaks assuming you are not doing something foolish such as

real, pointer :: A(:)
...
allocate(A(12345))
...
NULLIFY(A)
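
For contrast, a small sketch of the safe counterparts to the snippet above (the program and variable names are only illustrative). A pointer that was allocated should be released with DEALLOCATE, which also disassociates it; an allocatable array cannot be orphaned by NULLIFY at all, which is one reason allocatables are the safer choice for reused work arrays.

```fortran
program pointer_leak_demo
  implicit none
  real, pointer     :: p(:) => null()
  real, allocatable :: a(:)

  allocate(p(12345))
  ! A nullify(p) here would discard the only reference to the block,
  ! and the memory could never be released: that is the leak.
  deallocate(p)                 ! frees the block and disassociates p
  if (.not. associated(p)) print *, 'pointer released cleanly'

  allocate(a(12345))
  ! An allocatable array always knows its own allocation status.
  if (allocated(a)) deallocate(a)
end program pointer_leak_demo
```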

Memory leaks can be misdiagnosed when an application's footprint is observed to be growing. The size of an application, as reported by Windows Task Manager, is NOT the size of the code + static data + allocated data. Instead, the reported size is: size of code + static data + committed heap, where the committed heap contains allocated memory + previously allocated but now returned memory + residual from heap expansion granularity (granularity is at least one page file page, but potentially several page file pages).

Depending on the allocation and deallocation sequence, and on the design of the deallocator, the only recourse may be for the memory footprint to grow:

allocate(A(10000))
allocate(B(100))
deallocate(A)
allocate(C(100))
allocate(A(10000))

Depending on how the deallocations occur, C might reside inside the space where the first A was allocated. The subsequent allocation of A will then come out of previously unallocated space past B. This symptom can at times be such that the only option is to grow virtual memory. Sometimes this can be corrected by simply swapping the order of allocations and deallocations, but often the root cause is harder to correct.
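
One way to see what a particular runtime does with this sequence is to print the addresses involved. The following is only a sketch under assumptions: the program name is made up, the addresses are obtained with c_loc from iso_c_binding, and the result depends entirely on the compiler's runtime and the operating system's heap, so no particular outcome should be expected.

```fortran
program heap_layout_demo
  use iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  real, allocatable, target :: a(:), b(:), c(:)
  integer(c_intptr_t) :: first_a, addr_c, second_a

  allocate(a(10000))
  first_a = transfer(c_loc(a), first_a)    ! address of the first A
  allocate(b(100))
  deallocate(a)
  allocate(c(100))
  addr_c = transfer(c_loc(c), addr_c)      ! did C land in A's old space?
  allocate(a(10000))
  second_a = transfer(c_loc(a), second_a)  ! address of the second A

  print *, 'first  A at ', first_a
  print *, 'C        at ', addr_c
  print *, 'second A at ', second_a
  if (second_a == first_a) then
    print *, 'the freed block was reused for A'
  else
    print *, 'A was placed somewhere else'
  end if
end program heap_layout_demo
```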

One technique to avoid this is to use a memory allocation/deallocation scheme that maintains pools of similar allocations. In the above, the deallocation of A would go into a pool of nodes of the size that A was allocated with. This pool would be reserved until a subsequent allocation of the same or nearly the same size is requested. The allocation of C would come from either a previous allocation of an array of 100 elements (times 4 or 8 bytes each) or from the general heap. Only when the general heap reaches some level, or some other tuning parameter is hit, would consideration be given to breaking up a larger free node sitting in a pool (usually done by returning one or more nodes from the private pools back into the general heap).
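
A very small sketch of the pooling idea, to make it concrete. Everything here is hypothetical (the module simple_pool, the routines pool_get and pool_release, the max_cached limit), it matches on exact size only, and it is not thread-safe as written. It is an illustration, not a replacement for a real pooling allocator.

```fortran
module simple_pool
  implicit none
  private
  public :: pool_get, pool_release

  integer, parameter :: max_cached = 8     ! hypothetical tuning parameter

  type :: slot
    real, pointer :: buf(:) => null()
  end type slot

  type(slot), save :: cache(max_cached)    ! parked (free) buffers
contains
  ! Hand out a parked buffer of exactly n elements, else allocate a new one.
  subroutine pool_get(p, n)
    real, pointer :: p(:)
    integer, intent(in) :: n
    integer :: i
    do i = 1, max_cached
      if (associated(cache(i)%buf)) then
        if (size(cache(i)%buf) == n) then
          p => cache(i)%buf
          nullify(cache(i)%buf)            ! not a leak: p now owns the block
          return
        end if
      end if
    end do
    allocate(p(n))                         ! nothing suitable parked
  end subroutine pool_get

  ! Park a buffer for later reuse; free it only if the cache is full.
  subroutine pool_release(p)
    real, pointer :: p(:)
    integer :: i
    do i = 1, max_cached
      if (.not. associated(cache(i)%buf)) then
        cache(i)%buf => p
        nullify(p)
        return
      end if
    end do
    deallocate(p)
  end subroutine pool_release
end module simple_pool

program pool_demo
  use simple_pool
  implicit none
  real, pointer :: a(:) => null(), c(:) => null()
  call pool_get(a, 10000)
  call pool_release(a)        ! A's block is parked, not returned to the heap
  call pool_get(c, 100)       ! different size: comes from the general heap
  call pool_get(a, 10000)     ! reuses the parked block
  call pool_release(a)
  call pool_release(c)
end program pool_demo
```

With this arrangement C cannot land inside the space A used to occupy, because that block stays reserved for the next request of the same size.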

Consider looking at sourceforge.net or codeproject.com for examples of pooling memory allocators. This will require a little rework of your code.

If you elect to use a pooling memory allocator, you may be inclined to have each thread maintain separate pools. This is fine, and will improve performance, up until you enter a situation where one thread habitually allocates and a different thread habitually deallocates. This would result in an accumulation of freed nodes in the deallocating thread's pool. Your code (or whichever allocator you select) must be sensitive to this characteristic.

Jim Dempsey

j0e · ‎05-03-2009

Thanks Tim and Jim.

Jim: I'll take a look at the pooling memory allocator model you mentioned. Thanks for the pointers.

-joe