Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Dynamic Array Memory Problem

Frank_W_1
Beginner
1,753 Views

I have defined a type called A_Type which has a dynamic (allocatable) array inside.

Then I construct a dynamic array of A_Type.

When I test the memory I can use,

I find the available memory decreases after I deallocate all of the dynamic arrays.

Why? The phenomenon looks like a memory leak.

Is there any suggestion?
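
For reference, the pattern being described is roughly the following (an illustrative sketch with assumed names and sizes, not the attached test program):

```fortran
! Minimal sketch of the allocation pattern described above
! (illustrative only; names and sizes are assumptions).
program alloc_pattern
  implicit none
  type A_Type
     integer :: n = 0
     real, allocatable :: c(:)        ! dynamic array inside the type
  end type A_Type
  type(A_Type), allocatable :: b(:)   ! dynamic array of A_Type
  integer :: k, m

  m = 100000
  allocate (b(m))
  do k = 1, m
     b(k)%n = 250
     allocate (b(k)%c(b(k)%n))       ! many small allocations
  end do

  ! ... measure available memory here ...

  do k = 1, m
     deallocate (b(k)%c)             ! free the inner arrays
  end do
  deallocate (b)

  ! ... measure again: the largest ALLOCATE that succeeds may now be
  !     smaller, even though everything has been deallocated ...
end program alloc_pattern
```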

 

0 Kudos
32 Replies
Frank_W_1
Beginner
1,294 Views

The test example

0 Kudos
Steven_L_Intel1
Employee
1,294 Views

I don't see evidence of a leak here, and Intel Inspector XE's memory analysis does not indicate a leak. What may be happening is that memory is fragmented by the repeated allocation and deallocation of smaller objects. It may be that the memory pool is not coalescing adjacent free blocks perfectly.

0 Kudos
John_Campbell
New Contributor II
1,294 Views

Frank,

I tested your example and changed it to recycle 20 times. This shows that memory is being lost, probably due to the fragmentation Steve has indicated.
I changed the de-allocate loop from "    DO K=1,M" to "    DO K=M,1,-1", and this reduced the amount of memory loss taking place. This suggests that fragmentation of the available memory table is part of the problem, but releasing in the opposite order to allocating did not eliminate it. I have found problems related to this in the past.

The test I carried out was on another 32-bit compiler, which demonstrates the problem. I suspect ifort will perform in a similar way. This example also demonstrates that with the Windows 7 64-bit OS, 32-bit programs can get up to 3.8 GB of memory, although the maximum array size is limited to 2 GB.
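
The loop change described above, as a sketch (the B and C names follow the posted example; everything else is illustrative):

```fortran
! Original de-allocation order:
!     DO K = 1, M
!        DEALLOCATE (B(K)%C)
!     END DO
! Releasing in the reverse of the allocation order instead:
DO K = M, 1, -1
   DEALLOCATE (B(K)%C)
END DO
```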

John

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,294 Views

John,

I too reversed the deallocation loop but saw no change in "fragmentation". I also added calls to explicitly enable the Low Fragmentation Heap (for the Release build), but this too made no change. I am running on the Windows 7 x64 platform.

[txt]

Run as x32:
 Position =           0 Memory that could be used(M):        1750
           0
           0
 Position =           1 Memory that could be used(M):         460
 Position =           2 Memory that could be used(M):         460
Fortran Pause - Enter command<CR> or <CR> to continue.

Run as x64
 Position =           0 Memory that could be used(M):                 31890
           0
           0
 Position =           1 Memory that could be used(M):                 35850
 Position =           2 Memory that could be used(M):                 37640
Fortran Pause - Enter command<CR> or <CR> to continue.
[/txt]

It appears that there is a node consolidation issue with the heap when running x32 on an x64 platform.

Jim Dempsey

0 Kudos
John_Campbell
New Contributor II
1,294 Views

Jim,

Try the two versions I attached, which repeat the process 20 times. I found a difference in available memory using another compiler. I'd be interested to hear if ifort does not have a similar problem, as I think the problem is with the memory management system (I assume this is malloc?).

I have found other examples where the memory availability estimate of malloc gets confused. Perhaps it is because non-contiguous chunks of memory are being poorly managed, rather than memory being leaked.
I think there needs to be a better strategy for cleaning up the available memory at each DEALLOCATE, to improve the availability of big chunks of contiguous memory. It would be interesting if we could get access to the table of available memory that MALLOC is using, to see what is going on.

I think the problem is not memory leakage but the available pool of memory being split up by small "allocations". It is hard to identify the source of these in the program attached, unless MALLOC is not merging adjacent free memory areas, which I'd expect it to be able to do.

I have written my own F77 memory manager, using a large array, and this clean-out is a basic strategy. My routine, called MEM_ALLOC, has two features that ALLOCATE does not provide: give me the biggest available array size, and a re-size option for reducing the size of a previously allocated array. I am not sure why these were not provided in ALLOCATE. There are a few useful features omitted from the standard!

John

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,294 Views

John,

I made minor modifications to each program: exit the allocation loop on the first failure, and exit the deallocation loop on the first .not. allocated. This removed a huge number of messages.

Both report similar results:

[plain]


 TESTING PASS 1

 Position = 0 : Memory that could be used 1396 mb  (28)
 allocate error for B(K)%C(B(K)%N)     1811830         250          41
     1811830  allocated    1907.349      mb
 Position = 1 : Memory that could be used 0 mb  (28)
     1811830  deallocated
 Position = 2 : Memory that could be used 76 mb  (28)

 

 TESTING PASS 2

 Position = 0 : Memory that could be used 76 mb  (28)
 allocate error for B(K)%C(B(K)%N)     1813748         250          41
     1813748  allocated    1907.349      mb
 Position = 1 : Memory that could be used 0 mb  (28)
     1813748  deallocated
 Position = 2 : Memory that could be used 76 mb  (28)

 

 TESTING PASS 3

 Position = 0 : Memory that could be used 76 mb  (28)
 allocate error for B(K)%C(B(K)%N)     1813619         250          41
     1813619  allocated    1907.349      mb
 Position = 1 : Memory that could be used 0 mb  (28)
     1813619  deallocated
 Position = 2 : Memory that could be used 76 mb  (28)
[/plain]

After the 1st allocation the heap appears fragmented. This is an x32 Debug build.

Note that IVF calls the MS CRTL runtime routines. However, the IVF support team should use your example to determine whether the observed fragmentation is a result of the MS CRTL alone, or of how the Fortran runtime uses the MS CRTL routines.

Jim Dempsey

0 Kudos
Frank_W_1
Beginner
1,294 Views

Many thanks to Steve.

Many thanks to John.

Many thanks to Jim.

I agree the heap memory is fragmented. Maybe the next step is for ifort to improve this; for now, I have to change my data structure.

Meanwhile, I find that CVF 6.6 has the same problem too.

 

 

0 Kudos
John_Campbell
New Contributor II
1,294 Views

Jim,

I have returned to work and run the programs I posted on the weekend. I have identified the problem using ifort IA-32. There are a few errors being reported!

I modified the program to address the problem you identified, and also to better report the testing options.

1)    Include an exit on the first ALLOCATE error,
2)    Correctly report the memory allocated,
3)    Report the DO loop order for deallocate, and
4)    Report the value of %n being used for memory size test.

There are two key options that expose the problems in this test.

When %N = 150, B is fully allocated, but when B is deallocated, the amount of free memory available (as seen by ALLOCATE) is reduced with each cycle of the test. The amount of memory identified changes with the order of the DEALLOCATE loop: 336 MB after PASS 20 with DO K=1,M, but 411 MB with DO K=M,1,-1. So the order of DEALLOCATE can change the estimate of free memory for subsequent ALLOCATE requests.

When %N = 250, B is only partially allocated, but when B is deallocated, only 76 MB of free memory is identified by ALLOCATE; yet on a subsequent ALLOCATE of B%C, 1,872 MB of memory is available for B. I have seen this instability in the amount of free memory reported to ALLOCATE in other programs. It shows that the assessment of the available memory pool is changing.

The example I posted on the weekend also showed that a 32-bit application running on a 64-bit OS can access up to 3.8 GB of memory, using the equivalent of the /3gb memory option. It would be good to enable this for 32-bit ifort applications running on a 64-bit OS, for those applications that cannot easily be changed to 64-bit. (LOC needs to report > 2 GB.)

I have attached the two DO loop versions from the weekend, with the corrections noted above, for the two cases of %N=150 and %N=250. X90-bat.txt is the batch file I use to test each option. These identify the problems I have reported.

There is a definite problem in the way the pool of free memory is being assessed. It would be good if this problem could be better identified and hopefully addressed. (Could a routine like defragment_memory_pool be provided, although DEALLOCATE should do this?)

John

ps: I had problems attaching files, as the forum gives an error report.

I have attached an updated version that takes an argument of N, with a negative value for backwards deallocation. This gives flexibility to see the change in memory loss for different values of N. The pattern of loss, say for N = 100 or -100, is interesting.

0 Kudos
Frank_W_1
Beginner
1,294 Views

John,

Thank you very much. I hope this problem can be solved by ifort as soon as possible.

Frank


0 Kudos
jimdempseyatthecove
Honored Contributor III
1,294 Views

Frank,

From John's and my testing, it appears that there is indeed a heap issue resulting in fragmentation, whereby adjacent nodes of free memory are not consolidated into larger nodes (at least as exhibited by the example programs posted). This leaves you in the position of a) waiting for a fix, or b) producing a work-around. I suggest a work-around for the large buffers:
a) allocation of a large buffer first checks a private list of buffers; if one is available, take it from the private list; if none is available, ALLOCATE.
b) freeing a large buffer returns it to the private list (i.e. it does not DEALLOCATE).
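
A minimal sketch of such a pool (module and routine names are mine, not Jim's; the fixed buffer size and pool capacity are assumptions):

```fortran
! Sketch of a private free-list for fixed-size buffers: instead of
! DEALLOCATE, freed buffers are parked on a stack and reused by the
! next "allocation".  Assumes all buffers have the same size NBUF.
module buffer_pool
  implicit none
  integer, parameter :: NBUF = 250        ! size of each C(:) array (assumed)
  integer, parameter :: MAXPOOL = 2000000 ! pool capacity (assumed)
  type pooled
     real, allocatable :: c(:)
  end type pooled
  type(pooled), allocatable :: pool(:)    ! stack of parked buffers
  integer :: ntop = 0
contains
  subroutine pool_get (c)
     real, allocatable, intent(out) :: c(:)
     if (ntop > 0) then
        call move_alloc (pool(ntop)%c, c) ! reuse a parked buffer
        ntop = ntop - 1
     else
        allocate (c(NBUF))                ! none parked: really allocate
     end if
  end subroutine pool_get

  subroutine pool_put (c)
     real, allocatable, intent(inout) :: c(:)
     if (.not. allocated(pool)) allocate (pool(MAXPOOL))
     if (ntop < MAXPOOL) then
        ntop = ntop + 1
        call move_alloc (c, pool(ntop)%c) ! park it; no DEALLOCATE
     else
        deallocate (c)                    ! pool full: fall back
     end if
  end subroutine pool_put
end module buffer_pool
```

MOVE_ALLOC transfers ownership without copying, so parking and reusing a buffer costs no heap traffic at all.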

Jim Dempsey

0 Kudos
Steven_L_Intel1
Employee
1,294 Views

I did an experiment replacing the Fortran ALLOCATE calls with calls to MALLOC (which is a wrapper to the C malloc routine). I saw identical behavior, so the fragmentation is happening within the C heap management. There's nothing Fortran can do about this.
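
The substitution Steve describes can be sketched with ifort's MALLOC/FREE extension intrinsics and a Cray pointer (a hedged illustration, not Steve's actual test code; these are ifort extensions, not standard Fortran):

```fortran
! Replacing ALLOCATE/DEALLOCATE with the MALLOC/FREE extension
! intrinsics (which go through C malloc/free), using a Cray pointer.
! Illustrative sketch only.
program malloc_sketch
  implicit none
  integer, parameter :: n = 250
  real :: c(n)
  pointer (pc, c)            ! Cray pointer extension (ifort)
  pc = malloc (4*n)          ! same number of bytes ALLOCATE would request
  if (pc == 0) stop 'malloc failed'
  c(1) = 1.0
  call free (pc)             ! instead of DEALLOCATE
end program malloc_sketch
```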

0 Kudos
Frank_W_1
Beginner
1,294 Views

Steve,

Thank you.

But I don't agree with you.

This bug is caused by the double layer of dynamic arrays in a custom type.

If there is only one layer of dynamic array, the fragmentation does not happen.

It should be Fortran's problem, and it could be corrected.

Frank


0 Kudos
Steven_L_Intel1
Employee
1,294 Views

You have not shown evidence that it's Fortran's problem. What you showed instead is that the pattern of allocations and deallocations creates the fragmentation. As I said, I replaced the Fortran ALLOCATEs with C mallocs and saw the same behavior.

0 Kudos
John_Campbell
New Contributor II
1,294 Views

Steve,

We have shown that it is a problem of using ALLOCATE ( B(K)%C(B(K)%N) )

You have been able to reproduce the problem, using MALLOC.

I don’t think that qualifies as not being a Fortran problem. It is certainly a problem for Fortran users.

Do we have any way of identifying the fragmentation of available memory, being either:

  • Small remnants of the allocation process, although these are difficult to identify in the code, or
  • The free memory table in MALLOC becoming large and unordered, due to the allocation of 2,000,000 B(K)%C(N)'s.

If MALLOC uses a table of free memory, is it possible to get access to this list? It would certainly be interesting to analyse it and identify the reason for the fragmentation: either an inability to recognise and merge adjacent free memory blocks, or small remnants left over from the B%C(:) structure.

Is there a MALLOC function to sort the free memory table and merge adjacent free blocks? This could be a useful function to call after many DEALLOCATE operations.

I do note that the worst case occurs when the allocation of B%C is not complete. Could it be that there is not a suitably clean recovery from an ALLOCATE error in this case?

It would certainly be good to know a bit more about what causes the problem, rather than just being told it is not Fortran's problem.

It is clear it is a Fortran user’s problem and it is a problem with Ifort’s implementation of DEALLOCATE (B(K)%C).

John

0 Kudos
IanH
Honored Contributor II
1,294 Views

Random thoughts from previous explorations, the passage of time may have corrupted things somewhat:

- The C runtime's malloc is a thin wrapper on top of a base operating system library (HeapAlloc and friends) - which is what it looks like ALLOCATE hits directly for small allocations anyway (I previously thought it was malloc based, and can find some ifort compiler test stuff based on that (I've gone to the trouble of writing a module that provides interfaces for all the stuff in crtdbg.h!) - has this changed with ifort version?)  (Edit to note - /libs:static is the key!)

- That heap manager works (as you would expect) by reserving largish chunks of address space and then divvying it up for the individual allocations. I recall reading that it is limited in how many chunks it can reserve (perhaps that's an OS specific thing and perhaps I am completely off base and this should be ignored - I can't find whatever documentation made me "recall" this).  Once it has reserved a chunk of address space I don't know whether the heap manager ever gives it back to the OS once all the little allocations have been freed - I'd expect not (and simple tests confirm my expectations), but I might be wrong. 

(You can have multiple heaps extant at once in a Windows program - one way of forcing the heap manager to give the virtual address space back to the operating system is to destroy a particular heap, but I suspect that's well outside the domain of what you could do considering Fortran allocatable variables.)

- For large allocations ALLOCATE goes via VirtualAlloc and friends - i.e. its strategy changes depending on the size of the allocation (which is sensible given normal use cases).

- Given the above, I wonder whether the method of testing for available memory could potentially be confusing the issue or be part of the problem itself.  If your heap manager has reserved lots of address space to cover lots of little allocations, then when VirtualAlloc is called for a big allocation there won't be anything to give out.

- You can walk an OS provided heap using HeapWalk and query the maximum size using HeapCompact (though I'm not sure whether that is the maximum-maximum size, or just the maximum size given the address space that the heap manager has already acquired).  It would be easy enough to write a little diagnostic routine using HeapWalk.  If you use a tool like vmmap you can see where the address space reserved for heap allocations versus other address space uses sit relative to each other.  Some of the other windows debugging tools probably give you further insight (whenever I've had to do heap stuff I've been working via the C runtime's diagnostic support, which might or might not be applicable here).

- The OS heap allocation routines can change their behaviour depending on whether a debugger is attached!  Current windows versions may use a so called low fragmentation heap by default, but if a debugger is sensed it will switch to the previous "compatible" style of heap allocation.
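
A small diagnostic along the lines IanH suggests might look like this, querying the largest committed free block via HeapCompact (a sketch: the BIND(C) interfaces are hand-written here and, as written, suit x64; a 32-bit build would need STDCALL handling):

```fortran
! Report the size of the largest committed free block in the default
! process heap.  Sketch only; interfaces written by hand for x64.
subroutine largest_free_block
  use, intrinsic :: iso_c_binding, only: c_intptr_t, c_size_t, c_long
  implicit none
  interface
     function GetProcessHeap () bind(C, name='GetProcessHeap')
        import :: c_intptr_t
        integer(c_intptr_t) :: GetProcessHeap
     end function GetProcessHeap
     function HeapCompact (hHeap, dwFlags) bind(C, name='HeapCompact')
        import :: c_intptr_t, c_size_t, c_long
        integer(c_intptr_t), value :: hHeap
        integer(c_long), value :: dwFlags
        integer(c_size_t) :: HeapCompact
     end function HeapCompact
  end interface
  integer(c_size_t) :: largest
  largest = HeapCompact (GetProcessHeap (), 0_c_long)
  print '(a,i0,a)', ' Largest committed free block: ', largest, ' bytes'
end subroutine largest_free_block
```

Calling this before and after the big DEALLOCATE loop would show directly whether the freed blocks are being coalesced.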

0 Kudos
Frank_W_1
Beginner
1,294 Views

Steve,

As John has said, it is a problem of using ALLOCATE ( B(K)%C(B(K)%N) ), which I called "the dynamic array of double layers with a personal type".

I do not have enough evidence to show it is a problem of C's malloc.

But why can the memory be recovered to the original amount when we deallocate dynamic arrays of the system data types like integer/real/double? There is no fragmentation at all.

If ifort can deallocate A(N) cleanly with no fragmentation, why does ifort deallocate A(N)%B(M) with fragmentation (A and B are both dynamic arrays)? There is no reasonable explanation unless ifort has some bug in deallocating the custom type.

The data structure A(N)%B(M) is very ordinary and popular. If there is some inconvenience, it should be improved even if it is a problem of malloc.

I do hope the deallocate could be optimized.

Frank

 

 

 

0 Kudos
Steven_L_Intel1
Employee
1,294 Views

Perhaps I have not made myself clear. When you do a Fortran ALLOCATE, it calls C malloc to allocate the storage. (Though for large requests it uses the Windows API routine VirtualAlloc - that's not happening here.) For DEALLOCATE, it calls C free. It is the C library that actually manages the storage. Any overhead Fortran adds is included in the malloc request and freed appropriately.

My experiment replacing ALLOCATE/DEALLOCATE with malloc and free, using the identical amounts of storage, demonstrates the exact same fragmentation issue, which shows that Fortran is not doing something extra resulting in the fragmentation. It is the Microsoft C library that is not properly consolidating the freed blocks.

I will do some more tests to see if I can prove this further, but I see no evidence that the Fortran library is responsible.

0 Kudos
IanH
Honored Contributor II
1,294 Views

The fragmentation is happening at an even lower level - at the level of the Windows heap manager and even at the level of the process' address space.

The only way I can think of to "fix" it (assuming this is actually a problem and not just an observation by the OP) is to switch the smaller allocations over to being pointers set up using C_F_POINTER, where the backing memory is taken from a heap dedicated to those allocations.  When you have finished with those smaller allocations you then destroy the entire heap.  (Or, because of the fixed size and consistent lifetime of the allocations it would be trivial to roll your own allocator that didn't fragment.) 

The behaviour of the heap manager will have been "tuned" for typical application usage.  The OP's example isn't typical by a long shot, particularly when you consider you have millions of allocations all of exactly the same size, in conjunction with the "search" for a large chunk of contiguous available address space.
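
IanH's dedicated-heap idea could be sketched like this, with hand-written interfaces to HeapCreate/HeapAlloc/HeapDestroy and C_F_POINTER mapping the raw memory onto a Fortran array (illustrative only; as written it suits x64 calling conventions, and all names outside the Windows API are mine):

```fortran
! Take the small allocations from a private Windows heap; destroying
! that heap returns all its address space in one operation.
module private_heap
  use, intrinsic :: iso_c_binding
  implicit none
  interface
     function HeapCreate (opts, initsz, maxsz) bind(C, name='HeapCreate')
        import :: c_intptr_t, c_long, c_size_t
        integer(c_long), value :: opts
        integer(c_size_t), value :: initsz, maxsz
        integer(c_intptr_t) :: HeapCreate
     end function HeapCreate
     function HeapAlloc (heap, flags, nbytes) bind(C, name='HeapAlloc')
        import :: c_intptr_t, c_long, c_size_t, c_ptr
        integer(c_intptr_t), value :: heap
        integer(c_long), value :: flags
        integer(c_size_t), value :: nbytes
        type(c_ptr) :: HeapAlloc
     end function HeapAlloc
     function HeapDestroy (heap) bind(C, name='HeapDestroy')
        import :: c_intptr_t, c_int
        integer(c_intptr_t), value :: heap
        integer(c_int) :: HeapDestroy
     end function HeapDestroy
  end interface
contains
  subroutine demo
     integer(c_intptr_t) :: heap
     type(c_ptr) :: p
     real, pointer :: c(:)
     heap = HeapCreate (0_c_long, 0_c_size_t, 0_c_size_t) ! growable heap
     p = HeapAlloc (heap, 0_c_long, 250_c_size_t * 4)
     call c_f_pointer (p, c, [250])    ! view the memory as a Fortran array
     c = 0.0
     ! ... use c like an allocated array ...
     if (HeapDestroy (heap) == 0) stop 'HeapDestroy failed'
     nullify (c)                       ! heap gone: pointer now invalid
  end subroutine demo
end module private_heap
```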

0 Kudos
Frank_W_1
Beginner
1,294 Views

Steve, IanH,

Thank you very much.

I have made a similar data structure in C++.

There is no fragmentation at all.

Does this mean ifort could be improved?

Frank

 

0 Kudos
Frank_W_1
Beginner
1,149 Views

 Position =           1 Memory that could be used(M):        1800
 Position =           2 Memory that could be used(M):         275
 Position =           3 Memory that could be used(M):        1800

0 Kudos
Reply