Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28477 Discussions

Program running slower over time - due to dynamic memory allocation/deallocation?

calmag
Beginner
1,266 Views

My fortran program requires a large amount of memory and takes a long time to run (a simulation program). To manage memory usage, I used variable-sized arrays of link lists (the link lists are for quick popping in and out of data and I am sure there is no memory leak) and performed frequent memory allocation and deallocation inside several large loops (the loops are also multi-threaded). While I am able to maintain reasonable and steady memory usage, I also observed significant speed degradation over time (50%-100% run time increase from beginning to the end of the program run). An obvious symptom is the reduction of cpu utilization over time. The cpu utilization just continued to drop even through the computer is doing nothing but just to run this program.

My preliminary search for solution seems to indicate that this is caused by memory fragmentation due to my frequent memory allocation/deallocation and long run time. Perhaps there may be some other reasons for this that I am not aware of. Cpu utilization is not something I can control and I have no clue how to mitigatea situation like this.

I have not found any existing discussion helpful to this problem. Perhaps I have not read enough but any help would be greatly appreciated.

Calmag C


0 Kudos
13 Replies
Arjen_Markus
Honored Contributor I
1,266 Views
If it is indeed memory fragmentation that is causing the slowdown, you may consider using
a pool of memory from which to allocate and deallocate meory chunks.

The idea is that you simply allocate one or more large chunks of memory at the start of the program.
During the run you assign small pieces of it and release them when you are done. It will probably
caused you some extra programming, but it may be worth investigating.

Regards,

Arjen
0 Kudos
calmag
Beginner
1,266 Views
Arjen,
Thanks for the quick reply and confirmation. During my search I was also trying to find if I can do some pooled allocation like some are doing in C/C++ but I have not found any libaray/function call that supports pooled memory allocation in Fortran. Is there a typical technique to accomplish what you suggested?
Calmag C
0 Kudos
Arjen_Markus
Honored Contributor I
1,266 Views
Hi Calmag,

I have not tried it myself yet, but a technique that should work is something along
these lines:

For simplicity: suppose you have derived types with fixed components only
Then:
- Allocate an array of such types and a logical array of the same size to register whether
an element is in use.
- When youneed a chunk of memory, find the firstunused element and setit to"in use".
- You can then use a pointer to that element.
- When you are done, set the element to "free".

This is just a sketch of course. It should work with some modifications for assigning chunks
of integersor reals as well. Butbeware: if the amounts of memory vary a lot, you will find
yourself implementing an actual allocator yourself, complete withdefragmentation algorithms
and so on.

Whether that is worth your while - and how general you can make the implementation -
is an open question.

Regards,

Arjen
0 Kudos
calmag
Beginner
1,266 Views
Thank Arjen,
This sounds non-trivial....What makes it worse is that I don't use arrays, I use all link-list data strucure, which is composed pointers. When I pop in and out data from the link list, I do memory allocation/deallocation, and there are tens of millions actions like this taking place. Because in link-list, each allocate/deallocate action changes only a small amount of memory, then I can see how memory got fragmented quickly.
Another wilder idea is to somehow perform memory defragmentation on the fly, which is even far beyond my wildest imagination in terms of implementation.
Regards,
Calmag
0 Kudos
Arjen_Markus
Honored Contributor I
1,266 Views
What I meant to say is that you use the method I sketched for the list elements. So instead of
allocating these elements via the ALLOCATE statement, you "simply" use an element from a pre-allocated array. That way you only use contiguous memory.

Regards,

Arjen
0 Kudos
jimdempseyatthecove
Honored Contributor III
1,266 Views
To expand on Arjen's proposal (linked list of free nodes), use thread local storage to have each thread maintain its own list of free nodes.

COMMON /YouThreadPrivate/ type(yourFooType) freeFooListHead
!$OMP THREADPRIVATE(/YouThreadPrivate/)
...

// in your allocate routine
if(associated(freeListHead)) then
getFooPointer = freeFooListHead
freeFooListHead = freeFooListHead%next
return
endif
allocate(getFooPointer)

-------------
// in your deallocate routine
aFoo%next = freeFooListHead
freeFooListHead = aFoo

Jim Dempsey
0 Kudos
Steven_L_Intel1
Employee
1,266 Views
What evidence do you have that "memory fragmentation" is the issue? Have you used Intel VTune Amplifier XE to analyze the running program and see where the time is being spent?
0 Kudos
calmag
Beginner
1,266 Views
Thanks a lot for the suggestion. I will give it a try!
Calmag
0 Kudos
calmag
Beginner
1,266 Views
No, I have not. There is no hard evidence. My suspicion came from reading about memory fragmentation due to frequent dynamic memory allocation/deallocation over a long run time reported by others, which is similar to my case. I have just downloaded the trial version of VTune Amplifer XE and just ran it. Lots information got reported but I have will have to go through the tutoiroal to make sense of them. Is there anything specific I should look for? Thanks for the suggestion. Will write back if I run into other related questions.
Calmag
0 Kudos
aagaard
Beginner
1,266 Views
I did have some issues - that I assumed was, but was not- related to memory fragmentation. During my analyse phaseI did write a small routine to dump information on all memory blocks used by application to a file. Calling this routine a specific points in my flow did in fact bring some usefull information on memory use.
I'll attach the routine - might be that you could use part of this.

See thread http://software.intel.com/en-us/forums/showthread.php?t=80916&o=a&s=lr&wapkw=(aagaard)
0 Kudos
Steven_L_Intel1
Employee
1,266 Views
The Hotspot Analysis will tell you where your program is spending the most time.
0 Kudos
calmag
Beginner
1,266 Views
Just to report back. I followed Jim's suggestion and the issues were resolved. Thanks for the suggestion!
Calmag
0 Kudos
jimdempseyatthecove
Honored Contributor III
1,266 Views
You are welcome.

You should look at the: Task Manager | Performance | Page File Usage

to observe if you have a similar problem lurking somewhere in your code.

Also, if these allocations are only required up to a point in your code, then if different allocations will be constantly reused later (and present a paging or out of memory issue), then consider adding a parallel region after use of these first allocations, that specifically deallocate these nodes.

It sounds like this is not needed, but look at the Page File Usage to assess the situation.

Jim Dempsey
0 Kudos
Reply