Array with variable size an iterative process

Jeferson_Vanderlinde · ‎03-23-2016

Hello everyone! My name is Jeferson and I have been having trouble with arrays that change their size in each iteration.

First of all I need to do this using an interface block because I plan to use an external subroutine.

Initially I need to create vectors that their sizes are definedincreased during an iterative process depending on the data and a set of conditions. For example:

Data: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Begining of iterative process: A, B and C are empty
Ending of iterative process: A = |1 3 8 10|; B = |2 4 7|; C = |5 6 9|

After a new iterative process of exchange values between A, B and C. For example:
At the end of this new iterative process: A = |1 10|; B = |3 7 8|; C = |2 4 5 6 9| (no matter the order of the numbers in each variable)

I want to do this as efficiently as possible, because I have a huge amount of data.

I implemented this routine by setting the size of A, B and C as a fixed large number, but I do not know if it's the best alternative.

I was searching and found that it is possible to use dynamic allocation of arrays (MOVE_ALLOC) and also through the use of pointers. I have not found many materials that explain about these two possibilities.

Could anyone help me define the most efficient way and how to perform this process?

jimdempseyatthecove · ‎03-23-2016

If you know of a worst case number of elements you can

allocate(Ablob(worstCase), Bblob(worstCase), Cblob(worstCase))
...
A => Ablob(1:nA)
B => Bblob(1:nB)
C => Cblob(1:nC)

This may save you on the cost of allocate/deallocate

Jim Dempsey

andrew_4619 · ‎03-23-2016

MOVE_ALLOC is useful if you have a data array that is populated with data and want to make it bigger whilst preserving the data. It is more efficient in that the data is copied in memory once not twice. If you are repopulating the array at each iteration it does not really help.

John_Campbell · ‎03-23-2016

The main issue for this class of "how long could a piece of string be" problem is when do you know what "worstCase" will be?
Especially, do you know how big "thisCase" is before you start the iteration.

Often "worstCase" is not known until the iteration is completed, so the two main approaches are:

1) How big can I afford to make "worstCase" so that it will probably never fail so choose a large number for worstCase that is very unlikely to be exceeded, but does not exhaust available memory, or
2) do a 2-pass solution so that the first pass counts but does not store the information. An example is to read a list of numbers of unknown length. It could be 3, 10, 1,000 or even 6 billion numbers.

The time cost of doing 2 (say reading data from a file) passes needs to be compared to the cost of using an extremely large upper bound estimate. Often, we can place an upper bound estimate on the size by considering how big a set of items can we handle in some later analysis phase, which can place a practical estimate on the value of "worstCase".

There are other alternatives:

a) Apply a restriction to the data source provider to include the upper estimate with the data set, before the data set is used. With a list of numbers, this can be done by placing the "worstCase" estimate at the start of the data or in another configuration file.

b) If the array is ALLOCATABLE, Write a re-allocate routine that changes the size of the array(s) if it is too small. As you indicated you don't want to call this too often, so the step increase needs to be well chosen. Also, this is sometimes not practical as it can require a copy of the array when two copies of the array would exceed available memory. With 64_bit address this can be easier.

After all this, most of us just give up and declare the arrays to be dynamic and "very large" and if it fails, then re-define the "very large" size, recompile and try again. You must know something of the data set so make an educated guess. Resizing may be an elegant solution, but probably a waste of effort, unless it is a very special case, like managing the data set for a 3D interactive modelling package.

Just use:
n = 1024**2
allocate ( a(n), b(n), c(n) )
memory is cheap !!

John

jimdempseyatthecove · ‎03-24-2016

If you build for x64...

.AND.

If the memory allocator (heap manager) does .NOT. wipe or prefill fist time allocation (Debug build may do this), then you can allocate absurdly large arrays that initially do not consume RAM and/page file space. Consumption is deferred until first touch. This may require that you also specify an absurdly large page file, but the actual consumption should be deferred until the given pages are touched (e.g. read from file).

The potential problem with growing arrays (move_alloc or auto (re)allocation) is that this tends to fragment memory. IOW your footprint may end up requiring 2x the RAM/page file that it minimally requires.

Jim Dempsey