Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

POINTER vs ALLOCATABLE

Dishaw__Jim
Beginner
4,097 Views

I tend to use ALLOCATABLE more than POINTER because I have the impression (right or wrong) that ALLOCATABLE keeps me out of trouble. The one downside to using allocatables is that if you have an allocatable element inside a type, a warning will be generated because it is not standard F95.

Is there any performance difference between ALLOCATABLE and POINTER? I know there are some cases where a pointer implementation makes sense (e.g. a red/black type iteration)--is there a pragmatic reason for picking one over the other? I'm leaning towards changing the ALLOCATABLE members inside a type to POINTERs. Is there a good reason why I should?

My code is not destined to be an optimized production code--the main focus is to support my research. The code is used by other researchers, so representing data structures in a manner that matches the algorithm is preferable. While speed is desirable, it is a secondary goal.

17 Replies
Steven_L_Intel1
Employee
The semantics of the two are subtly different. For example, a POINTER can point to a scalar, a discontiguous array slice, or another variable, whereas an ALLOCATABLE can do none of those things (well, allocatable character variables are part of F2003 but we haven't implemented that yet).

When you do intrinsic assignment (=) of one pointer to another, you simply copy the pointer. Copying allocatables that way copies the data (and in F2003, if the shapes don't match, the left side gets reallocated to match the right side - this is not yet in Intel Fortran.) This reallocation also occurs when you assign derived types containing allocatables, and that IS implemented in ifort.

Locally-declared allocatables are automatically deallocated on routine exit unless SAVEd, not pointers. An INTENT(OUT) allocatable is deallocated on routine entry, not pointers.

Performance-wise, allocatables are the better choice as the compiler knows they're always contiguous and never alias another variable.

I would say that if your application can use ALLOCATABLE, it should. But there are valid reasons to use POINTER. The question tends to come up because F90/95 did not allow you to do some things such as have allocatable dummy arguments or derived type components, so people used POINTER instead. F2003 fixed that.
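
To make the first difference concrete, here is a minimal sketch (the program and variable names are made up for illustration, not taken from anyone's code) of a pointer associated with a strided section and with an existing scalar, neither of which an ALLOCATABLE can do:

```fortran
program pointer_vs_allocatable
  implicit none
  real, allocatable, target :: a(:)
  real, target  :: x
  real, pointer :: p(:)   ! array pointer
  real, pointer :: s      ! scalar pointer
  integer :: i

  allocate(a(6))
  a = [(real(i), i = 1, 6)]

  ! A pointer may be associated with a discontiguous (strided) section.
  p => a(1:6:2)           ! refers to a(1), a(3), a(5); no data is copied
  p(2) = -3.0             ! writes through to a(3)
  print *, a(3)

  ! A pointer may also be associated with an existing scalar variable.
  x = 42.0
  s => x
  print *, s

  ! The allocatable owns its storage; it is released automatically
  ! when the program unit finishes, so no DEALLOCATE is required here.
end program pointer_vs_allocatable
```

Note that `a` needs the TARGET attribute before a pointer may be associated with it, which is also the point at which the compiler can no longer assume it is never aliased.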
jimdempseyatthecove
Honored Contributor III

Steve,

Thanks for the description of the differences. I would like a clarification regarding

"When you do intrinsic assignment (=) of one pointer to another, you simply copy the pointer." As in:

real, pointer :: pA(:), pB(:)
...
pA => findSomething()
pB => findSomethingElse()
pA = pB

Are you saying "=" copies the pointer as opposed to the array data?

There is one more potential difference that is not to be found in the documentation. Perhaps you can comment on this.

If you have a POINTER to an array and then you allocate to the POINTER, presumably an array descriptor is allocated as well. Also, presumably, in the array descriptor a POINTER reference counter is set to indicate that one pointer is pointing at the allocated array descriptor.

Now subsequent to the allocation you copy the pointer, say by =>. Then, presumably, in the array descriptor a POINTER reference counter is incremented to indicate that one more pointer is pointing at the allocated array descriptor.

Now subsequent to the assignment, one of the two pointers is used to deallocate the array but is not yet NULLIFY'd. Presumably the two pointers point to the allocated array descriptor that indicates a NOT allocated array.

Now subsequently you reassign a pointer with => or NULLIFY a pointer. Presumably the previously pointed-to allocated array descriptor's POINTER reference counter is decremented, and if the reference counter goes to 0 then the allocated array descriptor is deallocated.

Are the presumptions correct?
(If not, then there would be a memory leak of the non-deallocated allocated array descriptors.)

Also, presumably, on return from a subroutine carrying a pointer the allocated array descriptor's POINTER reference counter is decremented, and if the reference counter goes to 0 then the allocated array descriptor is deallocated. (You indicated the allocation would not be.)

Jim Dempsey

Steven_L_Intel1
Employee
Sorry, I was a bit confused. When you do a = of two pointers, you copy the data and the shapes must match. With allocatables, you copy the data and, if the shapes don't match, the left side is deallocated and then reallocated to match the right side (not yet implemented).

Pointer descriptors do not have a reference count. If you deallocate a pointer when other pointers reference the storage, the other pointers are now "dangling" and undefined.
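
A small sketch of the corrected rule, using made-up targets `t1` and `t2` in place of the findSomething() functions from the question. The last part relies on F2003 reallocation on assignment, so it assumes a compiler where that is enabled (older ifort releases need `-assume realloc_lhs`; check your compiler's documentation):

```fortran
program pointer_assignment
  implicit none
  real, target  :: t1(3), t2(3)
  real, pointer :: pA(:), pB(:)
  real, allocatable :: x(:), y(:)

  t1 = [1.0, 2.0, 3.0]
  t2 = [7.0, 8.0, 9.0]
  pA => t1
  pB => t2

  ! Intrinsic assignment: the data of pB's target is copied into
  ! pA's target. The shapes must match; pA stays associated with t1.
  pA = pB
  print *, t1
  print *, associated(pA, t1)

  ! Pointer assignment: pA is re-associated with t2; no data is copied.
  pA => pB
  print *, associated(pA, t2)

  ! Allocatables under F2003 rules: on a shape mismatch the left side
  ! is deallocated and reallocated to the shape of the right side.
  allocate(x(2), y(5))
  x = 0.0
  y = 1.0
  x = y
  print *, size(x)
end program pointer_assignment
```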
jimdempseyatthecove
Honored Contributor III

Steve,

Thanks for the clarification.

>>Pointer descriptors do not have a reference count

That's fine (like C/C++). This also permits a pointer to be returned to a caller that might not manage the descriptor reference count but otherwise knows how to dereference the descriptor.

Might I suggest a feature

real, pointer, automatic :: pArray(:)

Where the array descriptor maintains a reference count and is handled as described in the presumptions of the previous post.

Note, this is slightly different from

real, allocatable, target :: Array(:)
...
real, pointer :: pArray(:)
...
pArray => Array

In this instance, if Array is declared in a subroutine then, depending on something outside of the syntax, the array descriptor is either on the stack or in the static area of the subroutine. If on the stack, then the pointer reference count could be non-zero when the scope of the subroutine expires, and thus any dangling pointers now point to junk memory. On the flip side, if the array descriptor is in the subroutine's static area then the descriptor is persistent and can handle the dangling pointer situation; however, this also means the subroutine, with regard to Array, is not reentrant. To resolve this you might need

real, automatic, allocatable, target :: Array(:)

Now this brings up the can of worms as to if the attributes apply to the descriptor or to the object to which the descriptor references.

Permitting dangling pointers resolves the issue as being unresolved.

Jim Dempsey


Steven_L_Intel1
Employee
Reference counts mainly make sense when you have a central list of pointers, and perhaps some notion of garbage collection. In Fortran, pointers can go out of scope and you lose any handle on them.

We're not likely to invent new language in this area. There are clearly many ways Fortran can be improved, but this should be driven through the standards committee.
Dishaw__Jim
Beginner
Wow, thanks for the great answer. I have always preferred the use of ALLOCATABLEs over POINTERs and it is good to have opinion backed up by facts. I think I will live with the compiler warnings--does 10.0 generate warnings if there is an ALLOCATABLE inside a TYPE? I would try it out, but I'm under a self-imposed configuration freeze until mid-September.

>> Locally-declared allocatables are automatically deallocated on routine exit unless SAVEd, not pointers. An INTENT(OUT) allocatable is deallocated on routine entry, not pointers.
I always try to DEALLOCATE when I exit a subroutine or function, so it is good to know that Fortran (or at least ifort) will clean up after me if I forget. I didn't realize that INTENT(OUT) would force a deallocation. With ALLOCATABLE being much more dynamic in the sense of ALLOCATE/DEALLOCATE occurring frequently, how does ifort handle memory management? Does it rely solely on the OS or does it have a memory management routine that sits between one's code and the OS? If it is the latter, how does it manage fragmentation?
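
For anyone else reading along, the two rules quoted above can be sketched as follows. This is only an illustration; the module and routine names (`alloc_demo`, `fill`, `scratch`) are invented for the example:

```fortran
module alloc_demo
  implicit none
contains
  subroutine fill(v, n)
    real, allocatable, intent(out) :: v(:)
    integer, intent(in) :: n
    ! An INTENT(OUT) allocatable is deallocated on entry, even if the
    ! caller passed in an allocated array.
    if (allocated(v)) error stop "unexpected: v is allocated on entry"
    allocate(v(n))
    v = 0.0
  end subroutine fill

  subroutine scratch(n)
    integer, intent(in) :: n
    real, allocatable :: work(:)   ! local and not SAVEd
    allocate(work(n))
    work = 1.0
    ! No DEALLOCATE needed: work is released automatically on return.
  end subroutine scratch
end module alloc_demo

program intent_out_demo
  use alloc_demo
  implicit none
  real, allocatable :: v(:)

  allocate(v(10))
  call fill(v, 3)        ! the 10-element allocation is released on entry
  print *, size(v)

  call scratch(1000)
  call scratch(1000)     ! would fail in ALLOCATE if work were still allocated
end program intent_out_demo
```

An explicit DEALLOCATE at the end of `scratch` is still legal; it is just no longer required to avoid a leak.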
Steven_L_Intel1
Employee
What warnings? Intel Fortran has supported ALLOCATABLE components of derived types (and ALLOCATABLE dummy arguments and function results) since version 8.0. If you ask for standards checking in version 9.1 it will say these are non-standard, since the checking is against F95. (In 10.0, the default standards checking is against F2003 so no warning.)

ifort does not do its own memory management. Basically it calls malloc and free, though for very large allocations it calls a Win32 API routine (whose name I forget) instead. There's no attempt at managing fragmentation - it is assumed that the C library or OS takes care of that.
jimdempseyatthecove
Honored Contributor III

Steve,

>> Reference counts mainly make sense when you have a central list of pointers, and perhaps some notion of garbage collection. In Fortran, pointers can go out of scope and you lose any handle on them.

Reference counts make sense in any language when you have a descriptor which may have multiple references (pointers pointing to the descriptor as opposed to pointing at a block of data). Garbage collection is quite different.

When a pointer goes out of scope the reference count can be decremented. You already handle allocatables going out of scope, so you already have a going-out-of-scope cleanup process that could handle pointers as well. This also requires you to add initialization of the pointer to NULL on going into scope, where you already have the code to initialize an allocatable to not allocated.

This is not a language change; rather, it is an implementation change. It is no different than the implementation issue of how to handle uninitialized variables.

And this implementation change has favorable side effects:

1) Most platforms reserve a block of memory at 0 and cause a runtime fault on access to 0:blocksize-1. Therefore, at the expense of initializing the pointer, and subsequently initializing/maintaining the reference count, you get stronger bug defense and detection.

2) It fixes memory leaks (similar to auto deallocation of allocatables).

3) It exposes residual pointer bugs (multiple pointers, one used to deallocate, then subsequent reference).

Yes, this does add some overhead. The largest overhead will come from the allocate to pointer performing two allocations - one for the descriptor and one for the data being allocated. This minor overhead could be objectionable to some. Therefore the feature should be available as a diagnostic option. And some of us may elect to always use the option.

Note, in this world that we live in there are individuals whose sole purpose in life seems to be to wreak havoc on others by exploiting software bugs, things such as buffer overruns. Although this suggestion is not a fix for a buffer overrun situation, it has a similar characteristic in that it involves an access to memory that ought not to be accessed. Fortunately this can easily be corrected by an implementation change (as opposed to a language change).

Features like this make a product more attractive to purchase or upgrade.

Jim Dempsey

Dishaw__Jim
Beginner

The warnings are the F95 standards checking warnings.

Steven_L_Intel1
Employee
Ah, I see the confusion. There are not multiple pointers to a descriptor. Each pointer (to an array) is its own descriptor. Scalar pointers are just an address. There's no place to hang a reference count.

You won't get the standards warning for allocatable components in 10.0 (the default is F2003 checking.)
jimdempseyatthecove
Honored Contributor III

For pointers to arrays (array slices) one or more pointers point to a descriptor and the descriptor points to the memory block (if allocated). The descriptor contains a count of the number of references. A subsequent pointer can point to either the same descriptor or to a new descriptor that is a slice of the array of some other descriptor (and which increments the reference count of the descriptor on which it is dependent as well as incrementing the reference count of its own descriptor). Subsequent references to the 2nd level descriptor only need to increment the reference count of the 2nd level descriptor (or 3rd, 4th...). Only when the reference counts of the lower level descriptors decrement to 0 is the parent descriptor (if any) reference count decremented; then the memory block (if any) is returned, then the descriptor is returned.

The descriptor, when allocated, must be severable from the memory block it references. This is due to the fact that multiple pointers may be referencing the descriptor whose memory block has been deallocated. The other pointers, if dereferenced, must reflect the deallocated memory condition as opposed to referencing the former memory locations.

Managing the reference count is straightforward; however, you have to pay attention to the details. E.g. an array is allocated to a pointer, a second pointer references a slice of the array, and a third pointer references an element of the slice. To properly maintain the reference counter the third pointer has to point to a scalar reference descriptor, which then points to the element as well as pointing back at the descriptor for the slice, which points back to the original descriptor.
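
As a rough user-level illustration of the idea (not how ifort implements pointers, and leaving out the parent/slice descriptors), a shared, reference-counted descriptor could be sketched like this. Every name here (`refcount_sketch`, `shared_array`, `handle`, `share`, `release`, `free_data`) is hypothetical:

```fortran
module refcount_sketch
  implicit none

  ! Hypothetical heap-allocated descriptor shared by all handles.
  type :: shared_array
    real, pointer :: values(:) => null()
    integer :: references = 0
  end type shared_array

  ! A "pointer" in this scheme is just one word: the descriptor address.
  type :: handle
    type(shared_array), pointer :: desc => null()
  end type handle

contains

  subroutine new_array(h, n)            ! stands in for ALLOCATE(p(n))
    type(handle), intent(inout) :: h
    integer, intent(in) :: n
    call release(h)
    allocate(h%desc)
    allocate(h%desc%values(n))
    h%desc%references = 1
  end subroutine new_array

  subroutine share(dst, src)            ! stands in for dst => src
    type(handle), intent(inout) :: dst
    type(handle), intent(in) :: src
    type(shared_array), pointer :: d
    d => src%desc
    if (associated(d)) d%references = d%references + 1
    call release(dst)
    dst%desc => d
  end subroutine share

  subroutine free_data(h)               ! stands in for DEALLOCATE(p)
    type(handle), intent(inout) :: h
    if (.not. associated(h%desc)) return
    ! The descriptor survives, so other handles see "not allocated".
    if (associated(h%desc%values)) deallocate(h%desc%values)
  end subroutine free_data

  subroutine release(h)                 ! NULLIFY or going out of scope
    type(handle), intent(inout) :: h
    if (.not. associated(h%desc)) return
    h%desc%references = h%desc%references - 1
    if (h%desc%references == 0) then
      if (associated(h%desc%values)) deallocate(h%desc%values)
      deallocate(h%desc)
    end if
    nullify(h%desc)
  end subroutine release
end module refcount_sketch

program refcount_demo
  use refcount_sketch
  implicit none
  type(handle) :: a, b

  call new_array(a, 4)
  a%desc%values = 1.0
  call share(b, a)
  print *, a%desc%references            ! two handles share one descriptor
  call free_data(a)
  print *, associated(b%desc%values)    ! b can detect the deallocation
  call release(a)
  call release(b)                       ! last reference frees the descriptor
  print *, associated(b%desc)
end program refcount_demo
```

In this sketch every release has to be called by hand; the proposal is for the compiler to insert the equivalent calls at => and at scope exit.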

Jim Dempsey

Steven_L_Intel1
Employee
I suppose one could implement it that way, but it adds levels of indirection to pointer accesses and adds complexity not required by the language. Every time you did a pointer assignment or reference you'd have to chase through multiple layers. Does it really buy you anything?

If you use ALLOCATABLEs instead of POINTERs, it's impossible to leak memory (in a correct implementation). I expect that over time use of POINTER will diminish.
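
A minimal sketch of what that looks like for the original question (allocatable members inside a type). The type name `field` and its component are made up for the example:

```fortran
program alloc_component
  implicit none
  type :: field
    real, allocatable :: values(:)
  end type field
  type(field) :: a, b

  allocate(a%values(3))
  a%values = [1.0, 2.0, 3.0]
  allocate(b%values(10))
  b%values = 0.0

  ! Intrinsic assignment of the derived type: b%values is reallocated
  ! to the shape of a%values and the data is copied. The old 10-element
  ! allocation is released by the compiler, not by the user.
  b = a

  ! b holds its own copy, so changing a afterwards does not affect b.
  a%values(1) = -1.0
  print *, size(b%values), b%values(1)
end program alloc_component
```

With POINTER components the same `b = a` would only copy the pointer association, leaving both objects sharing one block and orphaning the block `b` previously pointed to.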
jimdempseyatthecove
Honored Contributor III

A pointer to array (or array slice) would have no additional levels of indirection. The pointer currently points to a descriptor. The difference is in maintaining a reference count and a potential parent descriptor pointer. Indexing off the pointer to obtain an entry would have no additional overhead. Also note that using a pointer as a call argument (either as pointer or as that to which it points) has no additional overhead. The cost is only in the following areas: Enter Scope containing pointer(initialize), => (dereference,reference), and exit scope with pointer. The overhead is near nil.

A pointer to a scalar or derived type would consist of the address word (as it does now) and a pointer to a parent descriptor that is either 0 if pointing to another scalar, or points to the array descriptor of the array whose cell it references. Again the overhead only exists on Enter Scope containing pointer (initialize), => (dereference, reference), and exit scope with pointer.

Jim Dempsey

Steven_L_Intel1
Employee
Currently, a pointer does not "point to" a descriptor - the pointer IS the descriptor if an array, or is just an address word if scalar. There is no concept of a "parent descriptor".
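
One way to see this from the language side: two pointers into the same array each carry their own bounds, stride and extent, which is the information a descriptor holds. A small sketch (variable names are arbitrary; the `q(0:) => a` form is F2003 and assumes a compiler that supports it):

```fortran
program pointer_descriptor
  implicit none
  real, target  :: a(10)
  real, pointer :: p(:), q(:)
  integer :: i

  a = [(real(i), i = 1, 10)]

  p => a(2:8:2)     ! elements 2, 4, 6, 8, seen through p as p(1:4)
  q(0:) => a        ! the whole array, seen through q as q(0:9)

  print *, lbound(p, 1), ubound(p, 1), size(p)   ! 1 4 4
  print *, lbound(q, 1), ubound(q, 1), size(q)   ! 0 9 10

  ! Re-associating q rewrites only q's own bounds and address;
  ! p is unaffected because nothing is shared between the two.
  q => a(5:6)
  print *, size(p), size(q)                      ! 4 2
end program pointer_descriptor
```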
jimdempseyatthecove
Honored Contributor III

Steve,

I see (looked at the disassembly window of a test program).

I would guess that this was an implementation issue. For arrays, you chose to keep the descriptor inside the pointer object as opposed to being (when necessary) allocated from dynamic memory or from a free pool of descriptors.

The advantage of doing it the current way is "pointer(index)" has one less memory cycle than if you had a pointer to a descriptor.

That advantage is only fleeting, in that for multiple references or loops the compiler will very well perform the dereference once.

The disadvantage is significant:

a) Pointer => OtherPointer is a descriptor copy

mov eax,
mov ,eax
mov eax, [ebp-58h]
mov [ebp-7Ch],eax
mov eax, [ebp-54h]
mov [ebp-78h],eax
mov eax, [ebp-50h]
mov [ebp-74h],eax
mov eax, [ebp-4Ch]
mov [ebp-70h],eax
mov eax, [ebp-48h]
mov [ebp-6Ch],eax
mov eax, [ebp-44h]
mov [ebp-68h],eax
mov eax, [ebp-40h]
mov [ebp-64h],eax
mov eax, [ebp-3Ch]
mov [ebp-60h],eax

as opposed to something like this:

mov edx, [Pointer]
test edx, edx
jz noPointer
mov eax, Parent[edx]
dec References[eax]
jnz OtherReferences
call ReturnDescriptor
OtherReferences:
noPointer:
mov eax, [OtherPointer]
test eax, eax
jz noOtherPointer
inc References[eax]
noOtherPointer:
mov [Pointer], eax


IMHO the majority of the => will be reassignments of a pointer previously assigned, with the dereferencing not performing the return of the descriptor. That is 11 instructions, 6 memory references, one branch, as opposed to the current method's 18 instructions, 18 memory references, no branches. Therefore the suggested change would result in a ~2x improvement on => where assignment does not create a descriptor. A descriptor is created under two circumstances: 1) on ALLOCATE(pointerToArray(n)), and 2) where the new pointerToArray points to a slice of an array. These new descriptors can come out of a pool of free descriptors, so the added overhead on allocation of a descriptor is expected to be a very small percentage of the total overhead.

b) In the current method, all pointers to arrays require a descriptor's worth of memory. In the suggested method (depending on application) the vast majority of descriptors are shared. Thus reducing the memory requirement and reducing the cache requirements.

The other advantage of the new method is the ReturnDescriptor function can reclaim what would otherwise have been a memory leak.

An additional programming consideration would be required if the pointer were passed out of IVF into a mixed language and where that call maintains a reference. In this case you would require an increment reference (and decrement reference) function such that you could protect against unwarranted deallocations under those circumstances.

I haven't looked at all potential complications regarding this suggestion but it would seem like the suggestion would result in:

1) faster execution speeds
2) lower memory footprint for data containing pointers to arrays
3) lower memory footprint for code manipulating pointers to arrays
4) better cache utilization
5) memory leak protection/recovery
6) protection against dereference of no longer valid pointer

This is a lot of pluses. It is unknown yet what the minuses are (besides the one additional level of indirection on an isolated reference of an element of a pointer to array). I think the pluses outweigh the minuses.

Jim Dempsey

Steven_L_Intel1
Employee
Jim,

You have an interesting proposal, but it requires a lot of bookkeeping and I'm not convinced it's faster. In particular, any branches in code are undesirable and there are potentially register-killing library calls in the sequence. One would also have to keep a list of all local pointers and do the "dereference" when they go out of scope.
jimdempseyatthecove
Honored Contributor III

Steve,

The branch in the code around the call where the decremented reference counter goes to 0 can be changed to branch out of line on 0 and jmp back (jz doCall). Therefore the branch taken after the decrement of the reference count now becomes a branch not taken for the majority of the times executed. The library call will be known by the compiler to preserve the registers, therefore there will be no register-killing call as far as the compiler is concerned. Next time you have a meeting regarding instruction set improvements, suggest the instruction CALLcc, which behaves like MOVcc except that a conditional call is performed as opposed to a conditional move. This can then be used to eliminate unnecessary branching/branch back.

There is no runtime list of local pointers in scope; it is a compile-time list. The compiler inserts the going-out-of-scope code for the pointers in the same place as it currently inserts the going-out-of-scope code for allocatable arrays. That is a tad more code in subroutines/functions declaring pointers to arrays.

 
mov eax,
dec References[eax]
mov edx, offset okP
jz doParent
okP:
mov eax,
dec References[eax]
mov edx, offset okQ
jz doParent
okQ:
mov eax,
dec References[eax]
mov edx, offset okR
jz doParent
okR:
...
ret

doParent:
push edx ; return address
jmp DereferenceDescriptor


This would add 4 instructions and 2 memory references for the majority of the uses. The instruction count and memory reference count for the proposed code is still less than that for the old code.

Also the above code sequence could be tweaked to eliminate the "mov edx, offset okP" by inserting an address table of return addresses (pointers are word size, address table entries are word size) therefore the address of the pointer could be used to obtain the address of the return (okP, okQ, okR, ...) assuming all pointers to arrays are arranged together by the compiler. Only stack pointers to arrays can fall out of scope (until the standard implements scoping syntax { } ).

mov eax,
dec References[eax]
jz doParent
okP:
mov eax,
dec References[eax]
jz doParent
okQ:
mov eax,
dec References[eax]
jz doParent
okR:
...
ret

retTable:
offset okP
offset okQ
offset okR
...
doParent:
mov edx, offset retTable
add edx, eax
sub edx, P ; 1st pointer in list of local pointers to arrays
push [edx] ; push return address
jmp DereferenceDescriptor

Now it is down to 3 instructions and 2 memory references for pointer going out of scope.

Jim Dempsey
