Intel® Fortran Compiler

Run-time efficiency in using user-defined data types vs intrinsic types

jond
Novice
Hello,

I am trying to get my feet wet in object-oriented programming using Fortran. I have an application originally written using procedural programming methods, and I am trying to rewrite it using object-oriented methods. During this process, I created several MODULEs and defined several data TYPEs and related methods. The data TYPEs include several pointers to integer and real arrays, since F95 doesn't allow ALLOCATABLE arrays in derived types. When I compiled and ran the new code, the run-time was about 30% slower than its procedural counterpart, which strictly uses one-dimensional arrays.

Then I did an internet search on the topic and came across some benchmark studies on object-oriented concepts in Fortran 90. The authors tested several compilers and concluded that using array pointers in user-defined TYPEs produced run-times that were 40% slower on average. They also pointed to the then-upcoming Fortran 2003 standard and its support for ALLOCATABLE arrays as components of derived types, and expected that feature to produce faster code.

Now on to my questions... They are very general. Does anybody have any experience with this issue? I am currently using CVF v6.6C, and it has some issues with ALLOCATABLE arrays used in derived types (confirmed by Steve in the past). I heard that in IVF all related issues are resolved. If I switch to IVF and redefine my array pointers as allocatable arrays, can I expect to achieve faster code? And lastly, I would appreciate any suggestions on the design of object-oriented Fortran code that produces fast run-times. Most literature I found discusses efficiency in terms of code re-usability, but not run-time efficiency.

Thanks for all responses in advance,
Jon
Steven_L_Intel1
Employee
We've done a lot of work in Intel Fortran to minimize the overhead of using pointers or allocatables in derived types. It's not entirely gone - you'll always pay SOME penalty over a normal local declaration of a regular array - but it is small. Yes, using ALLOCATABLE will help, though be aware that the semantics are subtly different when you assign one derived-type object to another.
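A minimal sketch of the semantic difference Steve mentions (the type and variable names here are my own invention, not from any poster's code): intrinsic assignment deep-copies an ALLOCATABLE component, but only re-associates a POINTER component, so the two objects end up sharing storage.

```fortran
! Sketch: ALLOCATABLE vs POINTER component semantics under assignment.
program alloc_vs_pointer
  implicit none
  type :: t_alloc
    real, allocatable :: a(:)
  end type
  type :: t_ptr
    real, pointer :: a(:) => null()
  end type
  type(t_alloc) :: x, y
  type(t_ptr)   :: p, q

  allocate(x%a(3)); x%a = 1.0
  y = x              ! deep copy: y%a is a fresh allocation
  y%a(1) = 99.0      ! does not affect x%a(1)

  allocate(p%a(3)); p%a = 1.0
  q = p              ! pointer is copied: q%a and p%a share storage
  q%a(1) = 99.0      ! also changes p%a(1)

  print *, x%a(1), p%a(1)   ! x%a(1) is still 1.0; p%a(1) is now 99.0
end program
```

This is why switching components from POINTER to ALLOCATABLE can silently change the meaning of existing assignment statements between derived-type objects.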
jimdempseyatthecove
Honored Contributor III

Jon

Pointer versus array should make no difference in run-time, except when you have a lot of pointer assignments ("p => somewhere").

The performance problem I think you are experiencing is in how you are passing array arguments (either by pointer or by array reference). When your program was procedural, you likely were passing array arguments as the base of an explicit-shape array. When you converted to "object oriented", you coincidentally changed the argument passing to deferred shape.

Also note that in converting to "object oriented" you likely went to

subroutine FOO(pObject, arg, ...)
...
pObject.member(i) = pObject.member(i) + bump

(or using % if you prefer)

You can un-defer the array by passing it as a dummy argument to a subroutine, i.e. use an object-oriented shell to call procedurally oriented subroutines. Simplified sample (you can add the interfaces):

type foo
...
real, pointer :: Array(:) ! always allocated (1:n)
real :: Sum
...
end type foo

subroutine fooSum(aFoo)
type(foo) :: aFoo
aFoo.Sum = SumArray(aFoo.Array, size(aFoo.Array))
end subroutine fooSum

function SumArray(a, n)
integer :: n ! declare n before using it in the bounds of a
real :: a(1:n) ! *** explicitly state the bounds
real :: SumArray
integer :: i
SumArray = 0.0
do i = 1, n
SumArray = SumArray + a(i)
end do
end function SumArray

Now when the loop is performed, the lower bound is known in advance, and thus a lower-bounds adjustment is not required inside the loop. This is a dumbed-up example, but it should be sufficient for you to get the idea.

As a proof, compile and place a breakpoint on the statement inside the loop. Open the Disassembly window on break. Select the loop and copy it to the clipboard, then paste into Notepad. Then change a(1:n) in the subroutine to a(:) and rerun the test. Compare the before and after edits to see the difference.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

BTW, compile the Debug session with full optimizations, as you want to see the code as it will be in the Release build.

Note, even if the compiler optimizer extracts the lower bound from the deferred-shape array descriptor and places it into a register, this act increases register pressure (it uses up a register), and the subtraction of the lower bound still remains. Whereas if the lower bound is known at compile time, it can be represented as a constant in the same instruction that performs the indexing: Scale, Index, Base, where Base can be offset at compile time instead of computed at run-time.

Jim Dempsey

jond
Novice
Jim,

Thanks. I followed your suggestion of comparing the output from the Disassembly window for your simple example. Even though I can't read assembly language, it was interesting to see the difference. When the code was compiled in debug mode, the version with the explicit-shape array generated two fewer lines of code than the one with the deferred-shape array. With full optimization, the version with the deferred-shape array had about 40 lines of extra code.

But here is what I don't understand: my procedural code uses ALLOCATABLE arrays. All arrays are allocated in a subroutine at the beginning of the program. Then these arrays are passed to different subroutines throughout the execution of the program as deferred-shape arrays:

subroutine foo(array1,...,arrayn)
real(8)::array1(:),...,arrayn(:)
.
end subroutine foo

Based on your previous discussion, shouldn't my OO code have similar run-times, since it too employs deferred-shape arrays?

Thanks,
Jon
jimdempseyatthecove
Honored Contributor III

Jon,

Your OO code should have seen similar runtimes.

Consider:

 subroutine foo(array1,...,arrayn)
real(8)::array1(:),...,arrayn(:)
.
end subroutine foo

Which is callable from your old code

call foo(FixedArray1, FixedArray2, ...)

And is callable from your new code

call foo(pObject.PointerToArray1, pObject.PointerToArray2, ...)

The foo subroutine code is identical (assuming compiled with same options).

The only difference is the method of obtaining the address of the array descriptor.

***

Note, foo has its array arguments declared using "(:)", meaning the array descriptor has an unknown lower bound and unknown upper bound, as well as an unknown stride. E.g. the array coming in could be (-10:100:5) for indexes of (-10), (-5), (0), (5), ... (100). Therefore the generated code must extract and manipulate the values of lower bound, upper bound and stride from the array descriptor. Whereas if you declared the array arguments as

 real(8)::array1(1:),...,arrayn(1:)

The lower bound is now known at compile time, so the generated code does not have to consult the array descriptor to obtain it, and the indexing code becomes simpler and faster. (Note that an assumed-shape dummy can still be non-contiguous, so the stride must still come from the descriptor; only an explicit-shape declaration such as a(1:n) guarantees unit stride.)
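To make the contrast concrete, here is a small sketch (subroutine and argument names are illustrative, not from Jon's code) showing the three declaration styles discussed in this thread side by side, with comments on what the compiler can assume for each:

```fortran
! Sketch: what the compiler knows about each dummy-array declaration style.
subroutine sum3(a_def, a_lb1, a_exp, n, total)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: a_def(:)   ! assumed shape: lower bound, upper
                                     !   bound, and stride all come from
                                     !   the descriptor at run-time
  real(8), intent(in)  :: a_lb1(1:)  ! assumed shape, but the lower bound
                                     !   is fixed at 1 at compile time
  real(8), intent(in)  :: a_exp(1:n) ! explicit shape: bounds known and
                                     !   the array is contiguous
  real(8), intent(out) :: total
  integer :: i
  total = 0.0d0
  do i = 1, n
    total = total + a_def(i) + a_lb1(i) + a_exp(i)
  end do
end subroutine sum3
```

All three forms accept the same actual argument, so the difference is purely in how much the generated indexing code must read from the array descriptor.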

Jim Dempsey
