Derived type performance

Pellegrini__Etienne · 12-08-2015

Hello,

I am trying to solve a problem involving quite a lot of data, and started creating derived types which pack the data together. For simplicity, let's say I have an allocatable array of the PARENT type, each element of which contains an allocatable array of the CHILD type, and each child has an allocatable array of DATA. All the data, and the derived type definitions and declarations, reside in a common module, so in global memory.

Most of my operations involve data at the PARENT(i) % CHILD(j) % DATA level. I was wondering whether there is a difference in performance between passing (i,j) to a subroutine and accessing PARENT(i) % CHILD(j) % DATA from global memory, compared to passing PARENT(i) % CHILD(j) % DATA directly. Since the data is supposedly passed by reference (my DATA arrays are contiguous, I'm not passing noncontiguous slices, and my dummy arguments are adjustable arrays), I'm guessing the two are approximately equivalent, but I don't know for sure. Also, I want to be able to use OpenMP to work on several i's at the same time, but I think I can do that with both options.
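To make the two options concrete, here is a minimal sketch of what I mean. All names (tree_mod, scale_by_index, scale_by_array) are made up for illustration and the real code is more involved; the OpenMP directive is only there to show the intended loop over i and is ignored if OpenMP is not enabled.

```fortran
module tree_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  type :: child_t
    real(dp), allocatable :: data(:)
  end type child_t

  type :: parent_t
    type(child_t), allocatable :: child(:)
  end type parent_t

  ! Module ("global") storage, as described above
  type(parent_t), allocatable :: parent(:)

contains

  ! Option 1: pass the indices and reach the data through the module variable
  subroutine scale_by_index(i, j, factor)
    integer, intent(in) :: i, j
    real(dp), intent(in) :: factor
    parent(i)%child(j)%data = factor * parent(i)%child(j)%data
  end subroutine scale_by_index

  ! Option 2: pass the array itself to an adjustable (explicit-shape) dummy
  subroutine scale_by_array(n, x, factor)
    integer, intent(in) :: n
    real(dp), intent(inout) :: x(n)
    real(dp), intent(in) :: factor
    x = factor * x
  end subroutine scale_by_array

end module tree_mod

program demo
  use tree_mod
  implicit none
  integer :: i, j

  ! Build a small 2 x 3 x 4 structure filled with ones
  allocate(parent(2))
  do i = 1, 2
    allocate(parent(i)%child(3))
    do j = 1, 3
      allocate(parent(i)%child(j)%data(4))
      parent(i)%child(j)%data = 1.0_dp
    end do
  end do

  ! Each i touches only its own children, so iterations are independent
  !$omp parallel do private(j)
  do i = 1, size(parent)
    do j = 1, size(parent(i)%child)
      call scale_by_index(i, j, 2.0_dp)
      call scale_by_array(size(parent(i)%child(j)%data), &
                          parent(i)%child(j)%data, 3.0_dp)
    end do
  end do
  !$omp end parallel do

  ! Every element was scaled by 2 and then by 3
  do i = 1, size(parent)
    do j = 1, size(parent(i)%child)
      if (any(abs(parent(i)%child(j)%data - 6.0_dp) > 1.0e-12_dp)) error stop "unexpected value"
    end do
  end do
  print *, parent(1)%child(2)%data
end program demo
```

Both routines end up working on the same memory; what I don't know is whether one form is measurably cheaper than the other.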

Thank you very much for your help!

jimdempseyatthecove · 12-08-2015

Does your processing have interaction between different PARENTs?
Different CHILDs within PARENT?
Different CHILDs across PARENTs?

You want to ensure that your data organization supports (is favorable to) vectorization of the algorithms that you will apply to the data. For example, suppose the processing looks like this:

do I=1, nI
  do II=I+1, nI
    do J=1, nJ
      do JJ=1, nJ
        call Interact(PARENT(I)%CHILD(J)%DATA, PARENT(II)%CHILD(JJ)%DATA)
      end do
    end do
  end do
end do

If your loops look like the above, and DATA is an array of properties (e.g. PositionX, PositionY, PositionZ, VelocityX, VelocityY, VelocityZ, MassM, etc.), then, although DATA is contiguous, your interactions will mostly require scalar instructions.
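As a rough illustration of a layout that is friendlier to vectorization, each property can be stored in its own array so that the inner loop runs at unit stride over like quantities. This is only a sketch under assumed names (bodies_t, dist2_from, and the x/y/z components are hypothetical), and whether the loop actually vectorizes depends on the compiler and options.

```fortran
module layout_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical structure-of-arrays layout: one array per property
  type :: bodies_t
    real(dp), allocatable :: x(:), y(:), z(:)
  end type bodies_t

contains

  ! Squared distance from body k to every body.
  ! The loop over j reads x, y and z at unit stride.
  subroutine dist2_from(b, k, d2)
    type(bodies_t), intent(in) :: b
    integer, intent(in) :: k
    real(dp), intent(out) :: d2(:)
    integer :: j
    do j = 1, size(d2)
      d2(j) = (b%x(j) - b%x(k))**2 &
            + (b%y(j) - b%y(k))**2 &
            + (b%z(j) - b%z(k))**2
    end do
  end subroutine dist2_from

end module layout_mod

program demo
  use layout_mod
  implicit none
  type(bodies_t) :: b
  real(dp) :: d2(4)
  integer :: j

  ! Four bodies placed along the x axis at x = 1, 2, 3, 4
  allocate(b%x(4), b%y(4), b%z(4))
  do j = 1, 4
    b%x(j) = real(j, dp)
    b%y(j) = 0.0_dp
    b%z(j) = 0.0_dp
  end do

  call dist2_from(b, 1, d2)

  ! Distances from body 1 are 0, 1, 2, 3, so the squares are 0, 1, 4, 9
  if (any(abs(d2 - [0.0_dp, 1.0_dp, 4.0_dp, 9.0_dp]) > 1.0e-12_dp)) error stop "unexpected value"
  print *, d2
end program demo
```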

Jim Dempsey

Pellegrini__Etienne · 12-08-2015

Thank you for your answer. I actually work on the DATA in the structures, but there is no interaction between the different structures.

I was mostly wondering about the performance of the subroutine calls that use the data, and of the data access itself (accessing an array is probably cheaper than going down several levels of derived types). From the programmer's point of view, it is sometimes easier not to have to go down several levels of derived types to reach the important data, but sometimes it is easier to pass a structure rather than 25 arguments... I wanted an idea of whether performance would be affected by my choice, or whether I could simply use whichever solution is easier to program in each routine.

Thank you!

jimdempseyatthecove · 12-09-2015

Although compiler optimizations are good at common sub-expression elimination, the compiler sometimes has trouble efficiently optimizing objects nested several levels deep. On a case-by-case basis (determined with VTune), you may find it necessary to assist the optimizer by calling out to a subroutine and passing a reference to a sub-component (e.g. PARENT(I)%CHILD(J)), as opposed to using it inline. Since procedures can have their own CONTAINS subroutines, this is fairly easy to do and keeps the code relatively clear. ASSOCIATE is another option that may be more appropriate.
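A rough sketch of both suggestions follows. The type and procedure names (tree_mod, process, double_it) are made up for illustration and are not taken from your code; whether either form helps is something to confirm by profiling.

```fortran
module tree_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  type :: child_t
    real(dp), allocatable :: data(:)
  end type child_t

  type :: parent_t
    type(child_t), allocatable :: child(:)
  end type parent_t

  type(parent_t), allocatable :: parent(:)

contains

  subroutine process(i, j)
    integer, intent(in) :: i, j

    ! ASSOCIATE: d is a short name for the nested array inside this block
    associate (d => parent(i)%child(j)%data)
      d = d + 1.0_dp
    end associate

    ! Contained subroutine: pass the sub-component once, then work on it
    call double_it(parent(i)%child(j))

  contains

    subroutine double_it(c)
      type(child_t), intent(inout) :: c
      integer :: k
      do k = 1, size(c%data)
        c%data(k) = 2.0_dp * c%data(k)
      end do
    end subroutine double_it

  end subroutine process

end module tree_mod

program demo
  use tree_mod
  implicit none

  allocate(parent(1))
  allocate(parent(1)%child(1))
  allocate(parent(1)%child(1)%data(3))
  parent(1)%child(1)%data = 1.0_dp

  call process(1, 1)

  ! (1 + 1) * 2 = 4 for every element
  if (any(abs(parent(1)%child(1)%data - 4.0_dp) > 1.0e-12_dp)) error stop "unexpected value"
  print *, parent(1)%child(1)%data
end program demo
```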

Jim Dempsey