Solved: Speed up calculations of derived typs with declared module procedures

MariusR · ‎07-09-2020

Hello everybody!

my name is Marius and I am an absolutely newcomer to programming with Fortran and all the stuff accompanying this doing. Due to this, most of the terms are not familiar to me and maybe I have to ask twice to get it. Most of the things I learned by trial and error and by reading some text books written explicit for beginners.

At the moment I am dealing with the implementation of dual numbers / hyper numbers to do automatic differentiation. For this purpose, I have made a module containing:

1. the derived type for the dual numbers, for instance:

TYPE,PUBLIC:: DUAL_NUM
REAL(8)::x_rl
REAL(8)::x_h1
END TYPE DUAL_NUM

2. the necessary procedures for extending the basic arithmetic’s, for instance addition of two dual numbers:

PUBLIC OPERATOR (+)

INTERFACE OPERATOR (+)
   MODULE PROCEDURE ADD_DN
END INTERFACE

CONTAINS

ELEMENTAL FUNCTION ADD_DN(u,v) RESULT(res)
   TYPE (DUAL_NUM), INTENT(IN)::u,v
   TYPE (DUAL_NUM)::res
   res%x_rl = u%x_rl+v%x_rl
   res%x_h1 = u%x_di+v%x_h1
END FUNCTION ADD_DN

I am able to use this module in a main file and the automatic differentiation does great. To get the second derivative, I used numeric differentiation by means of a central difference quotient.

To avoid this, I want to use hyper-dual numbers to get the second partial derivative in the very same manner with analytic precision. Instead of declaring 2 parts (one real and one "dual" part), these hyper-dual numbers requires four parts. The derived type and the procedure for addition reads:

TYPE,PUBLIC:: HYPER_NUM

REAL(8)::x_rl
REAL(8)::x_h1 
REAL(8)::x_h2 
REAL(8)::x_h3

END TYPE HYPER_NUM

PUBLIC OPERATOR (+)

INTERFACE OPERATOR (+)  
MODULE PROCEDURE ADD_DN 
END INTERFACE

CONTAINS

ELEMENTAL FUNCTION ADD_DN(u,v) RESULT(res)
   TYPE (DUAL_NUM), INTENT(IN)::u,v
   TYPE (DUAL_NUM)::res
   res%x_rl = u%x_rl+v%x_rl
   res%x_h1 = u%x_h1+v%x_h1
   res%x_h2 = u%x_h2+v%x_h2 
   res%x_h3 = u%x_h3+v%x_h3
END FUNCTION ADD_DN

Although this also does great, I am encountered with some serious loss of performance by comparing to the use of dual numbers and numeric differentiation (computation time nearly doubles). I am aware that the use of hyper-dual numbers requires a multiple number of floating-point operations (minimum twice the amount compared to dual numbers in the case of an addition), but to do numeric differentiation by means of a central difference quotient, I need to evaluate the function twice. Due to this, the total amount of operations is not so different.

I think the reason for this behavior is that the "standard" operations using generic Fortran functions (addition and multiplication operations) are much faster compared to the module procedures I explained.

One attempt do get around this problem was do place the operations within the lexical extent of an openMP directive-pair (!$OMP parallel sections, !$OMP section, .... , !$OMP end parallel sections), but this does the opposite and actually slows down the whole algorithm. I have to say that I know even less about openMP than I do about Fortran :), but I think all the operations with dual numbers/ hyper-dual numbers can be parallelized.

I would be very thankful if someone could give me some hints or advices to optimize the implementation of dual numbers/ hyper-dual numbers.

greetings

Marius

jimdempseyatthecove · ‎07-11-2020

In your sample code, you have explicitly declared the operator functions as ELEMENTAL. IOW, your intention is to use these functions in an elemental manner (on arrays of your type).

I suggest instead you declare a TYPE that is a collection of these objects, where each property of each object resides in an array. IOW the collection contains an array of x_rl, x_di, ...

Then your operator functions operate on the collections.

If you attribute the collection (TYPE) with TARGET, then, if you desire, you may be able to create a getter/putter that presents a scalar representation of your current type (at a given index into the collection).

Something along the line of this (untested) code:

TYPE DUAL_NUM_COLLECTION
  REAL(8), ALLOCATABLE :: x_rl(:), x_di(:)
CONTAINS
FUNCTION ALLOC(N) RESULT(R)
  INTEGER :: N, R
  ALLOCATE( x_rl(:), x_di(:), STAT=R)
END FUNCTION ALLOC



  INTERFACE OPERATOR(*)
    FUNCTION DUAL_NUM_COLLECTION_MUL(A, B) RESULT(R)
      TYPE(DUAL_NUM_COLLECTION), INTENT(IN) :: A, B
      TYPE(DUAL_NUM_COLLECTION) :: R
    END FUNCTION DUAL_NUM_COLLECTION_MUL
  END INTERFACE

...
FUNCTION DUAL_NUM_COLLECTION_MUL(A, B) RESULT(R)
  TYPE(DUAL_NUM_COLLECTION), INTENT(IN) :: A, B
  TYPE(DUAL_NUM_COLLECTION) :: R
  R%x_rl = A%x_rl * B%x_rl
  R%x_di = A%x_di * B%x_rl + A%x_rl * B%x_di
END FUNCTION DUAL_NUM_COLLECTION_MUL

You would have to add DUAL_NUM_COLLECTION member functions GET and PUT that fetch and store an individual type of DUAL_NUM (think of this as gather and scatter).

Jim Dempsey

View solution in original post

FortranFan · ‎07-09-2020

@MariusR

A few suggestions:

Instead of words such as "I am encountered with some serious loss of performance by comparing to the use of dual numbers and numeric differentiation (computation time nearly doubles)," you will likely receive more valuable and actionable feedback if you can provide some code illustrations of what exactly you're doing with your numerical differentiation computations and comparisons to determine the loss of performance. Even better will be if you can provide a working example, a small reproducer that hints at the issue even if it's nowhere as exact as your actual case which may have constraints in terms of what you can share in the public domain.
If you're working with arrays of your data, if possible consider a design which is "Structure of Arrays" (SoA). Based on your original post, you may be working with Arrays of Structures (AoS). See this.
Your interest appears more generic and pertaining to Fortran and as such, you may want to inquire also at the new Fortran discourse site (https://fortran-lang.discourse.group/). Also to consider will be comp.lang.fortran usenet newsgroup.

MariusR · ‎07-09-2020

@FortranFan

Thank you for your reply.

I will work through your points. Roughly speaking, it's about to implement a thermodynamic model (PC-SAFT and its modifications) as an excel add -in. To get the desired results, this model must be derived with respect to different variables, which is an elaborate procedure, in particular with regard to obtain double derivations. I hope that I can give an illustrative and representative example in the next days

greetings

Marius

FortranFan · ‎07-09-2020

@MariusR ,

For your discussions in the public domain, you can instead consider working with a simpler formulation, say the original van der Waals equation-of-state that can be worked out trivially and analytically for the typical thermodynamic properties of interest, and you can put together your Fortran code with such a model.

You can start with the Helmholtz' free energy relationship in terms of the natural variables of the equation-of-state i.e., temperature, molar volume/density, and mole numbers. All the properties of interest can then be determined by differentiation, either analytically or numerically. Such a prototype can give you insight into code design with other equation-of-state models.

jimdempseyatthecove · ‎07-10-2020

Too little information is provided to offer any of us any hope of making suggestions.

The example you show, only show operator (+).

Do you have (*), (/), (.SomeOtherOperator.)?

For simple (+)

ENUM, BIND(C)
  ENUMERATOR :: x_r1=1,x_h1,x_h2,x_h3,x_next
END ENUM

TYPE, PUBLIC :: HYPER_NUM
  REAL(8) :: x(x_next-1)
END TYPE HYPER_NUM

PUBLIC OPERATOR (+)

INTERFACE OPERATOR (+)  
MODULE PROCEDURE ADD_DN 
END INTERFACE

CONTAINS

ELEMENTAL FUNCTION ADD_DN(u,v) RESULT(res)
   TYPE (HYPER_NUM), INTENT(IN)::u,v
   TYPE (HYPER_NUM)::res
   res%x = u%x+v%x
END FUNCTION ADD_DN

*** However, It may be beneficial (performance wise) to shy away from a single type (OOP) as opposed to an aggregation of types.

ENUM, BIND(C)
  ENUMERATOR :: x_r1=1,x_h1,x_h2,x_h3,x_next
END ENUM
INTEGER, PARAMETER :: HYPER_NUM_SIZE = x_next-1

...
REAL(8), ALLOCATABLE :: COLLECTION_A(:,:) ! either (nObjects, HYPER_NUM_SIZE) or (HYPER_NUM_SIZE, nObjects) 
REAL(8), ALLOCATABLE :: COLLECTION_B(:,:) ! either (nObjects, HYPER_NUM_SIZE) or (HYPER_NUM_SIZE, nObjects) 

...
! equivilent to operator(+) with both collections
COLLECTION_A = COLLECTION_A + COLLECTION_B

*** the order of the subscripts will be dependent upon the algorithms used in your program. One is likely more efficient at vectorization than the other.

Using the dual subscript approach, it is almost trivial to experiment with which order of subscripting is best.

Jim Dempsey

MariusR · ‎07-11-2020

@jimdempseyatthecove

Hello Jim

thank you for your reply. I will try out this kind of data typ (enum), which is new to me. To implement the dual or hyper numbers I used the first best method that I understood => Derived Type. Of course it doesn't have to be the fastest and I'm open for suggestions

And yes, the (*), (/) as well as a lot of other operations have been extended to take the derived type ((**), (log), (exp) , ....). The formula for multiplication and division reads:

ELEMENTAL FUNCTION MULT_DN(u,v) RESULT(res)

 TYPE (DUAL_NUM), INTENT(IN)::u,v
 TYPE (DUAL_NUM)::res

 res%x_rl = u%x_rl*v%x_rl
 res%x_di= u%x_di*v%x_rl + u%x_rl*v%x_di

END FUNCTION MULT_DN

ELEMENTAL FUNCTION DIV_DN(u,v) RESULT(res)

 TYPE (DUAL_NUM), INTENT(IN)::u,v
 TYPE (DUAL_NUM)::res

 res%x_rl = u%x_rl/v%x_rl
 res%x_di =(u%x_di- res%x_rl*v%x_di)/v%x_rl

END FUNCTION DIV_DN

The general formula for scalar functions like exp or log is:

f(dual_number)%x_rl=f(dual_number%x_rl) => Function evaluated at the real-part

f(dual_number)%x_di=dual_number%di*f'(dual_number%x_rl) => Derivative of the function evaluated on the real part times the dual_part

At the moment I'm very busy with exams, so please don't think that I'm ignoring your suggestions just because I need some time to answer.

Greetings

Marius

Steve_Lionel · ‎07-11-2020

ENUM is not actually a new type, it's just another way of declaring a named constant (PARAMETER). There is a proposal for Fortran 202X to introduce a more C-like typed enumerator, but the specification has not yet been agreed to.

jimdempseyatthecove · ‎07-11-2020

In your sample code, you have explicitly declared the operator functions as ELEMENTAL. IOW, your intention is to use these functions in an elemental manner (on arrays of your type).

I suggest instead you declare a TYPE that is a collection of these objects, where each property of each object resides in an array. IOW the collection contains an array of x_rl, x_di, ...

Then your operator functions operate on the collections.

If you attribute the collection (TYPE) with TARGET, then, if you desire, you may be able to create a getter/putter that presents a scalar representation of your current type (at a given index into the collection).

Something along the line of this (untested) code:

TYPE DUAL_NUM_COLLECTION
  REAL(8), ALLOCATABLE :: x_rl(:), x_di(:)
CONTAINS
FUNCTION ALLOC(N) RESULT(R)
  INTEGER :: N, R
  ALLOCATE( x_rl(:), x_di(:), STAT=R)
END FUNCTION ALLOC



  INTERFACE OPERATOR(*)
    FUNCTION DUAL_NUM_COLLECTION_MUL(A, B) RESULT(R)
      TYPE(DUAL_NUM_COLLECTION), INTENT(IN) :: A, B
      TYPE(DUAL_NUM_COLLECTION) :: R
    END FUNCTION DUAL_NUM_COLLECTION_MUL
  END INTERFACE

...
FUNCTION DUAL_NUM_COLLECTION_MUL(A, B) RESULT(R)
  TYPE(DUAL_NUM_COLLECTION), INTENT(IN) :: A, B
  TYPE(DUAL_NUM_COLLECTION) :: R
  R%x_rl = A%x_rl * B%x_rl
  R%x_di = A%x_di * B%x_rl + A%x_rl * B%x_di
END FUNCTION DUAL_NUM_COLLECTION_MUL

You would have to add DUAL_NUM_COLLECTION member functions GET and PUT that fetch and store an individual type of DUAL_NUM (think of this as gather and scatter).

Jim Dempsey

MariusR · ‎07-14-2020

@jimdempseyatthecove

thank you very much for this suggestion! It introduces me to the type-bound-procedures, which seems to be a very nice feature!

To collect the real- and dual parts into arrays instead to do the other way and collect dual_numbers into arrays is a good idea, thank you!

But when I tried to implement this, I ran into some problems. I don't know the rank of my arrays in advance. Is there a way to define a derived type in an assumed-rank-manner? The aim is to do something like this:

program test
    	type(dual_num) :: a
	! a contains a type-bound-procedure to allocate its sub-parts
	call a%alloc(0)		! => a%x_rl and a%x_di will be scalars
	call a%alloc(N1,N2)	! => a%x_rl and a%x_di will be arrays with N1xN2 - dimension
end program test

I searched quite a while and didn’t find an answer, except a statement from Steve Lionel: "It's not possible to have an assumed rank derived type component."

I do not want to define several different derived types in order to accomplish this.

Edit.: Maybe you adressed this problem with some of your statements, sry if i didn't get it

jimdempseyatthecove · ‎07-14-2020

Perhaps you can consider using a generic interface:

https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-developer-guide-and-reference/top/language-reference/program-units-and-procedures/procedure-interfaces/defining-generic-names-for-procedures.html

or

Use CLASS to hold a polymorphic TYPE (to use these requires SELECT TYPE)

https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-developer-guide-and-reference/top/language-reference/a-to-z-reference/s-1/select-type.html

*** The title of this thread is:

Speed up calculations of derived typs with declared module procedures

But what you are asking to do now is hide the details by introducing an abstraction layer. This generally means making the runtime code figure out what to do (select what to do at runtime). And this is counter-productive to your stated goal.

>> I don't know the rank of my arrays in advance.

Can you work with a limited number of ranks? Say 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and then define each a type. Then use the generic interface to deal with the abstraction disambiguation.

You can also consider using an allocatable array of size of the desired rank containing a pointer to a type containing an allocatable rank 1 array.

Note, all these indirections can (will?) introduce additional runtime overhead.

Jim Dempsey

Andrew_Smith · ‎07-16-2020

This technique is well known and often called AD.

There are some commercial libraries available (NAG offer one for Fortran).

I have been using this for many years and have similar code to yours. One modification is to have the differential part as a vector so that you can differentiate with respect to many variables at once. The calculation of the vector term lends itself to vectorization optimsations by the compiler.

I have an outer code that defines the length of the vector, and an include file that defines the "dual math" object and implements all the functions. This allows some flexibility without having to use an allocatable vector which is very slow. I have a few pre-compiled lengths available 1, 2, 3, 4, 6. Then if I need a longer vector I go back to repeated calls.

I have tried using the new parameterized length but dropped it because of too many compiler issues in it early implementation of paramertized derived types. I am not sure what the performance would be like compared to fixed size and allocatable arrays.

MariusR · ‎07-23-2020

Thank you all for your help. The idea to organize the individual types into Arrays, as suggested by Jim, gave a lot of performance back. It took quite a while to implement this, because it changed nearly everything, but it was worth it.

Another mistake i have made and not think about was that most of the time the computations doesn't need to be calculated whit the hyper_numbers. Due to this, a lot of computations were zero computations (0*0 and 0+0). I have now implemented a case differentiation, which helped a lot.

Speed up calculations of derived typs with declared module procedures

Fortran Language

Performance

Speed up calculations of derived typs with declared module procedures