Intel® Fortran Compiler

Runtime overhead of using CLASS dummy arguments

thomas_boehme
New Contributor II
Hi,

We have a fairly large Fortran 2003 code base that makes heavy use of type-bound procedures and CLASS arguments. During profiling, we observed hotspots in extremely simple routines that do little more than pass a call on to a type-bound procedure.

I investigated this a little further and found surprising (at least to me) performance differences depending on whether CLASS or TYPE was used for a dummy argument. While I expected CLASS to be slower, I did not expect it to be that much slower.

I wrote a small test program, which is attached at the end of this post. It runs a loop that calls a subroutine, which in turn calls a type-bound procedure. Case 1 uses CLASS dummy arguments in the subroutine, whereas Case 2 uses TYPE dummy arguments. Case 3 bypasses the type-bound procedure completely and calls the routine directly.

                         Case 1     Case 2     Case 3
                         (CLASS)    (TYPE)     (direct, no TBP)
Debug mode               25.4 s     25.4 s     7.25 s
Release (all opt.)       15.0 s     1 s        0.98 s
Release (w/o inlining)   15.8 s     13.15 s    3.5 s

What bothers me most is that Case 1 hardly benefits from the Release optimizations: compared to the TYPE variant, the code ran about 15x slower. Also, Case 2 seems to be fast only when inlining is turned on, so I expect that wherever inlining is prevented for some other reason, Case 2 will not perform well either.

Is this behavior expected? Is there any way to reduce the overhead caused by using CLASS dummy arguments and type-bound procedures?

Looking at the assembly, there seem to be many more instructions involved with the CLASS dummy arguments. It seems to me that some kind of temporary objects are created. However, I do not understand assembly and the internal object structure well enough to fully follow the generated code.

Any help on this topic would be greatly appreciated.

Edit: I forgot to mention that I see this behavior with the latest IVF 12.1 (Update 6) on Win32. I don't know if the behavior was the same before and I don't have prior versions installed any more.

regards,
Thomas

[fortran]!  FortranPerfTests.f90 
!
!  FUNCTIONS:
!  FortranPerfTests - Entry point of console application.
!

MODULE TypeDefs

TYPE :: MyType
  REAL(8) :: Val = 1.0
CONTAINS
  PROCEDURE :: Add 
END TYPE

TYPE, EXTENDS(MyType) :: MyTypeExt
END TYPE

CONTAINS

SUBROUTINE Add(this, Original)
CLASS (MyType) :: this
CLASS (MyType) :: Original
  this%Val = this%Val + Original%Val 
END SUBROUTINE

SUBROUTINE AddDirect(this, Original)
TYPE (MyTypeExt) :: this
TYPE (MyTypeExt) :: Original
  this%Val = this%Val + Original%Val 
END SUBROUTINE

SUBROUTINE ViaClass(A,B)
  CLASS (MyType) :: A
  CLASS (MyType) :: B
  CALL A%Add(B)
END SUBROUTINE
    
SUBROUTINE ViaType(A,B)
  TYPE (MyTypeExt) :: A
  TYPE (MyTypeExt) :: B
  CALL A%Add(B)
END SUBROUTINE

SUBROUTINE Direct(A,B)
  TYPE (MyTypeExt) :: A
  TYPE (MyTypeExt) :: B
  CALL AddDirect(A,B)
END SUBROUTINE

END MODULE

  program FortranPerfTests
  USE TypeDefs

  implicit none
                  
  ! Variables
  INTEGER :: I
  TYPE (MyTypeExt) :: A,B    
  REAL(8) :: T1Start, T1End, T2Start, T2End, T3Start, T3End
  ! Body of FortranPerfTests                                                                                             
    
  ! Case 1: Call using class dummy arguments
  CALL CPU_TIME(T1Start)
  DO I=1,1000000000   ! integer bound (a REAL bound such as 1E9 is non-standard)
    CALL ViaClass(A,B)
  END DO
  CALL CPU_TIME(T1End)
  WRITE (*,*) A%Val                                       
  
  ! Case 2: Call using TYPE dummy arguments
  A%Val = 1
  CALL CPU_TIME(T2Start)
  DO I=1,1000000000   ! integer bound (a REAL bound such as 1E9 is non-standard)
    CALL ViaType(A,B)
  END DO
  CALL CPU_TIME(T2End)            
  WRITE (*,*) A%Val

  ! Case 3: Call via subroutine and type dummy arguments
  A%Val = 1
  CALL CPU_TIME(T3Start)
  DO I=1,1000000000   ! integer bound (a REAL bound such as 1E9 is non-standard)
    CALL Direct(A,B)
  END DO
  CALL CPU_TIME(T3End)            
  WRITE (*,*) A%Val
    
  WRITE (*,*) 'Type-bound via class arguments:',  T1End - T1Start 
  WRITE (*,*) 'Type-bound via type arguments:', T2End-T2Start
  WRITE (*,*) 'Subroutine call via type arguments:', T3End-T3Start

    
    
  end program FortranPerfTests

[/fortran]

Steven_L_Intel1
Employee
Any time a decision is deferred from compile time to run time, performance will suffer. In the class case, the compiler has to generate code to determine the dynamic type and look up the correct routine to call. This is not amenable to optimization such as inlining, which is probably helping the type case.
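
As a minimal sketch of that distinction (not code from this thread; the names DispatchSketch, Base, Derived, Name, ShowDynamic and ShowStatic are made up for illustration): with a CLASS dummy the binding has to be looked up from the dynamic type at run time, while with a TYPE dummy the target of the call is already known at compile time and can in principle be inlined.

[fortran]
MODULE DispatchSketch
  IMPLICIT NONE

  TYPE :: Base
  CONTAINS
    PROCEDURE :: Name => BaseName
  END TYPE

  TYPE, EXTENDS(Base) :: Derived
  CONTAINS
    PROCEDURE :: Name => DerivedName   ! overrides the inherited binding
  END TYPE

CONTAINS

  FUNCTION BaseName(this) RESULT(s)
    CLASS(Base), INTENT(IN) :: this
    CHARACTER(LEN=7) :: s
    s = 'Base'
  END FUNCTION

  FUNCTION DerivedName(this) RESULT(s)
    CLASS(Derived), INTENT(IN) :: this
    CHARACTER(LEN=7) :: s
    s = 'Derived'
  END FUNCTION

  ! Polymorphic dummy: the dynamic type of x is only known at run time,
  ! so x%Name() is resolved through the type's binding table.
  SUBROUTINE ShowDynamic(x)
    CLASS(Base), INTENT(IN) :: x
    WRITE (*,*) 'CLASS dummy calls: ', x%Name()
  END SUBROUTINE

  ! Non-polymorphic dummy: x is always exactly a Base,
  ! so x%Name() can be bound to BaseName at compile time.
  SUBROUTINE ShowStatic(x)
    TYPE(Base), INTENT(IN) :: x
    WRITE (*,*) 'TYPE dummy calls:  ', x%Name()
  END SUBROUTINE

END MODULE

PROGRAM DispatchDemo
  USE DispatchSketch
  IMPLICIT NONE
  TYPE(Base)    :: b
  TYPE(Derived) :: d

  CALL ShowDynamic(b)       ! dynamic type Base
  CALL ShowDynamic(d)       ! dynamic type Derived
  CALL ShowStatic(b)
  CALL ShowStatic(d%Base)   ! only the parent component can be passed here
END PROGRAM DispatchDemo
[/fortran]

How much the run-time lookup actually costs depends on the compiler and the optimization settings; the sketch only shows where the decision is made.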
thomas_boehme
Hi Steve,

I did expect some performance penalty for the reasons you describe. However, I did not expect the overhead to be that large. It makes small type-bound procedures for CLASS objects (like get/set methods) pretty much unusable.

When I look at the assembly code, it seems to me that a significant amount of memory is written to (mainly filled with zeros, with some entries being nonzero). To me, it looks like temporary objects are created, which I did not expect in this case.

regards,
Thomas




Steven_L_Intel1
Probably some descriptors being created. I'll take a look and ask the developers if there's something obviously wrong. It may be a case of figuring out how to optimize this, much as early Fortran 90 compilers had performance issues with array operations.
thomas_boehme
Hi Steve,

do you have any news regarding this topic?

regards,
Thomas
Steven_L_Intel1
I've looked over the generated code with one of our developers, and we're puzzled because a lot of code is being generated for a case other than the one that seems to take all the time. But what we did see was the need to create a class descriptor and pass it when one wasn't available, and that takes time.
thomas_boehme
Hi Steve,

thanks a lot for looking at the code. If I understand correctly, the overhead is by design.

Is there any way to prevent the creation of a class descriptor?

Our problem is that we have built a library for handling gas properties, and we access most gas states via getter/setter routines, which are called millions of times. Most of them are tiny and have almost zero computational cost. However, because they use CLASS objects, our code spends a significant amount of time in them according to the profiler.

When looking closely, we see that most time is spent in code similar to what is generated in the example I gave.

It would be great if we could prevent the creation of the class descriptors millions of times and maybe move their creation to a higher call-level, which is not called that often. Any ideas would be very welcome.

regards,
Thomas

Steven_L_Intel1
If you are passing something that is not polymorphic to an argument that is, the compiler has to create the class descriptor - there is no way around that.
thomas_boehme
Hi Steve,

thanks a lot for your continued support. With your last answer, you finally pointed me to a solution that I hope will allow us to reduce the runtime overhead significantly for our usage.

In the example above, if I declare the objects A and B as polymorphic from the beginning, i.e. I replace the declaration

TYPE (MyTypeExt) :: A, B
by
CLASS (MyTypeExt), ALLOCATABLE :: A, B

and allocate the objects in the code, the class descriptor is created only once instead of 1e9 times.

With that change, the overhead of the CLASS variant went down significantly, and the variant with the class arguments now takes only about three times as long as the other. That is perfectly acceptable given the additional benefits of polymorphism.
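
For illustration, a minimal self-contained sketch of the change described above. It reuses MyType, MyTypeExt, Add and ViaClass from the test program; the module name TypeDefsSketch, the program name, the INTENT attributes and the small iteration count NIter are assumptions added here so that it runs quickly. Whether the descriptor is really built only once is up to the compiler; the timings in this thread are for IVF 12.1.

[fortran]
MODULE TypeDefsSketch
  IMPLICIT NONE

  TYPE :: MyType
    REAL(8) :: Val = 1.0
  CONTAINS
    PROCEDURE :: Add
  END TYPE

  TYPE, EXTENDS(MyType) :: MyTypeExt
  END TYPE

CONTAINS

  SUBROUTINE Add(this, Original)
    CLASS (MyType), INTENT(INOUT) :: this
    CLASS (MyType), INTENT(IN)    :: Original
    this%Val = this%Val + Original%Val
  END SUBROUTINE

  SUBROUTINE ViaClass(A, B)
    CLASS (MyType), INTENT(INOUT) :: A
    CLASS (MyType), INTENT(IN)    :: B
    CALL A%Add(B)
  END SUBROUTINE

END MODULE

PROGRAM PolymorphicActuals
  USE TypeDefsSketch
  IMPLICIT NONE

  INTEGER, PARAMETER :: NIter = 1000   ! hypothetical; the thread uses 1e9
  INTEGER :: I

  ! Polymorphic from the start: the objects already carry their
  ! dynamic type information when they are passed to ViaClass.
  CLASS (MyTypeExt), ALLOCATABLE :: A, B

  ALLOCATE (MyTypeExt :: A)
  ALLOCATE (MyTypeExt :: B)

  DO I = 1, NIter
    CALL ViaClass(A, B)   ! polymorphic actual -> polymorphic dummy
  END DO

  ! Val starts at 1.0 and B%Val (1.0) is added NIter times.
  WRITE (*,*) A%Val
END PROGRAM PolymorphicActuals
[/fortran]

The same idea should carry over to long-lived objects: allocate them once as CLASS entities at a high call level and pass them down, rather than passing TYPE variables to CLASS dummies inside a hot loop.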

As we typically have long-lived objects that get used a lot, I believe this approach should be applicable to our real code as well and hopefully reduce the overhead that we currently see.

Best regards,
Thomas

Steven_L_Intel1
Glad to hear it.