Hi,
We have a fairly large Fortran 2003 code base that makes heavy use of type-bound procedures and CLASS arguments. During profiling, we observed hotspots in extremely simple routines that basically only pass a call on to a type-bound procedure.
I investigated this a little further and found surprising (at least to me) performance differences depending on whether CLASS or TYPE is used for a dummy argument. While I expected CLASS to be slower, I did not expect it to be that much slower.
I wrote a little test program, which is attached at the end of the post. It runs a loop that calls a subroutine, which in turn calls a type-bound procedure. Case 1 uses CLASS dummy arguments in the subroutine, whereas Case 2 uses TYPE dummy arguments. Case 3 bypasses type-bound procedures completely and calls the routine directly.
Build                    Case 1 (CLASS)   Case 2 (TYPE)   Case 3 (direct, no TBP)
Debug mode               25.4 s           25.4 s          7.25 s
Release (all opt.)       15.0 s           1 s             0.98 s
Release (w/o inlining)   15.8 s           13.15 s         3.5 s
What bothers me most is that Case 1 hardly benefits from Release optimizations. Compared to the TYPE variant, the code ran 15x slower. Also, it seems that Case 2 is only fast if inlining is turned on. So I expect that whenever inlining is prevented for some other reason, Case 2 will not perform well either.
Is this behavior expected? Is there any way to reduce the overhead caused by using CLASS dummy arguments and type-bound procedures?
Looking at the assembly, there seem to be a lot more instructions involved with the CLASS dummy arguments. It seems to me that some kind of temporary objects are created. However, I do not understand assembly and the internal object structure well enough to fully follow the generated code.
Any help on this topic would be greatly appreciated.
Edit: I forgot to mention that I see this behavior with the latest IVF 12.1 (Update 6) on Win32. I don't know if the behavior was the same before and I don't have prior versions installed any more.
regards,
Thomas
[fortran]
!  FortranPerfTests.f90
!
!  FUNCTIONS:
!  FortranPerfTests - Entry point of console application.
!
MODULE TypeDefs
    IMPLICIT NONE

    TYPE :: MyType
        REAL(8) :: Val = 1.0
    CONTAINS
        PROCEDURE :: Add
    END TYPE

    TYPE, EXTENDS(MyType) :: MyTypeExt
    END TYPE

CONTAINS

    ! Type-bound procedure: both arguments are polymorphic
    SUBROUTINE Add(this, Original)
        CLASS (MyType) :: this
        CLASS (MyType) :: Original
        this%Val = this%Val + Original%Val
    END SUBROUTINE

    ! Plain subroutine doing the same work without polymorphism
    SUBROUTINE AddDirect(this, Original)
        TYPE (MyTypeExt) :: this
        TYPE (MyTypeExt) :: Original
        this%Val = this%Val + Original%Val
    END SUBROUTINE

    ! Case 1: CLASS dummy arguments, type-bound call
    SUBROUTINE ViaClass(A,B)
        CLASS (MyType) :: A
        CLASS (MyType) :: B
        CALL A%Add(B)
    END SUBROUTINE

    ! Case 2: TYPE dummy arguments, type-bound call
    SUBROUTINE ViaType(A,B)
        TYPE (MyTypeExt) :: A
        TYPE (MyTypeExt) :: B
        CALL A%Add(B)
    END SUBROUTINE

    ! Case 3: TYPE dummy arguments, direct call
    SUBROUTINE Direct(A,B)
        TYPE (MyTypeExt) :: A
        TYPE (MyTypeExt) :: B
        CALL AddDirect(A,B)
    END SUBROUTINE

END MODULE

program FortranPerfTests
    USE TypeDefs
    implicit none

    ! Variables
    INTEGER :: I
    TYPE (MyTypeExt) :: A,B
    REAL(8) :: T1Start, T1End, T2Start, T2End, T3Start, T3End

    ! Body of FortranPerfTests

    ! Case 1: Call using class dummy arguments
    ! (integer loop bound; the original 1E9 real bound is non-standard)
    CALL CPU_TIME(T1Start)
    DO I=1,1000000000
        CALL ViaClass(A,B)
    END DO
    CALL CPU_TIME(T1End)
    WRITE (*,*) A%Val

    ! Case 2: Call using TYPE dummy arguments
    A%Val = 1
    CALL CPU_TIME(T2Start)
    DO I=1,1000000000
        CALL ViaType(A,B)
    END DO
    CALL CPU_TIME(T2End)
    WRITE (*,*) A%Val

    ! Case 3: Call via subroutine and type dummy arguments
    A%Val = 1
    CALL CPU_TIME(T3Start)
    DO I=1,1000000000
        CALL Direct(A,B)
    END DO
    CALL CPU_TIME(T3End)
    WRITE (*,*) A%Val

    WRITE (*,*) 'Type-bound via class arguments:', T1End - T1Start
    WRITE (*,*) 'Type-bound via type arguments:', T2End-T2Start
    WRITE (*,*) 'Subroutine call via type arguments:', T3End-T3Start
end program FortranPerfTests
[/fortran]
Any time a decision is deferred from compile time to run time, performance will suffer. In the class case, the compiler has to generate code to determine the dynamic type and look up the correct routine to call. This is not amenable to optimization such as inlining, which is probably helping the type case.
Hi Steve,
I did expect some performance penalty for the reasons you mention. However, I did not expect the overhead to be that large. It makes small type-bound procedures on CLASS objects (like get/set methods) pretty much unusable.
When I look at the assembly code, it seems to me that a significant amount of memory is written to (mainly filled with zeros, with some nonzero entries). To me, it looks like temporary objects are created, which I did not expect in this case.
regards,
Thomas
Probably some descriptors being created. I'll take a look and ask the developers if there's something obviously wrong. It may be a case of figuring out how to optimize this, much as early Fortran 90 compilers had performance issues with array operations.
Hi Steve,
do you have any news regarding this topic?
regards,
Thomas
I've looked over the generated code with one of our developers and we're puzzled, because a lot of code is being generated for a case other than the one that seems to take all the time. What we did see was the need to create a class descriptor and pass it when one wasn't already available, and that takes time.
Hi Steve,
Thanks a lot for looking at the code. If I understand correctly, the overhead is by design.
Is there any way to prevent the creation of a class descriptor?
Our problem is that we have built a library for handling gas properties, and we access most gas states via accessor/setter routines, which are called millions of times. Most of them are tiny and have almost zero computational cost. However, because they use CLASS objects, our code spends significant time in them according to the profiler.
Looking closely, we see that most of that time is spent in code similar to what is generated for the example I gave.
It would be great if we could avoid creating the class descriptors millions of times, and perhaps move their creation to a higher call level that is not called as often. Any ideas would be very welcome.
regards,
Thomas
If you are passing something that is not polymorphic to an argument that is, the compiler has to create the class descriptor - there is no way around that.
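To make the distinction concrete, here is a minimal sketch of the two kinds of call site. The names descriptor_demo, base_t and bump are hypothetical and do not come from the test program in this thread. Whether and when a descriptor is actually built is up to the compiler; the comments only restate the behavior described in this thread for IVF 12.1.

```fortran
module descriptor_demo
  implicit none

  ! Hypothetical type, used only for this illustration
  type :: base_t
    real(8) :: val = 1.0d0
  end type base_t

contains

  ! Polymorphic dummy: the callee needs to know the dynamic type of x
  subroutine bump(x)
    class(base_t), intent(inout) :: x
    x%val = x%val + 1.0d0
  end subroutine bump

end module descriptor_demo

program show_call_sites
  use descriptor_demo
  implicit none

  type(base_t)               :: fixed   ! non-polymorphic object
  class(base_t), allocatable :: poly    ! polymorphic object

  allocate(base_t :: poly)

  ! TYPE actual passed to a CLASS dummy: per the explanation above,
  ! a class descriptor has to be created for this call.
  call bump(fixed)

  ! CLASS actual passed to a CLASS dummy: the object is already
  ! polymorphic, so its type information can simply be passed along.
  call bump(poly)

  print *, fixed%val, poly%val
end program show_call_sites
```

Both values end up as 2.0; the two calls differ only in what the compiler has to prepare before the call.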
Hi Steve,
Thanks a lot for your continued support. With your last answer, you finally pointed me to a solution that I hope will let us reduce the runtime overhead significantly for our usage.
In the example above, if I declare the objects A and B as polymorphic from the beginning, i.e. I replace the declaration
TYPE (MyTypeExt) :: A, B
by
CLASS (MyTypeExt), ALLOCATABLE :: A, B
and allocate the objects in the code, the class descriptor is created only once instead of 1e9 times.
With that change, the overhead of the CLASS variant went down significantly, and the variant with the class arguments now takes only about three times as long as the other. That is perfectly acceptable given the additional benefits of polymorphism.
As we typically have long-lived objects that get used a lot, I believe this approach should be applicable to our real code as well and will hopefully reduce the overhead that we currently see.
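The change described above can be sketched as a trimmed-down version of the test program. MyType, MyTypeExt, Add and ViaClass are taken from the original post; the module name TypeDefsSketch, the program name PolymorphicFromStart, the loop count N and the explicit ALLOCATE statement are assumptions made for this illustration. Timings will depend on compiler and options.

```fortran
module TypeDefsSketch
  implicit none

  type :: MyType
    real(8) :: Val = 1.0d0
  contains
    procedure :: Add
  end type MyType

  type, extends(MyType) :: MyTypeExt
  end type MyTypeExt

contains

  subroutine Add(this, Original)
    class(MyType) :: this
    class(MyType) :: Original
    this%Val = this%Val + Original%Val
  end subroutine Add

  ! Case 1 from the original post: CLASS dummy arguments
  subroutine ViaClass(A, B)
    class(MyType) :: A
    class(MyType) :: B
    call A%Add(B)
  end subroutine ViaClass

end module TypeDefsSketch

program PolymorphicFromStart
  use TypeDefsSketch
  implicit none

  integer, parameter :: N = 1000000    ! assumed loop count, kept small
  integer :: I
  real(8) :: TStart, TEnd

  ! Polymorphic from the beginning instead of TYPE (MyTypeExt) :: A, B
  class(MyTypeExt), allocatable :: A, B

  ! Allocate once, outside the hot loop
  allocate(MyTypeExt :: A, B)

  call cpu_time(TStart)
  do I = 1, N
    call ViaClass(A, B)    ! actual arguments are already polymorphic
  end do
  call cpu_time(TEnd)

  write (*,*) A%Val
  write (*,*) 'Type-bound via class arguments:', TEnd - TStart
end program PolymorphicFromStart
```

The objects still have dynamic type MyTypeExt, so the result is the same as with the TYPE declaration; only the way they are declared and created changes.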
Best regards,
Thomas
Glad to hear it.