Suppose we have a Fortran function (for example, a mathematical optimization algorithm) that takes another Fortran function as input:
myOptimizer(func)
Now, depending on the user's choice, the input function could be any one of several different functions. This list of choices can be implemented via an if-block:
    if (userChoice=='func1') then
        call myOptimizer(func1)
    elseif (userChoice=='func2') then
        call myOptimizer(func2)
    elseif (userChoice=='func3') then
        call myOptimizer(func3)
    end if
Alternatively, I could define a function pointer and write this as:
    if (userChoice=='func1') then
        func => func1
    elseif (userChoice=='func2') then
        func => func2
    elseif (userChoice=='func3') then
        func => func3
    end if
    call myOptimizer(func)
Based on my tests with Intel Fortran Compiler 2017 with the O2 flag, the second implementation is several times slower (4-5 times slower than the if-block implementation). From a software development perspective, I strongly prefer the second approach, since it gives much more concise and cleaner code. However, performance matters just as much in my problem.
Is this performance loss from indirect function calls specific to my problem, or is it expected in all Fortran code? Or is it compiler-dependent? Is there a way to use indirect function calls without losing performance?
Your test (the one showing a 4-5x slowdown) may not reflect how you will actually code. IOW, the test program may use a dummy function that does nothing, or one whose results are never used. If it does nothing, you are measuring call overhead rather than the difference in computation plus overhead. If its results are never used, the compiler may have optimized the function call away entirely, eliminating it as dead code.
Additionally, you do not specify whether the funcs are contained functions or external functions.
I suggest you profile both cases with VTune and compare the differences in the disassembly. You may find that in the first (if-block) case the compiler optimized the code out of the program.
Jim Dempsey
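One way to rule out the dead-code effect described above is to make the benchmark consume and print what the passed function returns. The following is only a minimal sketch of that idea, not the original poster's test: the module name bench_sketch, the helper accumulate (a hypothetical stand-in for myOptimizer) and the iteration count are all assumptions.

```fortran
module bench_sketch
   use iso_fortran_env, only: wp => real64, int64
   implicit none
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   ! Hypothetical stand-in for myOptimizer: calls f n times and returns the sum,
   ! so every call contributes to a value that is used later.
   function accumulate(f, n) result(total)
      interface
         function f(x) result(y)
            import :: wp
            real(wp), intent(in) :: x
            real(wp) :: y
         end function f
      end interface
      integer, intent(in) :: n
      real(wp) :: total
      integer :: i
      total = 0
      do i = 1, n
         total = total + f(real(i, wp))
      end do
   end function accumulate
end module bench_sketch

program bench
   use bench_sketch
   implicit none
   integer, parameter :: n = 10000000   ! arbitrary iteration count
   procedure(func1), pointer :: func
   integer(int64) :: t0, t1, rate
   real(wp) :: s_direct, s_ptr

   ! Direct: the procedure name is passed as the actual argument.
   call system_clock(t0, rate)
   s_direct = accumulate(func1, n)
   call system_clock(t1)
   print '(a,g0,a,g0)', 'direct : sum = ', s_direct, &
      '  seconds = ', real(t1 - t0, wp) / real(rate, wp)

   ! Indirect: the same procedure is passed through a procedure pointer.
   func => func1
   call system_clock(t0, rate)
   s_ptr = accumulate(func, n)
   call system_clock(t1)
   print '(a,g0,a,g0)', 'pointer: sum = ', s_ptr, &
      '  seconds = ', real(t1 - t0, wp) / real(rate, wp)
end program bench
```

Because both sums are printed, the compiler cannot drop either set of calls as dead code. Any remaining timing difference is then worth checking in the disassembly.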
The two codes are not equivalent. Consider what happens if userChoice has the value "Oops!".
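To make the difference concrete: with an unrecognized choice the if-block version simply makes no call, while the pointer version goes on to pass a pointer that was never associated. Below is a minimal sketch of one possible guard, assuming the pointer is nullified up front and checked with associated before use. The names funcs_sketch and func_iface are hypothetical and do not come from the original post.

```fortran
module funcs_sketch
   use iso_fortran_env, only: wp => real64
   implicit none
   abstract interface
      function func_iface(x) result(y)
         import :: wp
         real(wp), intent(in) :: x
         real(wp) :: y
      end function func_iface
   end interface
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   function func2(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = x**2
   end function func2
end module funcs_sketch

program guard_demo
   use funcs_sketch
   implicit none
   procedure(func_iface), pointer :: func
   character(len=16) :: userChoice

   func => null()          ! start from a defined, disassociated state
   userChoice = 'Oops!'    ! a value that matches none of the branches

   if (userChoice == 'func1') then
      func => func1
   else if (userChoice == 'func2') then
      func => func2
   end if

   ! Without this check, the code after the if-block would use an unassociated pointer.
   if (associated(func)) then
      print '(a,g0)', 'func(3.0) = ', func(3.0_wp)
   else
      print '(a)', 'unrecognized choice: ' // trim(userChoice)
   end if
end program guard_demo
```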
    ! or maybe myOptimiser( FuncSelecterFunc(userchoice) )
    ! where FuncSelecterFunc returns a function pointer
BTW, note Jim's comments, but even then I would suggest that 5 x virtually nothing = virtually nothing. I would have thought the more significant times would be elsewhere, in the code that does real work. Go with whatever scheme gives the clearest code.
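A selector function of the kind suggested above could look roughly like the sketch below. Only the name FuncSelecterFunc comes from the post. The module name, the abstract interface func_iface, the stand-in myOptimizer and the choice to stop on an unknown name are assumptions made for illustration.

```fortran
module selector_sketch
   use iso_fortran_env, only: wp => real64
   implicit none
   abstract interface
      function func_iface(x) result(y)
         import :: wp
         real(wp), intent(in) :: x
         real(wp) :: y
      end function func_iface
   end interface
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   function func2(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = x**2
   end function func2

   function func3(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2**x
   end function func3

   ! Maps the user's choice to a procedure pointer.
   function FuncSelecterFunc(userChoice) result(fptr)
      character(len=*), intent(in) :: userChoice
      procedure(func_iface), pointer :: fptr
      select case (userChoice)
      case ('func1')
         fptr => func1
      case ('func2')
         fptr => func2
      case ('func3')
         fptr => func3
      case default
         ! Assumption: an unknown name is treated as a fatal error.
         error stop 'FuncSelecterFunc: unknown choice'
      end select
   end function FuncSelecterFunc

   ! Hypothetical stand-in for the real optimizer: evaluates f once.
   subroutine myOptimizer(f)
      procedure(func_iface) :: f
      print '(a,g0)', 'f(3.0) = ', f(3.0_wp)
   end subroutine myOptimizer
end module selector_sketch

program selector_demo
   use selector_sketch
   implicit none
   procedure(func_iface), pointer :: func
   character(len=16) :: userChoice

   userChoice = 'func2'
   func => FuncSelecterFunc(userChoice)
   call myOptimizer(func)
end program selector_demo
```

The sketch goes through a local pointer to stay conservative. As far as I know, the standard also allows a reference to a function that returns an associated procedure pointer to be passed directly as the actual argument, as in the one-liner above, but it is worth confirming that your compiler version accepts it.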
My take on this problem is that most of the work probably gets done in subroutine myOptimizer, and that makes it rather strange that there is any difference in performance between the two methods because myOptimizer sees the same thing passed in either case: the address of procedure func1 (or func2 or func3...). Here is a test:
    module N
       use ISO_C_BINDING
       implicit none
    contains
       subroutine myOptimizer(f) bind(C,name='MYOPTIMIZER')
          type(C_FUNPTR), value :: f
          character(20) fmt
          write(fmt,'(*(g0))') '(a,Z0.',bit_size(0_C_INTPTR_T)/4,')'
          write(*,fmt) 'Address passed = ',transfer(f,0_C_INTPTR_T)
       end subroutine myOptimizer
    end module N

    module M
       use ISO_FORTRAN_ENV, only: wp => REAL64
       implicit none
    contains
       function func1(x)
          real(wp) func1
          real(wp) x
          func1 = 2*x
       end function func1
       function func2(x)
          real(wp) func2
          real(wp) x
          func2 = x**2
       end function func2
       function func3(x)
          real(wp) func3
          real(wp) x
          func3 = 2**x
       end function func3
    end module M

    program P
       use M
       implicit none
       procedure(func1), pointer :: func
       write(*,'(*(g0))') 'Test with func1:'
       call myOptimizer(func1)
       write(*,'(*(g0))') 'Test with func => func1:'
       func => func1
       call myOptimizer(func)
       write(*,'(*(g0))') 'Test with func2:'
       call myOptimizer(func2)
       write(*,'(*(g0))') 'Test with func => func2:'
       func => func2
       call myOptimizer(func)
       write(*,'(*(g0))') 'Test with func3:'
       call myOptimizer(func3)
       write(*,'(*(g0))') 'Test with func => func3:'
       func => func3
       call myOptimizer(func)
    end program P
Output with ifort, 64 bits:
    Test with func1:
    Address passed = 00007FF7E2671430
    Test with func => func1:
    Address passed = 00007FF7E2671430
    Test with func2:
    Address passed = 00007FF7E2671440
    Test with func => func2:
    Address passed = 00007FF7E2671440
    Test with func3:
    Address passed = 00007FF7E2671450
    Test with func => func3:
    Address passed = 00007FF7E2671450
So you can see that myOptimizer got the same address in both cases. Now, if you changed myOptimizer in the second case so that its dummy argument were a procedure pointer rather than a procedure, that might be a little different. But you don't have to do that, and I'm going to assume that you probably didn't.
Assuming that your measurement technique was accurate, the only thing I can think of that could change performance like this is the compiler being able to see the code for myOptimizer and func1, func2, and func3, so that it could inline func1 within myOptimizer in the first case. You could tell by looking at a disassembly, or even by comparing the size of the *.obj file ifort produces in both cases. If there's inlining, the first case should produce a lot of code bloat, because there should be something like 4 versions of myOptimizer: one each for func1, func2, and func3, and another generic version.
If myOptimizer is a dummy do-nothing function (one that is not optimized out, e.g. an external function not visible to the compiler), then there is a significant difference between the two methods: the second one (pointer) has a write instruction, whereas the first does not. Note that, depending on where func resides and/or how it is attributed, the compiler may not be permitted to keep func in a register, IOW a physical write to memory is performed.
This is another reason why it is important for posters to provide a complete sample program (and include compiler switches, version, etc.).
Jim Dempsey