Efficiency loss due to use of a function pointer in place of an if-block

DataScientist · 11-16-2017

Suppose we have a Fortran function (for example, a mathematical optimization algorithm) that takes another Fortran function as input:

myOptimizer(func)

Depending on the user's choice, the input function could be any one of several functions. This choice can be implemented via an if-block:

if (userChoice == 'func1') then
    call myOptimizer(func1)
else if (userChoice == 'func2') then
    call myOptimizer(func2)
else if (userChoice == 'func3') then
    call myOptimizer(func3)
end if

Alternatively, I could define a procedure pointer and write this as:

if (userChoice == 'func1') then
    func => func1
else if (userChoice == 'func2') then
    func => func2
else if (userChoice == 'func3') then
    func => func3
end if
call myOptimizer(func)

Based on my tests with Intel Fortran Compiler 2017 and the O2 flag, the second implementation is slower by a factor of 4-5 than the if-block implementation. From a software development perspective, I would strongly prefer the second approach, since it results in much more concise and cleaner code. However, performance matters equally in my problem.

Is this loss of performance from indirect function calls specific to my problem, or is it expected in all Fortran codes? Or is it a compiler-dependent issue? Is there a way to use indirect function calls without the performance loss?

jimdempseyatthecove · 11-16-2017

Your test (the one that runs 4-5 times slower) may not reflect how you will actually code. IOW, the test program may use a dummy function that does nothing, or one that produces results that are never used. In the first case you are measuring call overhead rather than the difference in computation plus overhead. In the second case, the compiler may have optimized out the function call (result not used, so the code is eliminated as dead code).

Additionally, you do not specify whether the funcs are contained functions or external functions.

I suggest you produce a test run using VTune (both cases) and compare the differences in the disassembly code. You may find the first case optimized the code out of the program.
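A minimal sketch of a timing test that avoids both pitfalls is shown below. It is an illustration under assumptions, not the original poster's code: the module bench_mod, the interface realFunc, the loop count n, and this particular body of myOptimizer are all hypothetical. The callee does real work inside the loop, and each result is printed and compared so the compiler cannot discard the calls as dead code.

```fortran
module bench_mod
   use iso_fortran_env, only: wp => real64, int64
   implicit none
   abstract interface
      function realFunc(x) result(y)
         import :: wp
         real(wp), intent(in) :: x
         real(wp) :: y
      end function realFunc
   end interface
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   ! Hypothetical stand-in for myOptimizer: sums f over n points.
   function myOptimizer(f, n) result(total)
      procedure(realFunc) :: f
      integer, intent(in) :: n
      real(wp) :: total
      integer :: i
      total = 0
      do i = 1, n
         total = total + f(real(i, wp))
      end do
   end function myOptimizer
end module bench_mod

program bench
   use bench_mod
   implicit none
   integer, parameter :: n = 10000000
   procedure(realFunc), pointer :: func
   integer(int64) :: t0, t1, rate
   real(wp) :: direct, indirect

   ! Case 1: pass the procedure name directly.
   call system_clock(t0, rate)
   direct = myOptimizer(func1, n)
   call system_clock(t1)
   print '(a,es12.4,a,f8.4,a)', 'direct:   sum = ', direct, &
      '  time = ', real(t1 - t0, wp)/real(rate, wp), ' s'

   ! Case 2: pass a procedure pointer associated with the same target.
   func => func1
   call system_clock(t0)
   indirect = myOptimizer(func, n)
   call system_clock(t1)
   print '(a,es12.4,a,f8.4,a)', 'indirect: sum = ', indirect, &
      '  time = ', real(t1 - t0, wp)/real(rate, wp), ' s'

   ! Using both results keeps the optimizer from removing either call.
   if (direct /= indirect) error stop 'results differ'
end program bench
```

Timings from a sketch like this depend on the compiler, the flags, and whether the callee is visible for inlining, so the disassembly is still worth checking.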

Jim Dempsey

IanH · 11-16-2017

The two codes are not equivalent. Consider what happens if userChoice has the value "Oops!".
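To make the difference concrete: with an unmatched choice, the if-block version calls nothing, while the pointer version still reaches the call with a pointer that was never associated. One way to guard against that, sketched here under assumptions (the module choice_mod, the interface realFunc, and this body of myOptimizer are hypothetical), is to nullify the pointer first and test it with associated() before the call.

```fortran
module choice_mod
   use iso_fortran_env, only: wp => real64
   implicit none
   abstract interface
      function realFunc(x) result(y)
         import :: wp
         real(wp), intent(in) :: x
         real(wp) :: y
      end function realFunc
   end interface
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   function func2(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = x**2
   end function func2

   ! Hypothetical stand-in for myOptimizer: evaluates f once.
   subroutine myOptimizer(f)
      procedure(realFunc) :: f
      print '(a,g0)', 'f(3) = ', f(3.0_wp)
   end subroutine myOptimizer
end module choice_mod

program guard
   use choice_mod
   implicit none
   character(len=8) :: userChoice
   procedure(realFunc), pointer :: func

   userChoice = 'Oops!'

   ! A procedure pointer starts out undefined, so nullify it explicitly.
   func => null()
   if (userChoice == 'func1') then
      func => func1
   else if (userChoice == 'func2') then
      func => func2
   end if

   ! Only call the optimizer when a valid choice was matched.
   if (associated(func)) then
      call myOptimizer(func)
   else
      print '(a)', 'Unrecognized choice: ' // trim(userChoice)
   end if
end program guard
```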

andrew_4619 · 11-16-2017

! or maybe
call myOptimizer(FuncSelecterFunc(userChoice))
! where FuncSelecterFunc returns a function pointer

BTW note Jim's comments, but even then I would suggest that 5 x virtually nothing = virtually nothing. I would have thought the more significant times would be elsewhere, in the code that does real work. Go with whatever scheme gives the clearest code.
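A selector function of this kind might look like the sketch below. Only the name FuncSelecterFunc comes from the post; the module selector_mod, the interface realFunc, the function bodies, and this body of myOptimizer are assumptions made for illustration. The result is assigned to a local pointer first so that an unknown choice can be caught before the call.

```fortran
module selector_mod
   use iso_fortran_env, only: wp => real64
   implicit none
   abstract interface
      function realFunc(x) result(y)
         import :: wp
         real(wp), intent(in) :: x
         real(wp) :: y
      end function realFunc
   end interface
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   function func2(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = x**2
   end function func2

   function func3(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2.0_wp**x
   end function func3

   ! Maps the user's choice to a procedure pointer; null() if unknown.
   function FuncSelecterFunc(userChoice) result(f)
      character(*), intent(in) :: userChoice
      procedure(realFunc), pointer :: f
      select case (userChoice)
      case ('func1')
         f => func1
      case ('func2')
         f => func2
      case ('func3')
         f => func3
      case default
         f => null()
      end select
   end function FuncSelecterFunc

   ! Hypothetical stand-in for myOptimizer: evaluates f once.
   subroutine myOptimizer(f)
      procedure(realFunc) :: f
      print '(a,g0)', 'f(3) = ', f(3.0_wp)
   end subroutine myOptimizer
end module selector_mod

program select_demo
   use selector_mod
   implicit none
   procedure(realFunc), pointer :: func

   func => FuncSelecterFunc('func2')
   if (.not. associated(func)) error stop 'unknown choice'
   call myOptimizer(func)
end program select_demo
```

Keeping the selection logic in one function means every call site shares the same handling of an invalid choice.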

JVanB · 11-16-2017

My take on this problem is that most of the work probably gets done in subroutine myOptimizer, and that makes it rather strange that there is any difference in performance between the two methods because myOptimizer sees the same thing passed in either case: the address of procedure func1 (or func2 or func3...). Here is a test:

module N
   use ISO_C_BINDING
   implicit none
   contains
      subroutine myOptimizer(f) bind(C,name='MYOPTIMIZER')
         type(C_FUNPTR), value :: f
         character(20) fmt
         write(fmt,'(*(g0))') '(a,Z0.',bit_size(0_C_INTPTR_T)/4,')'
         write(*,fmt) 'Address passed = ',transfer(f,0_C_INTPTR_T)
      end subroutine myOptimizer
end module N

module M
   use ISO_FORTRAN_ENV, only: wp => REAL64
   implicit none
   contains
      function func1(x)
         real(wp) func1
         real(wp) x
         func1 = 2*x
      end function func1
      function func2(x)
         real(wp) func2
         real(wp) x
         func2 = x**2
      end function func2
      function func3(x)
         real(wp) func3
         real(wp) x
         func3 = 2**x
      end function func3
end module M

program P
   use M
   implicit none
   procedure(func1), pointer :: func
   write(*,'(*(g0))') 'Test with func1:'
   call myOptimizer(func1)
   write(*,'(*(g0))') 'Test with func => func1:'
   func => func1
   call myOptimizer(func)
   write(*,'(*(g0))') 'Test with func2:'
   call myOptimizer(func2)
   write(*,'(*(g0))') 'Test with func => func2:'
   func => func2
   call myOptimizer(func)
   write(*,'(*(g0))') 'Test with func3:'
   call myOptimizer(func3)
   write(*,'(*(g0))') 'Test with func => func3:'
   func => func3
   call myOptimizer(func)
end program P

Output with ifort, 64 bits:

Test with func1:
Address passed = 00007FF7E2671430
Test with func => func1:
Address passed = 00007FF7E2671430
Test with func2:
Address passed = 00007FF7E2671440
Test with func => func2:
Address passed = 00007FF7E2671440
Test with func3:
Address passed = 00007FF7E2671450
Test with func => func3:
Address passed = 00007FF7E2671450

So you can see that myOptimizer got the same address in both cases. Now, if you changed myOptimizer in the second case so that its dummy argument were a procedure pointer rather than a procedure, that might be a little different. But you don't have to do that, and I'm going to assume that you probably didn't.
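For reference, the two kinds of dummy argument mentioned here can be sketched as follows. All names in this example (dummy_kinds, realFunc, callViaProcedure, callViaPointer) are hypothetical, and whether the two forms generate different code depends on the compiler.

```fortran
module dummy_kinds
   use iso_fortran_env, only: wp => real64
   implicit none
   abstract interface
      function realFunc(x) result(y)
         import :: wp
         real(wp), intent(in) :: x
         real(wp) :: y
      end function realFunc
   end interface
contains
   function func1(x) result(y)
      real(wp), intent(in) :: x
      real(wp) :: y
      y = 2*x
   end function func1

   ! Plain procedure dummy: accepts a procedure name or an
   ! associated procedure pointer as the actual argument.
   function callViaProcedure(f, x) result(y)
      procedure(realFunc) :: f
      real(wp), intent(in) :: x
      real(wp) :: y
      y = f(x)
   end function callViaProcedure

   ! Procedure pointer dummy: the pointer itself is the argument.
   function callViaPointer(f, x) result(y)
      procedure(realFunc), pointer, intent(in) :: f
      real(wp), intent(in) :: x
      real(wp) :: y
      y = f(x)
   end function callViaPointer
end module dummy_kinds

program dummy_demo
   use dummy_kinds
   implicit none
   procedure(realFunc), pointer :: func
   real(wp) :: a, b, c

   func => func1
   a = callViaProcedure(func1, 1.5_wp)  ! procedure name
   b = callViaProcedure(func, 1.5_wp)   ! associated pointer, same dummy
   c = callViaPointer(func, 1.5_wp)     ! pointer passed as a pointer
   print '(3(g0,1x))', a, b, c

   ! All three routes call func1, so the values must agree.
   if (a /= b .or. b /= c) error stop 'results differ'
end program dummy_demo
```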

Assuming that your measurement technique was accurate, the only thing I can think of that could change performance like this is if the compiler could see the code for myOptimizer and func1, func2, and func3, so that it could inline func1 within myOptimizer in the first case. You could tell by looking at a disassembly, or even by looking at the size of the *.obj file ifort produces in both cases. If there's inlining, the first case should produce a lot of code bloat, because there should be something like 4 versions of myOptimizer: one each for func1, func2, and func3, and another for a generic version.

jimdempseyatthecove · 11-16-2017

If the myOptimizer function is a dummy do-nothing function (one that is not optimized out, e.g. an external function not visible to the compiler), then there is a significant difference between the two methods: the second one (pointer) has a write instruction, whereas the first does not. Note that, depending on where func resides and/or how it is attributed, the compiler may not be permitted to keep func in a register (IOW a physical write to memory is performed).

This is another reason why it is important for the posters to post a complete sample program (and include compiler switches, version, etc...)

Jim Dempsey