Intel® Fortran Compiler

Fortran 90 Optimisation

I have written a Fortran 90 module to perform a vector cross product with a variety of data types. These routines are called many thousands of times in my software, so I would like them to be inlined. However, I am unable to find a compiler option to inline the functions.

I have generated a cut-down test case:

! Compile with preprocessing enabled so that the #ifdef blocks are honoured.
module module_vec3d

  implicit none

  interface cross_product
    module procedure cross_product_R4_R4
  end interface

contains

  ! Cross product of two single-precision 3-vectors, returned as an array.
  function cross_product_R4_R4(x, y) result(z)

    real :: z(3)
    real, intent(in) :: x(3), y(3)

    z = (/ x(2) * y(3) - x(3) * y(2), &
           x(3) * y(1) - x(1) * y(3), &
           x(1) * y(2) - x(2) * y(1) /)
  end function cross_product_R4_R4

end module module_vec3d

program try_dp

#ifndef INLINE
  use module_vec3d
#endif

  implicit none

  integer, parameter :: N = 10000
  real :: a(3, N), c(3, N)
  real :: time_begin, time_end
  integer :: i, j

  ! Fill the input vectors and clear the accumulators.
  do i = 1, N
    a(:, i) = (/ 0.1 + 0.05 * i / N, 0.2 - 0.1 * i / N, 0.3 - 0.05 * i / N /)
  end do
  c(:,:) = 0.0

  call cpu_time(time_begin)
  do j = 1, N
    do i = 1, N
#ifdef INLINE
      ! Manually inlined cross product.
      c(1, j) = c(1, j) + a(2, i) * a(3, j) - a(3, i) * a(2, j)
      c(2, j) = c(2, j) + a(3, i) * a(1, j) - a(1, i) * a(3, j)
      c(3, j) = c(3, j) + a(1, i) * a(2, j) - a(2, i) * a(1, j)
#else
      ! Same computation through the module function.
      c(:, j) = c(:, j) + cross_product(a(:, i), a(:, j))
#endif
    end do
  end do
  call cpu_time(time_end)
  print *, time_end - time_begin

  ! Write the result so the loops cannot be optimised away.
  write(20,*) c

end program try_dp

Using the manual inline code (#ifdef INLINE), the run time is reduced by a factor of 3 on my machine. How can I force the compiler to inline these functions?

P.S. I have downloaded a trial version of Intel Fortran v7 and the performance for the module function is even worse.
3 Replies
With CVF (Compaq Visual Fortran), you can use the /inline:all option; there is no way to request inlining of one specific routine. Note that in CVF, the routine to be inlined has to be in the same source file as the caller. Even with :all, the compiler may not be able to inline your routine.

However, it is not clear to me that the compiler can inline a function reference with array section arguments. Is there some reason why you don't just have the whole loop version as a callable routine? Compilers would be much happier with that.
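
For readers following the thread, here is a minimal sketch of what "the whole loop version as a callable routine" might look like. The module name module_vec3d_bulk and the routine name accumulate_cross_products are hypothetical and are not part of the original test case. The sketch assumes the caller passes the whole 3-by-N arrays, so the cross product is written out inside the loops and no per-element call is left to inline.

```fortran
module module_vec3d_bulk

  implicit none

contains

  ! Hypothetical routine: for every column j, adds the sum over i of
  ! cross(a(:, i), a(:, j)) to c(:, j). Both arrays are assumed to be 3 x n.
  subroutine accumulate_cross_products(a, c)

    real, intent(in)    :: a(:, :)
    real, intent(inout) :: c(:, :)
    integer :: i, j

    do j = 1, size(a, 2)
      do i = 1, size(a, 2)
        c(1, j) = c(1, j) + a(2, i) * a(3, j) - a(3, i) * a(2, j)
        c(2, j) = c(2, j) + a(3, i) * a(1, j) - a(1, i) * a(3, j)
        c(3, j) = c(3, j) + a(1, i) * a(2, j) - a(2, i) * a(1, j)
      end do
    end do

  end subroutine accumulate_cross_products

end module module_vec3d_bulk
```

In the test program, the two timed loops would then collapse to a single call accumulate_cross_products(a, c), at the cost of a routine that is specific to this one access pattern.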


Thank you for your suggestions.

I have found that if the module function is converted to a subroutine and the result is returned in the 3rd argument then the subroutine is inlined by the compiler (using /inline:all). This is moderately inconvenient, since temporary variables are required and it affects the readability when the cross-product is used in long expressions. Why will the compiler not inline a function that returns an array?
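
A minimal sketch of the subroutine form described above, for anyone who wants to try it. The names module_vec3d_sub and cross_product_sub are hypothetical; the original post does not show the converted code. Whether a given compiler actually inlines it still depends on the compiler and its options.

```fortran
module module_vec3d_sub

  implicit none

contains

  ! Hypothetical subroutine form: the cross product of x and y is returned
  ! in the third argument instead of as an array-valued function result.
  ! As usual in Fortran, z must not be the same variable as x or y.
  subroutine cross_product_sub(x, y, z)

    real, intent(in)  :: x(3), y(3)
    real, intent(out) :: z(3)

    z(1) = x(2) * y(3) - x(3) * y(2)
    z(2) = x(3) * y(1) - x(1) * y(3)
    z(3) = x(1) * y(2) - x(2) * y(1)

  end subroutine cross_product_sub

end module module_vec3d_sub
```

In the timing loop the function reference would become call cross_product_sub(a(:, i), a(:, j), tmp) followed by c(:, j) = c(:, j) + tmp, where tmp is an extra real :: tmp(3) declared in the caller. That temporary is the inconvenience mentioned above.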

This is not my area of expertise, but a routine to be inlined needs to have certain properties, generally those that allow the compiler to treat the routine call, in a figurative sense, as a "macro expansion". If you pass variables that are manipulated in the routine, the compiler can do those same manipulations in inline code. A function that returns a scalar result is pretty much a drop-in replacement, too. But your routine constructs a temporary array on the stack, which the compiler can't simply replace with inline code. Yes, I suppose that with more extensive analysis of what you're doing with the array, something may be possible, but the Fortran semantics of your function call are subtly different from those of the subroutine or of your own inline expansion.

Inlining is not a magic wand - not everything is inlineable.
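
To illustrate the remark about scalar results: one way to keep function syntax inside long expressions, while avoiding an array-valued result, might be a function that returns a single component of the cross product. This is only a sketch under that assumption. The names module_vec3d_scalar and cross_component are hypothetical, nobody in the thread proposed or tested this, and it is not guaranteed that any particular compiler will inline it.

```fortran
module module_vec3d_scalar

  implicit none

contains

  ! Hypothetical helper: returns component k (1, 2 or 3) of x cross y
  ! as a scalar, so no temporary array result is constructed.
  function cross_component(x, y, k) result(zk)

    real, intent(in)    :: x(3), y(3)
    integer, intent(in) :: k
    real :: zk
    integer :: k1, k2

    k1 = mod(k, 3) + 1       ! next index in the cyclic order 1 -> 2 -> 3 -> 1
    k2 = mod(k + 1, 3) + 1   ! the index after that

    zk = x(k1) * y(k2) - x(k2) * y(k1)

  end function cross_component

end module module_vec3d_scalar
```

A caller would then write, for example, c(1, j) = c(1, j) + cross_component(a(:, i), a(:, j), 1), and likewise for components 2 and 3. This is more verbose than the array-valued function, and whether it is any faster would have to be measured.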