Intel® Fortran Compiler

Inlining problems in Fortran

AT
Beginner
770 Views

Given the following code

 

  module test1

  contains

  subroutine INLINE_ME(x,y)

  implicit none

  real, intent(in) :: x,y

  print*, x,y

  end subroutine INLINE_ME

  subroutine CALLER
  implicit none

!dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME
  call INLINE_ME(1.,1.)   ! even when calling within a module it does not work


  end subroutine


  end module


  program inline_test
  !   use test1   Enable this when using "inline_me"
  implicit none

!!dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME1
!      call inline_me1(1.,1.) ! DOES NOT WORK

!!dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME
!      call inline_me(1.,1.) ! DOES NOT WORK (ERROR is multiple declarations of same name (inline_me))

  contains
  subroutine INLINE_ME1(x,y)

  implicit none

  real, intent(in) :: x,y

  print*, x,y

  end subroutine INLINE_ME1


  end program

I am trying to understand why my inlining does not seem to work. I have tried three different scenarios:

a) I call a subroutine from the same module as the caller. In the example, CALLER in module test1 calls INLINE_ME. The error is: undefined reference to `inline_me_'

b) I call the same routine INLINE_ME from a different place, in this case from the scope of the program. Here I get the error "The attributes of this name conflict with those made accessible by a USE statement".

c) I call the routine INLINE_ME1 defined in the program. I get the same error as in b).

I have tried compiling with ifort both with and without -ipo.

6 Replies
jimdempseyatthecove
Honored Contributor III

When the subroutines/functions are visible to the compilation unit, as above, place the FORCEINLINE attribute at the declaration of the subroutine/function:

!  inline_me.f90 
  module test1

  contains

 !dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME
 subroutine INLINE_ME(x,y)

  implicit none

  real, intent(in) :: x,y

  print*, x,y

  end subroutine INLINE_ME

!dir$ ATTRIBUTES FORCEINLINE :: CALLER
  subroutine CALLER
  implicit none

  call INLINE_ME(1.,1.)   ! even when calling withing a module it does not work


  end subroutine


  end module


  program inline_test
  use test1   ! Enable this when using "inline_me"
  implicit none

      call inline_me1(1.,1.) ! DOES NOT WORK

      call inline_me(1.,1.) ! DOES NOT WORK (ERROR is multiple declarations of same name (inline_me)
      
      call CALLER

  contains
!dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME1
  subroutine INLINE_ME1(x,y)

  implicit none

  real, intent(in) :: x,y

  print*, x,y

  end subroutine INLINE_ME1


  end program inline_test
 

In the assembly listing below, look at the code between

MAIN__ PROC

and the end of MAIN__ (your program inline_test):

.B1.8:: ; Preds .B1.7
; Execution count [1.00e+000]
xor eax, eax ;40.3
add rsp, 240 ;40.3
pop r12 ;40.3
ret ;40.3

There are no calls to INLINE_ME, INLINE_ME1, or CALLER in that range, but you do see a "call for_write_seq_lis_xmit" for each of the included (inlined) print statements.

After the end of program inline_test you will also see the out-of-line callable subroutines. The module routines have the module name and _mp_ prepended to the subroutine name (TEST1_mp_INLINE_ME, TEST1_mp_CALLER), while INLINE_ME1, which is internal to the program, has been elided (removed).

Jim Dempsey

; mark_description "Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.1.144 Bui";
; mark_description "ld 20181018";
; mark_description "/nologo /O2 /module:x64\\Release\\ /object:x64\\Release\\ /Fdx64\\Release\\vc120.pdb /FAs /Fax64\\Release\\ ";
; mark_description "/libs:dll /threads /c /Qlocation,link,C:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\\\bin\\amd6";
; mark_description "4 /Qm64";
	OPTION DOTNAME
_TEXT	SEGMENT      'CODE'
TXTST0:
; -- Begin  MAIN__
_TEXT	ENDS
_TEXT	SEGMENT      'CODE'
; mark_begin;
       ALIGN     16
	PUBLIC MAIN__
; --- INLINE_TEST
MAIN__	PROC 
.B1.1::                         ; Preds .B1.0
                                ; Execution count [1.00e+000]

;;;   program inline_test

L1::
                                                           ;30.11
        push      r12                                           ;30.11
        sub       rsp, 240                                      ;30.11
        xor       edx, edx                                      ;30.11
        mov       ecx, 3                                        ;30.11
        call      __intel_new_feature_proc_init                 ;30.11
                                ; LOE rbx rbp rsi rdi r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.11::                        ; Preds .B1.1
                                ; Execution count [1.00e+000]
        stmxcsr   DWORD PTR [48+rsp]                            ;30.11
        lea       rcx, QWORD PTR [__NLITPACK_1.0.4]             ;30.11
        or        DWORD PTR [48+rsp], 32832                     ;30.11
        ldmxcsr   DWORD PTR [48+rsp]                            ;30.11
        call      for_set_reentrancy                            ;30.11
                                ; LOE rbx rbp rsi rdi r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.2::                         ; Preds .B1.11
                                ; Execution count [1.00e+000]

;;;   use test1   ! Enable this when using "inline_me"
;;;   implicit none
;;; 
;;;       call inline_me1(1.,1.) ! DOES NOT WORK

        mov       r10, rsp                                      ;34.12
        lea       rcx, QWORD PTR [48+rsp]                       ;34.12
        mov       edx, -1                                       ;34.12
        mov       r8, 01208384ff00H                             ;34.12
        lea       r9, QWORD PTR [__STRLITPACK_2.0.5]            ;34.12
        mov       r12d, 1065353216                              ;34.12
        mov       QWORD PTR [rcx], 0                            ;34.12
        lea       rax, QWORD PTR [192+rsp]                      ;34.12
        mov       DWORD PTR [rax], r12d                         ;34.12
        mov       QWORD PTR [32+r10], rax                       ;34.12
        call      for_write_seq_lis                             ;34.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.3::                         ; Preds .B1.2
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_3.0.5]           ;34.12
        lea       rcx, QWORD PTR [48+rsp]                       ;34.12
        mov       DWORD PTR [152+rcx], r12d                     ;34.12
        lea       r8, QWORD PTR [200+rsp]                       ;34.12
        call      for_write_seq_lis_xmit                        ;34.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.4::                         ; Preds .B1.3
                                ; Execution count [1.00e+000]

;;; 
;;;       call inline_me(1.,1.) ! DOES NOT WORK (ERROR is multiple declarations of same name (inline_me)

        mov       r10, rsp                                      ;36.12
        lea       rcx, QWORD PTR [96+rsp]                       ;36.12
        mov       edx, -1                                       ;36.12
        mov       r8, 01208384ff00H                             ;36.12
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]            ;36.12
        lea       rax, QWORD PTR [208+rsp]                      ;36.12
        mov       QWORD PTR [-112+rax], 0                       ;36.12
        mov       DWORD PTR [rax], r12d                         ;36.12
        mov       QWORD PTR [32+r10], rax                       ;36.12
        call      for_write_seq_lis                             ;36.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.5::                         ; Preds .B1.4
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]           ;36.12
        lea       rcx, QWORD PTR [96+rsp]                       ;36.12
        mov       DWORD PTR [120+rcx], r12d                     ;36.12
        lea       r8, QWORD PTR [216+rsp]                       ;36.12
        call      for_write_seq_lis_xmit                        ;36.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.6::                         ; Preds .B1.5
                                ; Execution count [1.00e+000]

;;;       
;;;       call CALLER

        mov       r10, rsp                                      ;38.12
        lea       rcx, QWORD PTR [144+rsp]                      ;38.12
        mov       edx, -1                                       ;38.12
        mov       r8, 01208384ff00H                             ;38.12
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]            ;38.12
        lea       rax, QWORD PTR [224+rsp]                      ;38.12
        mov       QWORD PTR [-80+rax], 0                        ;38.12
        mov       DWORD PTR [rax], r12d                         ;38.12
        mov       QWORD PTR [32+r10], rax                       ;38.12
        call      for_write_seq_lis                             ;38.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.7::                         ; Preds .B1.6
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]           ;38.12
        lea       rcx, QWORD PTR [144+rsp]                      ;38.12
        mov       DWORD PTR [88+rcx], r12d                      ;38.12
        lea       r8, QWORD PTR [232+rsp]                       ;38.12
        call      for_write_seq_lis_xmit                        ;38.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.8::                         ; Preds .B1.7
                                ; Execution count [1.00e+000]

;;; 
;;;   contains

        xor       eax, eax                                      ;40.3
        add       rsp, 240                                      ;40.3
        pop       r12                                           ;40.3
        ret                                                     ;40.3
        ALIGN     16
                                ; LOE
.B1.9::
; mark_end;
MAIN__ ENDP
_TEXT	ENDS
.xdata	SEGMENT  DWORD   READ  ''
	ALIGN 004H
.unwind.MAIN__.B1_B8	DD	198913
	DD	1966345
	DD	49154
.xdata	ENDS
.pdata	SEGMENT  DWORD   READ  ''
	ALIGN 004H
.pdata.MAIN__.B1_B8	DD	imagerel .B1.1
	DD	imagerel .B1.9
	DD	imagerel .unwind.MAIN__.B1_B8
.pdata	ENDS
_RDATA	SEGMENT     READ  'DATA'
__NLITPACK_1.0.4	DD	2
__STRLITPACK_2.0.5	DD	131354
	DB	0
	DB 3 DUP ( 0H)	; pad
__STRLITPACK_3.0.5	DD	65818
	DB	0
_RDATA	ENDS
_DATA	SEGMENT      'DATA'
_DATA	ENDS
; -- End  MAIN__
_TEXT	SEGMENT      'CODE'
; -- Begin  TEST1$
_TEXT	ENDS
_TEXT	SEGMENT      'CODE'
; mark_begin;
       ALIGN     16
	PUBLIC TEST1$
TEST1$	PROC 
.B2.1::                         ; Preds .B2.0
                                ; Execution count [1.00e+000]

;;;   module test1

L2::
                                                           ;2.10
        ret                                                     ;2.10
        ALIGN     16
                                ; LOE
.B2.2::
; mark_end;
TEST1$ ENDP
_TEXT	ENDS
_DATA	SEGMENT      'DATA'
_DATA	ENDS
; -- End  TEST1$
_TEXT	SEGMENT      'CODE'
; -- Begin  TEST1_mp_INLINE_ME
_TEXT	ENDS
_TEXT	SEGMENT      'CODE'
; mark_begin;
       ALIGN     16
	PUBLIC TEST1_mp_INLINE_ME
; --- INLINE_ME
TEST1_mp_INLINE_ME	PROC 
; parameter 1: rcx
; parameter 2: rdx
.B3.1::                         ; Preds .B3.0
                                ; Execution count [1.00e+000]

;;;  subroutine INLINE_ME(x,y)

L3::
                                                           ;7.13
        push      r14                                           ;7.13
        sub       rsp, 112                                      ;7.13
        mov       r14, rdx                                      ;7.13

;;; 
;;;   implicit none
;;; 
;;;   real, intent(in) :: x,y
;;; 
;;;   print*, x,y

        mov       r11, rsp                                      ;13.3
        mov       edx, -1                                       ;13.3
        mov       eax, DWORD PTR [rcx]                          ;13.3
        lea       rcx, QWORD PTR [48+rsp]                       ;13.3
        mov       r8, 01208384ff00H                             ;13.3
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]            ;13.3
        mov       QWORD PTR [rcx], 0                            ;13.3
        lea       r10, QWORD PTR [96+rsp]                       ;13.3
        mov       DWORD PTR [48+rcx], eax                       ;13.3
        mov       QWORD PTR [32+r11], r10                       ;13.3
        call      for_write_seq_lis                             ;13.3
                                ; LOE rbx rbp rsi rdi r12 r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B3.2::                         ; Preds .B3.1
                                ; Execution count [1.00e+000]
        mov       eax, DWORD PTR [r14]                          ;13.3
        lea       rcx, QWORD PTR [48+rsp]                       ;13.3
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]           ;13.3
        lea       r8, QWORD PTR [104+rsp]                       ;13.3
        mov       DWORD PTR [56+rcx], eax                       ;13.3
        call      for_write_seq_lis_xmit                        ;13.3
                                ; LOE rbx rbp rsi rdi r12 r13 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B3.3::                         ; Preds .B3.2
                                ; Execution count [1.00e+000]

;;; 
;;;   end subroutine INLINE_ME

        add       rsp, 112                                      ;15.3
        pop       r14                                           ;15.3
        ret                                                     ;15.3
        ALIGN     16
                                ; LOE
.B3.4::
; mark_end;
TEST1_mp_INLINE_ME ENDP
_TEXT	ENDS
.xdata	SEGMENT  DWORD   READ  ''
	ALIGN 004H
.unwind.TEST1_mp_INLINE_ME.B1_B3	DD	132609
	DD	3758281222
.xdata	ENDS
.pdata	SEGMENT  DWORD   READ  ''
	ALIGN 004H
.pdata.TEST1_mp_INLINE_ME.B1_B3	DD	imagerel .B3.1
	DD	imagerel .B3.4
	DD	imagerel .unwind.TEST1_mp_INLINE_ME.B1_B3
.pdata	ENDS
_DATA	SEGMENT      'DATA'
_DATA	ENDS
; -- End  TEST1_mp_INLINE_ME
_TEXT	SEGMENT      'CODE'
; -- Begin  TEST1_mp_CALLER
_TEXT	ENDS
_TEXT	SEGMENT      'CODE'
; mark_begin;
       ALIGN     16
	PUBLIC TEST1_mp_CALLER
; --- CALLER
TEST1_mp_CALLER	PROC 
.B4.1::                         ; Preds .B4.0
                                ; Execution count [1.00e+000]

;;;   subroutine CALLER

L4::
                                                           ;18.14
        push      rsi                                           ;18.14
        sub       rsp, 112                                      ;18.14

;;;   implicit none
;;; 
;;;   call INLINE_ME(1.,1.)   ! even when calling withing a module it does not work

        mov       edx, -1                                       ;21.8
        mov       r10, rsp                                      ;21.8
        lea       rcx, QWORD PTR [48+rsp]                       ;21.8
        mov       r8, 01208384ff00H                             ;21.8
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]            ;21.8
        mov       esi, 1065353216                               ;21.8
        lea       rax, QWORD PTR [96+rsp]                       ;21.8
        mov       QWORD PTR [-48+rax], 0                        ;21.8
        mov       DWORD PTR [rax], esi                          ;21.8
        mov       QWORD PTR [32+r10], rax                       ;21.8
        call      for_write_seq_lis                             ;21.8
                                ; LOE rbx rbp rdi r12 r13 r14 r15 esi xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B4.2::                         ; Preds .B4.1
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]           ;21.8
        lea       rcx, QWORD PTR [48+rsp]                       ;21.8
        mov       DWORD PTR [56+rcx], esi                       ;21.8
        lea       r8, QWORD PTR [104+rsp]                       ;21.8
        call      for_write_seq_lis_xmit                        ;21.8
                                ; LOE rbx rbp rdi r12 r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B4.3::                         ; Preds .B4.2
                                ; Execution count [1.00e+000]

;;; 
;;; 
;;;   end subroutine

        add       rsp, 112                                      ;24.3
        pop       rsi                                           ;24.3
        ret                                                     ;24.3
        ALIGN     16
                                ; LOE
.B4.4::
; mark_end;
TEST1_mp_CALLER ENDP
_TEXT	ENDS
.xdata	SEGMENT  DWORD   READ  ''
	ALIGN 004H
.unwind.TEST1_mp_CALLER.B1_B3	DD	132353
	DD	1610732037
.xdata	ENDS
.pdata	SEGMENT  DWORD   READ  ''
	ALIGN 004H
.pdata.TEST1_mp_CALLER.B1_B3	DD	imagerel .B4.1
	DD	imagerel .B4.4
	DD	imagerel .unwind.TEST1_mp_CALLER.B1_B3
.pdata	ENDS
_DATA	SEGMENT      'DATA'
_DATA	ENDS
; -- End  TEST1_mp_CALLER
_RDATA	SEGMENT     READ  'DATA'
	DB 3 DUP ( 0H)	; pad
_2il0floatpacket.0	DD	03f800000H
__STRLITPACK_0.0.2	DD	131354
	DB	0
	DB 3 DUP ( 0H)	; pad
__STRLITPACK_1.0.2	DD	65818
	DB	0
_RDATA	ENDS
_DATA	SEGMENT      'DATA'
_DATA	ENDS
EXTRN	for_set_reentrancy:PROC
EXTRN	for_write_seq_lis_xmit:PROC
EXTRN	for_write_seq_lis:PROC
EXTRN	__intel_new_feature_proc_init:PROC
EXTRN	__ImageBase:PROC
EXTRN	_fltused:BYTE
	INCLUDELIB <ifconsol>
	INCLUDELIB <libifcoremd>
	INCLUDELIB <libifportmd>
	INCLUDELIB <libmmd>
	INCLUDELIB <MSVCRT>
	INCLUDELIB <libirc>
	INCLUDELIB <svml_dispmd>
	INCLUDELIB <OLDNAMES>
	END

 

AT
Beginner

Thanks for the reply Jim.

I am a bit confused now, honestly. When should we put the inline directive at the function definition, and when should it go in front of the call?

jimdempseyatthecove
Honored Contributor III

After reviewing the code sample and the documentation:

You can use

!dir$ forceinline
call inline_me

in front of the call, or use !dir$ attributes forceinline :: procname. However, it appears that the attributes variation can only be used with the subroutine/function declaration (or its interface declaration).

Sorry for the runaround.

Jim Dempsey
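
To make the two forms from this reply concrete, here is a minimal sketch. The names inline_demo, show_pair and show_sum are made up for illustration and are not from the original post. The placement of the attributes directive just before the subroutine statement mirrors the code posted earlier in this thread; whether a given compiler version honors it should be checked with the inline report or an assembly listing. Compilers that do not recognize !dir$ simply treat these lines as comments.

```fortran
module inline_demo
  implicit none
contains

  ! Form 1: the ATTRIBUTES directive sits with the procedure definition
  ! (same placement as in the module code posted above in this thread).
  !dir$ attributes forceinline :: show_pair
  subroutine show_pair(x, y)
    real, intent(in) :: x, y
    print *, x, y
  end subroutine show_pair

  ! No attribute here; inlining is requested at the call site instead.
  subroutine show_sum(x, y)
    real, intent(in) :: x, y
    print *, x + y
  end subroutine show_sum

end module inline_demo

program inline_demo_main
  use inline_demo
  implicit none

  ! Inlining requested through the attribute on the definition.
  call show_pair(1., 1.)

  ! Form 2: the statement-level directive applies to the call that follows.
  !dir$ forceinline
  call show_sum(1., 1.)
end program inline_demo_main
```

Note that neither form puts "attributes forceinline :: name" in the caller's scope, which is what triggered the conflicting-attributes errors in the original question.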

AT
Beginner

Ohhh. That was it. Cheers Jim !

Quick thing: I saw that the -ipo flag might be necessary when the routine is defined in a different module. Is it correct that -ipo must be enabled for both compilation and linking for the compiler to have a chance of inlining a routine defined in module A and called from module B?

jimdempseyatthecove
Honored Contributor III

The compiler can generate optimization diagnostic information, including the compiler inline report. You can use that to verify the inlining or, for stronger proof, use VTune to collect runtime statistics, then open the routine and look at the disassembly.

Keep in mind that, depending on the compiler optimization switches, the generated code may have multiple paths (e.g. one for SSE, one for AVX, etc.). The compiler may or may not inline depending on the path taken, although forceinline should force the inline, whereas !dir$ inline only means "you would like it to inline if possible".

The VTune disassembly or the assembly listing is closer to absolute proof.

Jim Dempsey
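
As a rough test case for the module A / module B question, something like the following could be compiled and then checked with the inline report or a listing. This is only a sketch: mod_a, mod_b, scale_by_two and report are hypothetical names, and the file names and command lines in the trailing comments are assumptions for classic ifort on Linux that should be checked against the documentation for your compiler version. In a real project the two modules would sit in separate source files, which is the situation where -ipo is usually said to matter.

```fortran
! Hypothetical layout: mod_a would normally live in a.f90 and mod_b in b.f90.
module mod_a
  implicit none
contains
  ! Request inlining at the definition, as suggested earlier in this thread.
  !dir$ attributes forceinline :: scale_by_two
  function scale_by_two(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x
  end function scale_by_two
end module mod_a

module mod_b
  use mod_a, only: scale_by_two
  implicit none
contains
  subroutine report(x)
    real, intent(in) :: x
    ! Call site of interest: was scale_by_two inlined here?
    print *, scale_by_two(x)
  end subroutine report
end module mod_b

program ipo_demo
  use mod_b, only: report
  implicit none
  call report(21.0)
end program ipo_demo

! Possible checks (assumed spellings, verify for your version):
!   ifort -O2 -ipo -qopt-report=5 -qopt-report-phase=ipo a.f90 b.f90 main.f90
!   ifort -O2 -S ipo_demo.f90    (assembly listing; /FAs on Windows, as in
!                                 the listing posted above)
```

If the generated code for report contains no call to a mod_a routine, the function was inlined at that call site.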

AT
Beginner

Thanks Jim. So far I have been using the optimisation report to check for inlining. I should probably learn to read assembly code better, since you mentioned that it is the ultimate proof.

 

Thanks again for your help
