Given the following code:
module test1
contains

  subroutine INLINE_ME(x,y)
    implicit none
    real, intent(in) :: x,y
    print*, x,y
  end subroutine INLINE_ME

  subroutine CALLER
    implicit none
    !dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME
    call INLINE_ME(1.,1.) ! even when calling withing a module it does not work
  end subroutine

end module

program inline_test
  ! use test1   Enable this when using "inline_me"
  implicit none

  !!dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME1
  ! call inline_me1(1.,1.) ! DOES NOT WORK

  !!dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME
  ! call inline_me(1.,1.) ! DOES NOT WORK (ERROR is multiple declarations of same name (inline_me)

contains

  subroutine INLINE_ME1(x,y)
    implicit none
    real, intent(in) :: x,y
    print*, x,y
  end subroutine INLINE_ME1

end program
I am trying to understand why my inlining does not seem to work. I have tried three different scenarios:
a) I call a subroutine from the same module as the caller. This is the example in module test1, where CALLER calls INLINE_ME. The error is: undefined reference to `inline_me_'
b) I call the same routine, INLINE_ME, from a different place, in this case from within the scope of the program. Here I get the error "The attributes of this name conflict with those made accessible by a USE statement".
c) I call the routine INLINE_ME1, which is defined in the program. I get the same error as in b).
I have tried compiling with ifort both with and without -ipo.
When the subroutine or function is visible to the compilation unit, as above, place the FORCEINLINE attribute at the declaration of the subroutine or function:
! inline_me.f90
module test1
contains

  !dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME
  subroutine INLINE_ME(x,y)
    implicit none
    real, intent(in) :: x,y
    print*, x,y
  end subroutine INLINE_ME

  !dir$ ATTRIBUTES FORCEINLINE :: CALLER
  subroutine CALLER
    implicit none
    call INLINE_ME(1.,1.) ! even when calling withing a module it does not work
  end subroutine

end module

program inline_test
  use test1 ! Enable this when using "inline_me"
  implicit none

  call inline_me1(1.,1.) ! DOES NOT WORK
  call inline_me(1.,1.) ! DOES NOT WORK (ERROR is multiple declarations of same name (inline_me)
  call CALLER

contains

  !dir$ ATTRIBUTES FORCEINLINE :: INLINE_ME1
  subroutine INLINE_ME1(x,y)
    implicit none
    real, intent(in) :: x,y
    print*, x,y
  end subroutine INLINE_ME1

end program inline_test
In the assembly listing below, look at the code between
MAIN__ PROC
and the end of the MAIN__ PROC (your program inline_test):
.B1.8:: ; Preds .B1.7
; Execution count [1.00e+000]
xor eax, eax ;40.3
add rsp, 240 ;40.3
pop r12 ;40.3
ret ;40.3
There are no calls to INLINE_ME, INLINE_ME1, or CALLER in that range, but you do see a "call for_write_seq_lis_xmit" for each of the included (inlined) print statements.
Also, following the end of the program inline_test, you will see the out-of-line callable subroutines. The module routines have the module name and _mp_ prepended to the subroutine name, and the internal (to the program) INLINE_ME1 has been elided (removed).
Jim Dempsey
; mark_description "Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.1.144 Bui";
; mark_description "ld 20181018";
; mark_description "/nologo /O2 /module:x64\\Release\\ /object:x64\\Release\\ /Fdx64\\Release\\vc120.pdb /FAs /Fax64\\Release\\ ";
; mark_description "/libs:dll /threads /c /Qlocation,link,C:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\\\bin\\amd6";
; mark_description "4 /Qm64";
        OPTION DOTNAME
_TEXT   SEGMENT 'CODE'
TXTST0:
; -- Begin MAIN__
_TEXT   ENDS
_TEXT   SEGMENT 'CODE'
; mark_begin;
        ALIGN 16
        PUBLIC MAIN__
; --- INLINE_TEST
MAIN__  PROC
.B1.1::                         ; Preds .B1.0
                                ; Execution count [1.00e+000]
;;; program inline_test
L1::                                                    ;30.11
        push      r12                                   ;30.11
        sub       rsp, 240                              ;30.11
        xor       edx, edx                              ;30.11
        mov       ecx, 3                                ;30.11
        call      __intel_new_feature_proc_init         ;30.11
                                ; LOE rbx rbp rsi rdi r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.11::                        ; Preds .B1.1
                                ; Execution count [1.00e+000]
        stmxcsr   DWORD PTR [48+rsp]                    ;30.11
        lea       rcx, QWORD PTR [__NLITPACK_1.0.4]     ;30.11
        or        DWORD PTR [48+rsp], 32832             ;30.11
        ldmxcsr   DWORD PTR [48+rsp]                    ;30.11
        call      for_set_reentrancy                    ;30.11
                                ; LOE rbx rbp rsi rdi r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.2::                         ; Preds .B1.11
                                ; Execution count [1.00e+000]
;;; use test1 ! Enable this when using "inline_me"
;;; implicit none
;;;
;;; call inline_me1(1.,1.) ! DOES NOT WORK
        mov       r10, rsp                              ;34.12
        lea       rcx, QWORD PTR [48+rsp]               ;34.12
        mov       edx, -1                               ;34.12
        mov       r8, 01208384ff00H                     ;34.12
        lea       r9, QWORD PTR [__STRLITPACK_2.0.5]    ;34.12
        mov       r12d, 1065353216                      ;34.12
        mov       QWORD PTR [rcx], 0                    ;34.12
        lea       rax, QWORD PTR [192+rsp]              ;34.12
        mov       DWORD PTR [rax], r12d                 ;34.12
        mov       QWORD PTR [32+r10], rax               ;34.12
        call      for_write_seq_lis                     ;34.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.3::                         ; Preds .B1.2
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_3.0.5]   ;34.12
        lea       rcx, QWORD PTR [48+rsp]               ;34.12
        mov       DWORD PTR [152+rcx], r12d             ;34.12
        lea       r8, QWORD PTR [200+rsp]               ;34.12
        call      for_write_seq_lis_xmit                ;34.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.4::                         ; Preds .B1.3
                                ; Execution count [1.00e+000]
;;;
;;; call inline_me(1.,1.) ! DOES NOT WORK (ERROR is multiple declarations of same name (inline_me)
        mov       r10, rsp                              ;36.12
        lea       rcx, QWORD PTR [96+rsp]               ;36.12
        mov       edx, -1                               ;36.12
        mov       r8, 01208384ff00H                     ;36.12
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]    ;36.12
        lea       rax, QWORD PTR [208+rsp]              ;36.12
        mov       QWORD PTR [-112+rax], 0               ;36.12
        mov       DWORD PTR [rax], r12d                 ;36.12
        mov       QWORD PTR [32+r10], rax               ;36.12
        call      for_write_seq_lis                     ;36.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.5::                         ; Preds .B1.4
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]   ;36.12
        lea       rcx, QWORD PTR [96+rsp]               ;36.12
        mov       DWORD PTR [120+rcx], r12d             ;36.12
        lea       r8, QWORD PTR [216+rsp]               ;36.12
        call      for_write_seq_lis_xmit                ;36.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.6::                         ; Preds .B1.5
                                ; Execution count [1.00e+000]
;;;
;;; call CALLER
        mov       r10, rsp                              ;38.12
        lea       rcx, QWORD PTR [144+rsp]              ;38.12
        mov       edx, -1                               ;38.12
        mov       r8, 01208384ff00H                     ;38.12
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]    ;38.12
        lea       rax, QWORD PTR [224+rsp]              ;38.12
        mov       QWORD PTR [-80+rax], 0                ;38.12
        mov       DWORD PTR [rax], r12d                 ;38.12
        mov       QWORD PTR [32+r10], rax               ;38.12
        call      for_write_seq_lis                     ;38.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 r12d xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.7::                         ; Preds .B1.6
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]   ;38.12
        lea       rcx, QWORD PTR [144+rsp]              ;38.12
        mov       DWORD PTR [88+rcx], r12d              ;38.12
        lea       r8, QWORD PTR [232+rsp]               ;38.12
        call      for_write_seq_lis_xmit                ;38.12
                                ; LOE rbx rbp rsi rdi r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B1.8::                         ; Preds .B1.7
                                ; Execution count [1.00e+000]
;;;
;;; contains
        xor       eax, eax                              ;40.3
        add       rsp, 240                              ;40.3
        pop       r12                                   ;40.3
        ret                                             ;40.3
        ALIGN 16
                                ; LOE
.B1.9::
; mark_end;
MAIN__  ENDP
_TEXT   ENDS
.xdata  SEGMENT DWORD READ ''
        ALIGN 004H
.unwind.MAIN__.B1_B8 DD 198913
        DD 1966345
        DD 49154
.xdata  ENDS
.pdata  SEGMENT DWORD READ ''
        ALIGN 004H
.pdata.MAIN__.B1_B8 DD imagerel .B1.1
        DD imagerel .B1.9
        DD imagerel .unwind.MAIN__.B1_B8
.pdata  ENDS
_RDATA  SEGMENT READ 'DATA'
__NLITPACK_1.0.4 DD 2
__STRLITPACK_2.0.5 DD 131354
        DB 0
        DB 3 DUP ( 0H) ; pad
__STRLITPACK_3.0.5 DD 65818
        DB 0
_RDATA  ENDS
_DATA   SEGMENT 'DATA'
_DATA   ENDS
; -- End MAIN__
_TEXT   SEGMENT 'CODE'
; -- Begin TEST1$
_TEXT   ENDS
_TEXT   SEGMENT 'CODE'
; mark_begin;
        ALIGN 16
        PUBLIC TEST1$
TEST1$  PROC
.B2.1::                         ; Preds .B2.0
                                ; Execution count [1.00e+000]
;;; module test1
L2::                                                    ;2.10
        ret                                             ;2.10
        ALIGN 16
                                ; LOE
.B2.2::
; mark_end;
TEST1$  ENDP
_TEXT   ENDS
_DATA   SEGMENT 'DATA'
_DATA   ENDS
; -- End TEST1$
_TEXT   SEGMENT 'CODE'
; -- Begin TEST1_mp_INLINE_ME
_TEXT   ENDS
_TEXT   SEGMENT 'CODE'
; mark_begin;
        ALIGN 16
        PUBLIC TEST1_mp_INLINE_ME
; --- INLINE_ME
TEST1_mp_INLINE_ME PROC
; parameter 1: rcx
; parameter 2: rdx
.B3.1::                         ; Preds .B3.0
                                ; Execution count [1.00e+000]
;;; subroutine INLINE_ME(x,y)
L3::                                                    ;7.13
        push      r14                                   ;7.13
        sub       rsp, 112                              ;7.13
        mov       r14, rdx                              ;7.13
;;;
;;; implicit none
;;;
;;; real, intent(in) :: x,y
;;;
;;; print*, x,y
        mov       r11, rsp                              ;13.3
        mov       edx, -1                               ;13.3
        mov       eax, DWORD PTR [rcx]                  ;13.3
        lea       rcx, QWORD PTR [48+rsp]               ;13.3
        mov       r8, 01208384ff00H                     ;13.3
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]    ;13.3
        mov       QWORD PTR [rcx], 0                    ;13.3
        lea       r10, QWORD PTR [96+rsp]               ;13.3
        mov       DWORD PTR [48+rcx], eax               ;13.3
        mov       QWORD PTR [32+r11], r10               ;13.3
        call      for_write_seq_lis                     ;13.3
                                ; LOE rbx rbp rsi rdi r12 r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B3.2::                         ; Preds .B3.1
                                ; Execution count [1.00e+000]
        mov       eax, DWORD PTR [r14]                  ;13.3
        lea       rcx, QWORD PTR [48+rsp]               ;13.3
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]   ;13.3
        lea       r8, QWORD PTR [104+rsp]               ;13.3
        mov       DWORD PTR [56+rcx], eax               ;13.3
        call      for_write_seq_lis_xmit                ;13.3
                                ; LOE rbx rbp rsi rdi r12 r13 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B3.3::                         ; Preds .B3.2
                                ; Execution count [1.00e+000]
;;;
;;; end subroutine INLINE_ME
        add       rsp, 112                              ;15.3
        pop       r14                                   ;15.3
        ret                                             ;15.3
        ALIGN 16
                                ; LOE
.B3.4::
; mark_end;
TEST1_mp_INLINE_ME ENDP
_TEXT   ENDS
.xdata  SEGMENT DWORD READ ''
        ALIGN 004H
.unwind.TEST1_mp_INLINE_ME.B1_B3 DD 132609
        DD 3758281222
.xdata  ENDS
.pdata  SEGMENT DWORD READ ''
        ALIGN 004H
.pdata.TEST1_mp_INLINE_ME.B1_B3 DD imagerel .B3.1
        DD imagerel .B3.4
        DD imagerel .unwind.TEST1_mp_INLINE_ME.B1_B3
.pdata  ENDS
_DATA   SEGMENT 'DATA'
_DATA   ENDS
; -- End TEST1_mp_INLINE_ME
_TEXT   SEGMENT 'CODE'
; -- Begin TEST1_mp_CALLER
_TEXT   ENDS
_TEXT   SEGMENT 'CODE'
; mark_begin;
        ALIGN 16
        PUBLIC TEST1_mp_CALLER
; --- CALLER
TEST1_mp_CALLER PROC
.B4.1::                         ; Preds .B4.0
                                ; Execution count [1.00e+000]
;;; subroutine CALLER
L4::                                                    ;18.14
        push      rsi                                   ;18.14
        sub       rsp, 112                              ;18.14
;;; implicit none
;;;
;;; call INLINE_ME(1.,1.) ! even when calling withing a module it does not work
        mov       edx, -1                               ;21.8
        mov       r10, rsp                              ;21.8
        lea       rcx, QWORD PTR [48+rsp]               ;21.8
        mov       r8, 01208384ff00H                     ;21.8
        lea       r9, QWORD PTR [__STRLITPACK_0.0.2]    ;21.8
        mov       esi, 1065353216                       ;21.8
        lea       rax, QWORD PTR [96+rsp]               ;21.8
        mov       QWORD PTR [-48+rax], 0                ;21.8
        mov       DWORD PTR [rax], esi                  ;21.8
        mov       QWORD PTR [32+r10], rax               ;21.8
        call      for_write_seq_lis                     ;21.8
                                ; LOE rbx rbp rdi r12 r13 r14 r15 esi xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B4.2::                         ; Preds .B4.1
                                ; Execution count [1.00e+000]
        lea       rdx, QWORD PTR [__STRLITPACK_1.0.2]   ;21.8
        lea       rcx, QWORD PTR [48+rsp]               ;21.8
        mov       DWORD PTR [56+rcx], esi               ;21.8
        lea       r8, QWORD PTR [104+rsp]               ;21.8
        call      for_write_seq_lis_xmit                ;21.8
                                ; LOE rbx rbp rdi r12 r13 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm14 xmm15
.B4.3::                         ; Preds .B4.2
                                ; Execution count [1.00e+000]
;;;
;;;
;;; end subroutine
        add       rsp, 112                              ;24.3
        pop       rsi                                   ;24.3
        ret                                             ;24.3
        ALIGN 16
                                ; LOE
.B4.4::
; mark_end;
TEST1_mp_CALLER ENDP
_TEXT   ENDS
.xdata  SEGMENT DWORD READ ''
        ALIGN 004H
.unwind.TEST1_mp_CALLER.B1_B3 DD 132353
        DD 1610732037
.xdata  ENDS
.pdata  SEGMENT DWORD READ ''
        ALIGN 004H
.pdata.TEST1_mp_CALLER.B1_B3 DD imagerel .B4.1
        DD imagerel .B4.4
        DD imagerel .unwind.TEST1_mp_CALLER.B1_B3
.pdata  ENDS
_DATA   SEGMENT 'DATA'
_DATA   ENDS
; -- End TEST1_mp_CALLER
_RDATA  SEGMENT READ 'DATA'
        DB 3 DUP ( 0H) ; pad
_2il0floatpacket.0 DD 03f800000H
__STRLITPACK_0.0.2 DD 131354
        DB 0
        DB 3 DUP ( 0H) ; pad
__STRLITPACK_1.0.2 DD 65818
        DB 0
_RDATA  ENDS
_DATA   SEGMENT 'DATA'
_DATA   ENDS
EXTRN   for_set_reentrancy:PROC
EXTRN   for_write_seq_lis_xmit:PROC
EXTRN   for_write_seq_lis:PROC
EXTRN   __intel_new_feature_proc_init:PROC
EXTRN   __ImageBase:PROC
EXTRN   _fltused:BYTE
        INCLUDELIB <ifconsol>
        INCLUDELIB <libifcoremd>
        INCLUDELIB <libifportmd>
        INCLUDELIB <libmmd>
        INCLUDELIB <MSVCRT>
        INCLUDELIB <libirc>
        INCLUDELIB <svml_dispmd>
        INCLUDELIB <OLDNAMES>
        END
Thanks for the reply Jim.
Honestly, I am a bit confused now. When should the inline directive go at the function definition, and when should it go in front of the call?
After reviewing the code sample and the documentation:
You can use
!dir$ forceinline
call inline_me
or use !dir$ attributes forceinline :: procname. However, it appears that the attribute variation can only be used with the subroutine/function declaration (or its interface declaration).
Sorry for the run around.
Jim Dempsey
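For readers following along, here is a minimal sketch that puts the two forms described above side by side. The module, program, and procedure names (inline_demo, inline_demo_main, show_pair, show_sum) are made up for illustration and are not from the original posts. The placement of the attribute line follows the layout used earlier in this thread. Whether each call is actually inlined still has to be checked in the inline report or the assembly listing.

```fortran
module inline_demo              ! hypothetical module name
  implicit none
contains

  ! Attribute form: placed with the procedure's own declaration,
  ! so it applies to the procedure wherever it is called.
  !dir$ attributes forceinline :: show_pair
  subroutine show_pair(x, y)
    real, intent(in) :: x, y
    print *, x, y
  end subroutine show_pair

  ! No attribute here; inlining is requested at the call site instead.
  subroutine show_sum(x, y)
    real, intent(in) :: x, y
    print *, x + y
  end subroutine show_sum

end module inline_demo

program inline_demo_main        ! hypothetical program name
  use inline_demo
  implicit none

  ! Covered by the attribute on the declaration of show_pair.
  call show_pair(1., 1.)

  ! Statement form: applies to the statement that follows it.
  !dir$ forceinline
  call show_sum(1., 1.)

end program inline_demo_main
```

Compilers that do not recognize the !dir$ prefix treat these lines as ordinary comments, so the source stays standard Fortran either way.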
Ohhh. That was it. Cheers, Jim!
One quick thing: I saw that the -ipo flag might be necessary when the routine is defined in a different module. Is it correct that -ipo must be enabled for both compilation and linking for the compiler to have a chance of inlining a routine that is defined in module A and called from module B?
The compiler can generate optimization diagnostic information, including the compiler inline report. You can use that report to verify inlining or, for stronger proof, use VTune to collect runtime statistics, then open the routine and look at the disassembly.
Keep in mind that, depending on the compiler optimization switches, the generated code may have multiple paths (e.g. one for SSE, one for AVX, etc.). The compiler may or may not inline depending on the path taken (although forceinline should do the inline, whereas !dir$ inline is more of a "you would like it inlined if possible" request).
The VTune disassembly or the assembly listing is closer to absolute proof.
Jim Dempsey
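As a rough illustration of the inline versus forceinline distinction mentioned above, the sketch below marks two calls to the same internal procedure with the two statement directives. The names inline_hints and report are hypothetical. The comments restate the behavior described in this post and do not guarantee what any particular compiler version will do.

```fortran
program inline_hints            ! hypothetical program name
  implicit none
  real :: a, b

  a = 1.0
  b = 2.0

  ! Request only: the compiler may still decline to inline this call.
  !dir$ inline
  call report(a, b)

  ! Stronger form: inline this call whenever the compiler is able to.
  !dir$ forceinline
  call report(b, a)

contains

  subroutine report(x, y)
    real, intent(in) :: x, y
    print *, x, y
  end subroutine report

end program inline_hints
```

Comparing the two call sites in the inline report or in the assembly listing is the way to confirm which of them was actually expanded in place.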
Thanks, Jim. So far I have been using the optimisation report to check for inlining. I should probably learn to read assembly code better, since, as you mentioned, it is the ultimate proof.
Thanks again for your help
