This is a follow-up to the now-closed topic
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/270998
I'm seeing the same issue with the 2017 initial release on Linux. Can anyone tell me how to get the compiler to inline my dense_sse_mul routine?
icc (ICC) 17.0.0 20160721
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.0.098 Build 20160721
The underlying gcc is
gcc (GCC) 4.7.0
Text from ipo_out.optrpt
-> (NOFORCE): (141,12) dense_sse_mul (isz = 22) (sz = 31)
[[ Unable to inline callsite <1>]]
The dense_sse_mul symbol is declared as
void inline dense_sse_mul(const double* A, const double* B, double* C) {
__asm__ __volatile__(" .........................
and is compiled with
icc -c -O3 -xSSE4.2 -ipo -restrict -DNDEBUG sse_5_5_5_DP.c..
The main routine is Fortran, contains an interface for dense_sse_mul, and is compiled with
ifort -O3 -ipo -inline-factor=1000 -align array32byte
INTERFACE
  SUBROUTINE dense_sse_mul(a, b, mm) BIND(C)
    !dir$ attributes forceinline :: dense_sse_mul
    USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_DOUBLE
    IMPLICIT NONE
    real(C_DOUBLE), dimension(5,5), intent(in ) :: a
    real(C_DOUBLE), dimension(5,5), intent(in ) :: b
    real(C_DOUBLE), dimension(5,5), intent( out) :: mm
  END SUBROUTINE dense_sse_mul
END INTERFACE
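For anyone comparing kernels against this interface, a plain C reference can serve as a correctness baseline. The following is only a sketch: the asm body is elided in the post, so it assumes the routine overwrites C with A*B for 5x5 column-major matrices (a generated kernel might instead accumulate into C). The names dense_ref_mul and max_abs_diff are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM = 5 };  /* matrix order, taken from dimension(5,5) in the interface */

/* Reference C = A * B for 5x5 column-major matrices.
   Element (i,j) lives at index i + j*DIM, matching Fortran array layout.
   Assumption: the result overwrites C rather than accumulating into it. */
static inline void dense_ref_mul(const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (int j = 0; j < DIM; ++j) {
        for (int i = 0; i < DIM; ++i) {
            double sum = 0.0;
            for (int k = 0; k < DIM; ++k)
                sum += A[i + k * DIM] * B[k + j * DIM];
            C[i + j * DIM] = sum;
        }
    }
}

/* Largest absolute element-wise difference between two 5x5 matrices,
   for comparing another kernel's output against the reference. */
static inline double max_abs_diff(const double *X, const double *Y)
{
    double worst = 0.0;
    for (int n = 0; n < DIM * DIM; ++n) {
        double d = X[n] - Y[n];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Running the reference and the kernel under test on the same inputs and checking max_abs_diff against a small tolerance is one way to confirm that an SSE or AVX variant still computes the same product.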
I do not believe Fortran IPO is capable of inlining C/C++ code, whether or not it contains assembler.
Have you checked the assembler output of writing your inlineable subroutine in Fortran (and placed in a module)?
Also, for Fortran, consider declaring the dummy arguments in a module with explicit alignment requirements. I see you use -align array32byte, but this applies to arrays you declare and not necessarily to array dummies declared in subroutines.
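Since the alignment of dummy arguments is not guaranteed by the compile flag, one way to find out what the C routine actually receives is to test the incoming pointers. This is a minimal sketch; is_aligned and args_aligned are hypothetical helper names, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the address p is a multiple of `alignment` bytes, else 0. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* Returns 1 only if all three matrix arguments meet the given alignment,
   e.g. 16 bytes for aligned SSE loads or 32 bytes for aligned AVX loads. */
static int args_aligned(const double *A, const double *B, const double *C,
                        size_t alignment)
{
    return is_aligned(A, alignment) &&
           is_aligned(B, alignment) &&
           is_aligned(C, alignment);
}
```

A debug build could call args_aligned(A, B, C, 16) at the top of the kernel wrapper and report a failure before any aligned load is attempted.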
Also, since you are targeting SSE instead of AVX, consider dimensioning your arrays as (6,6) and zero filling the extra row and column...
or use (6,5) for a and b, and (5,6) for mm.
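The reason padding helps with SSE: with a leading dimension of 5, consecutive columns of doubles start 40 bytes apart, so every other column is off a 16-byte boundary; with a leading dimension of 6 the stride is 48 bytes and every column stays 16-byte aligned if the base is. The sketch below illustrates the idea in C, using 6x5 padded storage for all three matrices (an assumption made for simplicity, not the exact shapes suggested above). The names pad_5x5 and padded_mul are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 5, LD = 6 };  /* LD: padded leading dimension (illustrative choice) */

/* Copy a dense 5x5 column-major matrix into 6x5 padded storage.
   The extra sixth row of every column is zero-filled. */
static void pad_5x5(const double *dense, double *padded)
{
    assert(dense != NULL && padded != NULL);
    memset(padded, 0, sizeof(double) * LD * N);
    for (int j = 0; j < N; ++j)
        memcpy(padded + j * LD, dense + j * N, sizeof(double) * N);
}

/* C = A * B on padded storage. The row loops run over all LD rows, giving an
   even trip count with no remainder for 2-wide SSE vectors; because the
   padding row of A is zero, the padding row of C comes out zero as well. */
static void padded_mul(const double *A, const double *B, double *C)
{
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < LD; ++i)
            C[i + j * LD] = 0.0;
        for (int k = 0; k < N; ++k) {
            const double b = B[k + j * LD];
            for (int i = 0; i < LD; ++i)
                C[i + j * LD] += A[i + k * LD] * b;
        }
    }
}
```

Note that 48 bytes is not a multiple of 32, so this particular padding does not keep columns aligned for 32-byte AVX loads.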
I recommend that you not target SSE, as AVX and AVX2 are now more prevalent.
Jim Dempsey
Jim,
Thanks for the suggestions. I have tested an AVX version of the dense_sse_mul routine and found it to be slightly slower than the SSE version, probably due to the slower clock used when the AVX pipes are active. Just to clarify, the underlying code for dense_*_mul is coming from the libxsmm code generator. I'm aware that padding and alignment can impact performance, but I was going to leave those to another round of optimizing.
I'm not exactly sure what "Have you checked the assembler output of writing your inlineable subroutine in Fortran (and placed in a module)?" means.
The disassembly does show calls to the dense_*_mul routines.
objdump -D a.out | grep dense
404124: e8 a7 1f 00 00 callq 4060d0 <dense_sse_mul>
40420c: e8 2f 1a 00 00 callq 405c40 <dense_avx_mul>
0000000000405c40 <dense_avx_mul>:
405dac: 0f 8c d2 fe ff ff jl 405c84 <dense_avx_mul+0x44>
405ecb: 0f 8c e1 fe ff ff jl 405db2 <dense_avx_mul+0x172>
405ee1: 0f 8c 92 fd ff ff jl 405c79 <dense_avx_mul+0x39>
405fca: 0f 8c 22 ff ff ff jl 405ef2 <dense_avx_mul+0x2b2>
40609e: 0f 8c 2c ff ff ff jl 405fd0 <dense_avx_mul+0x390>
4060b4: 0f 8c 2d fe ff ff jl 405ee7 <dense_avx_mul+0x2a7>
00000000004060d0 <dense_sse_mul>:
406356: 0f 8c b8 fd ff ff jl 406114 <dense_sse_mul+0x44>
40648a: 0f 8c cc fe ff ff jl 40635c <dense_sse_mul+0x28c>
4064a0: 0f 8c 63 fc ff ff jl 406109 <dense_sse_mul+0x39>
406640: 0f 8c 6b fe ff ff jl 4064b1 <dense_sse_mul+0x3e1>
406722: 0f 8c 1e ff ff ff jl 406646 <dense_sse_mul+0x576>
406738: 0f 8c 68 fd ff ff jl 4064a6 <dense_sse_mul+0x3d6>
>>I'm not exactly sure what "Have you checked the assembler output of writing your inlineable subroutine in Fortran (and placed in a module)?" means.
SUBROUTINE dense_Fortran_mul(a, b, mm)
  !dir$ attributes forceinline :: dense_Fortran_mul
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_DOUBLE
  IMPLICIT NONE
  real(C_DOUBLE), dimension(5,5), intent(in ) :: a
  real(C_DOUBLE), dimension(5,5), intent(in ) :: b
  real(C_DOUBLE), dimension(5,5), intent( out) :: mm
  mm = matmul(a,b)
END SUBROUTINE dense_Fortran_mul
Build it targeting either SSE or AVX/AVX2.
Then modify it for use with aligned arrays.
The better route would be to use MATMUL directly in the application.
Jim Dempsey
BTW your C/C++ code is not showing the entry and exit point code.
Jim Dempsey