Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

templates

jimdempseyatthecove
Honored Contributor III
731 Views
Steve,

Are templates on the horizon for IVF?

I am in the process of making variations on the theme customized for

no-SSE
SSE (2-up vectors of doubles)
AVX (4-up vectors of doubles)
AVX (3-up vectors of doubles)

Using a different type in the same template would be quite handy.
I am debating whether to use #define and FPP. (I already use this technique in the declarations of the large user-defined types.) However, keeping the debugger in sync with the source line numbers is problematic at times. I don't know whether templates would be any better, but they do a good job in C++.

I was pleasantly surprised by the 3-up doubles:
[fortran]
type TypeYMM02
  SEQUENCE
  real(8) :: v(0:2)
end type TypeYMM02
...
type (TypeYMM02) :: temp02
type (TypeYMM02) :: ArrayOf3wide(nnn)
type (TypeYMM02) :: ArrayOf3wide2d(1:3,nnn) ! X1, Y1, Z1, X2, Y2, Z2, ...
...
!DEC$ VECTOR ALWAYS
do i = 0,3
  if(temp02.v(i) .eq. 0.0_8) temp02.v(i) = 1.0_8
end do

;;; !DEC$ VECTOR ALWAYS
;;; do i = 0,3
;;; if(temp02.v(i) .eq. 0.0_8) temp02.v(i) = 1.0_8
        vmovupd   ymm1, YMMWORD PTR [TESTAVX$TEMP02.0.2]                 ;103.12
;;; end do
        lea       rcx, QWORD PTR [208+rsp]                               ;105.5
        vxorpd    ymm0, ymm0, ymm0                                       ;103.26
        mov       edx, -1                                                ;105.5
        vcmpeqpd  ymm2, ymm1, ymm0                                       ;103.26
        mov       r8, 0208384ff00H                                       ;105.5
        vblendvpd ymm3, ymm1, YMMWORD PTR [_2il0floatpacket.60], ymm2    ;103.38
        lea       r9, QWORD PTR [__STRLITPACK_13.0.2]                    ;105.5
        vmovupd   YMMWORD PTR [TESTAVX$TEMP02.0.2], ymm3                 ;103.38
[/fortran]

N.B. the use of vcmpeqpd to construct the mask and of vblendvpd to conditionally merge.
The above code optimizes to no loop at all, with everything inline.

The lea rcx, mov edx, mov r8, and lea r9 instructions all belong to later statements (source line 105) and are interleaved into this code.
Removing those instructions, we get:

;;; !DEC$ VECTOR ALWAYS
;;; do i = 0,3
;;; if(temp02.v(i) .eq. 0.0_8) temp02.v(i) = 1.0_8
vmovupd ymm1, YMMWORD PTR [TESTAVX$TEMP02.0.2] ;103.12
;;; end do
vxorpd ymm0, ymm0, ymm0 ;103.26
vcmpeqpd ymm2, ymm1, ymm0 ;103.26
vblendvpd ymm3, ymm1, YMMWORD PTR [_2il0floatpacket.60], ymm2 ;103.38
vmovupd YMMWORD PTR [TESTAVX$TEMP02.0.2], ymm3 ;103.38

Five instructions, no branches.

I use this to construct a mask and thus avoid branching.
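
For readers who want to try the mask-and-merge idea at the source level, here is a minimal sketch using the standard MERGE intrinsic. It assumes a plain 4-element array with bounds 0:3 (not the TypeYMM02 type from the post), and the program and variable names are made up for illustration. Whether a given compiler actually emits vcmpeqpd/vblendvpd for it depends on the compiler version and options.

```fortran
program merge_mask_demo
  implicit none
  ! Assumption: four lanes (0:3), so the whole array fits one 256-bit register.
  real(8) :: v(0:3)
  logical :: mask(0:3)

  v = [0.0_8, 2.5_8, 0.0_8, -1.0_8]

  ! Build the mask once (the role vcmpeqpd plays in the listing) ...
  mask = (v == 0.0_8)
  ! ... then select per lane without an IF (the role vblendvpd plays).
  v = merge(1.0_8, v, mask)

  ! Self-check: zeros replaced by 1.0, other lanes untouched.
  if (any(v /= [1.0_8, 2.5_8, 1.0_8, -1.0_8])) error stop 'unexpected result'
  print '(4f7.2)', v
end program merge_mask_demo
```

Writing the selection as a whole-array MERGE leaves no conditional in the source, which gives the optimizer the same freedom the IF inside the short loop did.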

I want to commend your compiler writers for an excellent job of optimization.

Jim Dempsey
0 Kudos
3 Replies
Steven_L_Intel1
Employee
731 Views
Jim, I am not sure what you mean by templates here. Parameterized derived types? Something else?
0 Kudos
JVanB
Valued Contributor II
731 Views
I don't understand how the above compiles: the code references temp02%v(3), but v is declared as v(0:2). Template code is less of a problem in Fortran when all the instantiations are user-defined types than when both user-defined and intrinsic types are possible. You can create templates with USE, rename, and INCLUDE.
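
A minimal sketch of what the USE, rename, and INCLUDE approach might look like follows. Every name in it (mod_types, rescale, rescale_body.inc, and the simplified TypeXMM/TypeYMM definitions) is hypothetical and not taken from the code in this thread. The fence holds two files, separated by marker comments; split them before compiling.

```fortran
! ---- file: rescale_body.inc (the "template" body) ----
! Expects a type named T to be visible through USE renaming.
  subroutine rescale(x, factor)
    type(T), intent(inout) :: x
    real(8), intent(in)    :: factor
    x%v = x%v * factor
  end subroutine rescale

! ---- file: demo.f90 ----
module mod_types
  implicit none
  type TypeXMM            ! hypothetical 2-up vector of doubles
    real(8) :: v(0:1)
  end type TypeXMM
  type TypeYMM            ! hypothetical 4-up vector of doubles
    real(8) :: v(0:3)
  end type TypeYMM
end module mod_types

module mod_rescale_xmm    ! instantiation 1: T is TypeXMM
  use mod_types, only: T => TypeXMM
  implicit none
contains
  include 'rescale_body.inc'
end module mod_rescale_xmm

module mod_rescale_ymm    ! instantiation 2: T is TypeYMM
  use mod_types, only: T => TypeYMM
  implicit none
contains
  include 'rescale_body.inc'
end module mod_rescale_ymm

module mod_rescale        ! gather the instances under one generic name
  use mod_rescale_xmm, only: rescale_xmm => rescale
  use mod_rescale_ymm, only: rescale_ymm => rescale
  implicit none
  interface rescale
    module procedure rescale_xmm, rescale_ymm
  end interface rescale
end module mod_rescale

program demo
  use mod_types
  use mod_rescale
  implicit none
  type(TypeXMM) :: a
  type(TypeYMM) :: b
  a%v = 1.0_8
  b%v = 2.0_8
  call rescale(a, 3.0_8)
  call rescale(b, 0.5_8)
  if (any(a%v /= 3.0_8)) error stop 'xmm instance failed'
  if (any(b%v /= 1.0_8)) error stop 'ymm instance failed'
  print *, 'ok'
end program demo
```

The renaming only works for derived types, since an intrinsic type such as real(8) cannot be given a new name through USE. That is the limitation described above when intrinsic and user-defined instantiations are mixed.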
0 Kudos
jimdempseyatthecove
Honored Contributor III
731 Views
templates

I mean Fortran support for templates, as in C++0x templates.

Right now I could use FPP:

[fortran]
module MOD_interfaces
...
interface GetBBTUT
  function GetBBTUT_r(pFiniteSolution, SegmentNumber) result(r)
    use mod_foss
    real :: r(3)
    type(TypeFiniteSolution) :: pFiniteSolution
    integer :: SegmentNumber
  end function GetBBTUT_r
#ifdef _Use_AVX
  function GetBBTUT_ymm(pFiniteSolutionYMM, SegmentNumber) result(r)
    use mod_all
    use mod_foss
    type(TypeYMM) :: r(3)
    type(TypeFiniteSolutionYMM) :: pFiniteSolutionYMM
    integer :: SegmentNumber
  end function GetBBTUT_ymm
  function GetBBTUT_ymm02(pFiniteSolutionYMM02, SegmentNumber) result(r)
    use mod_all
    use mod_foss
    type(TypeYMM02) :: r(3)
    type(TypeFiniteSolutionYMM02) :: pFiniteSolutionYMM02
    integer :: SegmentNumber
  end function GetBBTUT_ymm02
  function GetBBTUT_xmm(pFiniteSolutionXMM, SegmentNumber) result(r)
    use mod_all
    use mod_foss
    type(TypeXMM) :: r(3)
    type(TypeFiniteSolutionXMM) :: pFiniteSolutionXMM
    integer :: SegmentNumber
  end function GetBBTUT_xmm
#endif
end interface GetBBTUT
...
end module MOD_interfaces
! ----------------------------------
! some file
function GetBBTUT_r(pFiniteSolution, SegmentNumber) result(r)
  USE MOD_UTIL
  use MOD_FOSS
  use MOD_FOSSinterface
  implicit none
  ! return value
  real :: r(3)
  ! input args
  type(TypeFiniteSolution) :: pFiniteSolution
  integer :: SegmentNumber
  ! local variables
  real, automatic :: TUI(3)
  ! code
  ! Get TUI
  TUI = pFiniteSolution.rBBTUI(:,SegmentNumber)
  !---------------------------------------------
  ! TRANSFORM TO TETHER FRAME
  !---------------------------------------------
  CALL MATVEC (1, pFiniteSolution.rGIT, TUI, r)
end function GetBBTUT_r
#ifdef _Use_AVX
function GetBBTUT_ymm(pFiniteSolution, SegmentNumber) result(r)
  use mod_all
  use mod_foss
  type(TypeYMM) :: r(3)
  type(TypeFiniteSolutionYMM) :: pFiniteSolution
  integer :: SegmentNumber
  ! local variables
  type(TypeYMM), automatic :: TUI(3)
  ! code
  ! Get TUI
  TUI(1) = pFiniteSolution.rBBTUI(1,SegmentNumber)
  TUI(2) = pFiniteSolution.rBBTUI(2,SegmentNumber)
  TUI(3) = pFiniteSolution.rBBTUI(3,SegmentNumber)
  !---------------------------------------------
  ! TRANSFORM TO TETHER FRAME
  !---------------------------------------------
  CALL MATVEC (1, pFiniteSolution.rGIT, TUI, r)
end function GetBBTUT_ymm
function GetBBTUT_ymm02(pFiniteSolution, SegmentNumber) result(r)
  use mod_all
  use mod_foss
  type(TypeYMM02) :: r(3)
  type(TypeFiniteSolutionYMM02) :: pFiniteSolution
  integer :: SegmentNumber
  ! local variables
  type(TypeYMM02), automatic :: TUI(3)
  ! code
  ! Get TUI
  TUI(1) = pFiniteSolution.rBBTUI(1,SegmentNumber)
  TUI(2) = pFiniteSolution.rBBTUI(2,SegmentNumber)
  TUI(3) = pFiniteSolution.rBBTUI(3,SegmentNumber)
  !---------------------------------------------
  ! TRANSFORM TO TETHER FRAME
  !---------------------------------------------
  CALL MATVEC (1, pFiniteSolution.rGIT, TUI, r)
end function GetBBTUT_ymm02
function GetBBTUT_xmm(pFiniteSolution, SegmentNumber) result(r)
  use mod_all
  use mod_foss
  type(TypeXMM) :: r(3)
  type(TypeFiniteSolutionXMM) :: pFiniteSolution
  integer :: SegmentNumber
  ! local variables
  type(TypeXMM), automatic :: TUI(3)
  ! code
  ! Get TUI
  TUI(1) = pFiniteSolution.rBBTUI(1,SegmentNumber)
  TUI(2) = pFiniteSolution.rBBTUI(2,SegmentNumber)
  TUI(3) = pFiniteSolution.rBBTUI(3,SegmentNumber)
  !---------------------------------------------
  ! TRANSFORM TO TETHER FRAME
  !---------------------------------------------
  CALL MATVEC (1, pFiniteSolution.rGIT, TUI, r)
end function GetBBTUT_xmm
#endif
! ===============================================
! FPP technique
#define TypeQQQ real
#define TypeQQQfs TypeFiniteSolution
function GetBBTUT_r(pFiniteSolution, SegmentNumber) result(r)
#include "GetBBTUT_QQQ.f90"
end function GetBBTUT_r
#undef TypeQQQ
#undef TypeQQQfs
#ifdef _Use_AVX
#define TypeQQQ type(TypeYMM)
#define TypeQQQfs TypeFiniteSolutionYMM
function GetBBTUT_ymm(pFiniteSolution, SegmentNumber) result(r)
#include "GetBBTUT_QQQ.f90"
end function GetBBTUT_ymm
#undef TypeQQQ
#undef TypeQQQfs
#define TypeQQQ type(TypeYMM02)
#define TypeQQQfs TypeFiniteSolutionYMM02
function GetBBTUT_ymm02(pFiniteSolution, SegmentNumber) result(r)
#include "GetBBTUT_QQQ.f90"
end function GetBBTUT_ymm02
#undef TypeQQQ
#undef TypeQQQfs
#define TypeQQQ type(TypeXMM)
#define TypeQQQfs TypeFiniteSolutionXMM
function GetBBTUT_xmm(pFiniteSolution, SegmentNumber) result(r)
#include "GetBBTUT_QQQ.f90"
end function GetBBTUT_xmm
#undef TypeQQQ
#undef TypeQQQfs
#endif
! ========================
! Where:
! GetBBTUT_QQQ.f90
  use mod_all
  use mod_foss
  TypeQQQ :: r(3)
  type(TypeQQQfs) :: pFiniteSolution
  integer :: SegmentNumber
  ! local variables
  TypeQQQ, automatic :: TUI(3)
  ! code
  ! Get TUI
  TUI(1) = pFiniteSolution.rBBTUI(1,SegmentNumber)
  TUI(2) = pFiniteSolution.rBBTUI(2,SegmentNumber)
  TUI(3) = pFiniteSolution.rBBTUI(3,SegmentNumber)
  !---------------------------------------------
  ! TRANSFORM TO TETHER FRAME
  !---------------------------------------------
  CALL MATVEC (1, pFiniteSolution.rGIT, TUI, r)
! end GetBBTUT_QQQ.f90
[/fortran]
Consider the case where the #include file has hundreds of lines of code.

The FPP #include hack works, but debugging is hit or miss.

If C++0x-like templates were available, it would be relatively straightforward to accomplish this.

Jim Dempsey

