Hi,
I have a piece of code that makes heavy use of Fortran user-defined types. I find that having the "%" sign in inner loops can significantly increase the number of memory loads and reduce performance, but I don't know why. I would truly appreciate your help in finding the reason. Here are two short versions of the code:
Original Version:
    subroutine ARK2(region)
      ! ... Incoming variables
      type(t_region), pointer :: region
      ! ... local variables
      integer :: rkStep, i, j, k, ng, ARK2_nStages, ImplicitFlag
      type(t_mixt), pointer :: state
      state => region%state
      do i = 1, region%grid%nCells
        state%time(i) = state%timeOld(i) + state%dt(i)
      end do
    end subroutine ARK2
Optimized Version:
    subroutine ARK2(region)
      ! ... Incoming variables
      type(t_region), pointer :: region
      ! ... local variables (all declarations must precede the first executable statement)
      integer :: rkStep, i, j, k, ng, ARK2_nStages, ImplicitFlag
      type(t_mixt), pointer :: state
      real(kind=8), pointer :: time(:), timeOld(:), dt(:)
      state => region%state
      ! ... dereference pointers
      time => state%time
      timeOld => state%timeOld
      dt => state%dt
      do i = 1, region%grid%nCells
        time(i) = timeOld(i) + dt(i)
      end do
    end subroutine ARK2
Please note that the only change I made in the optimized version is to define some local pointers so that the "%" sign no longer appears in the inner loop. What surprised me is that the original version executes twice as many load instructions (measured with TAU and PAPI) as the optimized one, and the optimized one runs twice as fast. I am very curious about how this happens and how the compiler deals with the "%" sign in Fortran user-defined types. Many thanks for your suggestions!
With pointers, the compiler has to do an indirect reference each time you access a component, as it doesn't know if the pointer may be aliased.
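One way to sidestep the aliasing question, sketched here under assumptions rather than taken from the original code: move the loop into a small kernel whose dummy arguments are plain arrays instead of pointers. Fortran's argument rules then let the compiler assume that writing to one dummy does not change the others. The names `ark2_kernel_sketch`, `t_mixt_sketch` and `advance_time` are hypothetical, and the type is a cut-down stand-in for t_mixt.

```fortran
module ark2_kernel_sketch
  implicit none
  integer, parameter :: rfreal = kind(1.0d0)  ! assumption: double precision

  ! Hypothetical stand-in for t_mixt, holding only the three arrays used here.
  type :: t_mixt_sketch
    real(rfreal), pointer :: time(:) => null()
    real(rfreal), pointer :: timeOld(:) => null()
    real(rfreal), pointer :: dt(:) => null()
  end type t_mixt_sketch

contains

  ! The dummies are ordinary arrays, not pointers, so the compiler may
  ! assume that storing into time does not modify timeOld or dt.
  subroutine advance_time(n, time, timeOld, dt)
    integer, intent(in) :: n
    real(rfreal), intent(out) :: time(n)
    real(rfreal), intent(in) :: timeOld(n), dt(n)
    integer :: i
    do i = 1, n
      time(i) = timeOld(i) + dt(i)
    end do
  end subroutine advance_time

end module ark2_kernel_sketch

program demo_kernel
  use ark2_kernel_sketch
  implicit none
  integer, parameter :: n = 8
  type(t_mixt_sketch), pointer :: state

  allocate(state)
  allocate(state%time(n), state%timeOld(n), state%dt(n))
  state%timeOld = 1.0_rfreal
  state%dt = 0.25_rfreal

  ! The component pointers are dereferenced once, at the call site.
  call advance_time(n, state%time, state%timeOld, state%dt)

  ! 1.0 + 0.25 is exactly representable, so an exact comparison is safe here.
  if (any(state%time /= 1.25_rfreal)) error stop "unexpected result"
  print *, "ok"

  deallocate(state%time, state%timeOld, state%dt)
  deallocate(state)
end program demo_kernel
```

Whether this helps in a given code depends on the compiler and on the arrays being contiguous; if a pointer target is not contiguous, passing it to an explicit-shape dummy can trigger a temporary copy.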
Steve Lionel (Intel) wrote:
With pointers, the compiler has to do an indirect reference each time you access a component, as it doesn't know if the pointer may be aliased.
Hi Steve,
Thank you so much for your reply. Could you please explain in more detail? I think both state%time and time are pointers. So why does the "%" sign lead to more load instructions and slow down the code?
Many thanks!
Can you show the declaration of type t_mixt?
Steve Lionel (Intel) wrote:
Can you show the declaration of type t_mixt?
Hi Steve,
Thanks for your reply. Here is the declaration of type t_mixt:
    TYPE t_mixt
      INTEGER :: NVARS, ND, numproc, myrank
      REAL(RFREAL) :: RE, REinv, PR, PRinv, SC, SCinv, xshock
      REAL(RFREAL) :: DTAU, DTAUold, InitialRHS, CurrentRHSMax, AveHtFluxOld
      REAL(RFREAL), POINTER :: time(:), dt(:), timeOld(:), cfl(:)
      REAL(RFREAL), POINTER :: cv(:,:), cvOld(:,:), dv(:,:), tv(:,:), gv(:,:), cvOld2(:,:), cvOld1(:,:)
      REAL(RFREAL), POINTER :: rhs(:,:), rk_rhs(:,:,:), cvTarget(:,:), cvTargetOld(:,:), RHSz(:,:)
      REAL(RFREAL), POINTER :: VelGrad1st(:,:), TempGrad1st(:,:), MagStrnRt(:), tvCor(:,:)
      REAL(RFREAL), POINTER :: flux(:), dflux(:), muT(:), PrT(:), SGS_KE(:)
      REAL(RFREAL) :: MaxHyperMu, MaxHyperBeta, MaxHyperKappa ! ... maximum hyperviscosity at each time step
      REAL(RFREAL), POINTER :: auxVars(:,:), auxVarsOld(:,:)
      REAL(RFREAL), POINTER :: rhs_AuxVars(:,:), rk_rhs_AuxVars(:,:,:)
      REAL(RFREAL), POINTER :: levelSet(:), levelSetOld(:), rhs_levelSet(:), rk_rhs_levelSet(:,:)
      REAL(RFREAL), POINTER :: precond(:,:), auxVarsTarget(:,:), rhs_explicit(:,:,:), rhs_implicit(:,:,:)
      REAL(RFREAL), POINTER :: rhs_auxVars_explicit(:,:,:), rhs_auxVars_implicit(:,:,:)
      ! - Finite Volume
      TYPE(t_fvsweep), pointer :: sweep(:)
      ! - IO buffers
      REAL(RFREAL), POINTER :: DBUF_IO(:,:,:,:)
      INTEGER, POINTER :: IBUF_IO(:,:,:)
      ! - Adjoint N-S
      REAL(RFREAL), POINTER :: av(:,:), avOld(:,:)
      REAL(RFREAL), POINTER :: avTarget(:,:), avTargetOld(:,:)
      REAL(RFREAL), POINTER :: cvNew(:,:)
      INTEGER :: num_cvFiles, numCur_cvFile
      INTEGER, POINTER :: iter_cvFiles(:)
      REAL(RFREAL), POINTER :: time_cvFiles(:)
      ! ... post-processing
      REAL(RFREAL), POINTER :: pp(:,:), Vort(:,:), Dilat(:)
      ! ... spline coefficients for EOS and transport variables
      TYPE(t_spline), POINTER :: dvSpline(:) ! Cv(T), Cp(T), Gamma(T), Z(T), T(e_int) (in that order)
      TYPE(t_spline), POINTER :: tvSpline ! mu(T), lambda(T), k(T)
    END TYPE t_mixt
Please note RFREAL is simply kind=8 for double precision. Thanks!
Thanks - since time is itself a pointer, you have double-indirection in the initial case. The compiler can do a decent job of optimizing single-level references to pointers, but double-level indirections are more complex.
Do you need to use POINTER for all of these? It seems to me that ALLOCATABLEs might work better for many if not all of the cases where you use POINTER. The compiler can deal better with ALLOCATABLE, though you still have the double-indirection issue.
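As a rough illustration of the ALLOCATABLE suggestion, here is a minimal sketch with a hypothetical, cut-down type `t_mixt_alloc` that keeps only three of the arrays. It assumes nothing else in the code needs to pointer-associate these components with other targets, which would have to be checked in the real application. The variable `state` is still a pointer, so the outer level of indirection remains, as noted above.

```fortran
module mixt_alloc_sketch
  implicit none
  integer, parameter :: rfreal = kind(1.0d0)  ! assumption: double precision

  ! Hypothetical variant of t_mixt: the array components are ALLOCATABLE
  ! instead of POINTER, so they cannot alias one another.
  type :: t_mixt_alloc
    real(rfreal), allocatable :: time(:), dt(:), timeOld(:)
  end type t_mixt_alloc
end module mixt_alloc_sketch

program demo_alloc
  use mixt_alloc_sketch
  implicit none
  integer, parameter :: ncells = 8
  type(t_mixt_alloc), pointer :: state
  integer :: i

  allocate(state)
  allocate(state%time(ncells), state%dt(ncells), state%timeOld(ncells))
  state%timeOld = 2.0_rfreal
  state%dt = 0.5_rfreal

  ! Same loop as in the original version, unchanged at the point of use.
  do i = 1, ncells
    state%time(i) = state%timeOld(i) + state%dt(i)
  end do

  ! 2.0 + 0.5 is exactly representable, so an exact comparison is safe here.
  if (any(state%time /= 2.5_rfreal)) error stop "unexpected result"
  print *, "ok"

  ! Deallocating state also releases its allocatable components.
  deallocate(state)
end program demo_alloc
```

A side benefit of allocatable components is that they are released automatically when the containing object is deallocated, so there is no separate cleanup of each array.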
Steve Lionel (Intel) wrote:
Thanks - since time is itself a pointer, you have double-indirection in the initial case. The compiler can do a decent job of optimizing single-level references to pointers, but double-level indirections are more complex.
Do you need to use POINTER for all of these? It seems to me that ALLOCATABLEs might work better for many if not all of the cases where you use POINTER. The compiler can deal better with ALLOCATABLE, though you still have the double-indirection issue.
Hi Steve,
Thank you so much for your reply. Just to clarify:
By double-indirection, you mean the following: the first indirection is that we need to load an address from state and then use that address to load state%time; the second indirection is that we need to load the address held in state%time (since time itself is a pointer) and then load the floating-point numbers.
In the optimized version, there is only one such indirection (loading the address from time and then loading the floating-point numbers). This is why the original version executes more load instructions and runs more slowly.
Am I correct?
Thanks!
Exactly. The use of % is not slow in itself. But you have double the memory references in the first case, and this probably also interferes with other optimizations.
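For anyone who wants to try the comparison discussed in this thread, a minimal self-contained sketch follows. The types `t_mixt_sketch` and `t_region_sketch` are hypothetical stand-ins for the real ones (the grid level is flattened into a single `nCells` component), and the array size and repeat count are arbitrary. Timings will vary with compiler and flags, and an optimizer may hoist the pointer loads or collapse the repeat loop, so treat the numbers as indicative only.

```fortran
program indirection_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: rfreal = kind(1.0d0)   ! assumption: double precision
  integer, parameter :: ncells = 1000000, nrepeat = 100

  ! Hypothetical stand-ins for t_mixt and t_region.
  type :: t_mixt_sketch
    real(rfreal), pointer :: time(:) => null()
    real(rfreal), pointer :: timeOld(:) => null()
    real(rfreal), pointer :: dt(:) => null()
  end type t_mixt_sketch

  type :: t_region_sketch
    type(t_mixt_sketch), pointer :: state => null()
    integer :: nCells = 0
  end type t_region_sketch

  type(t_region_sketch), pointer :: region
  type(t_mixt_sketch), pointer :: state
  real(rfreal), pointer :: time(:), timeOld(:), dt(:)
  real(rfreal), allocatable :: reference(:)
  integer(int64) :: t0, t1, t2, rate
  integer :: i, rep

  allocate(region)
  allocate(region%state)
  region%nCells = ncells
  state => region%state
  allocate(state%time(ncells), state%timeOld(ncells), state%dt(ncells))
  state%timeOld = 1.0_rfreal
  state%dt = 0.5_rfreal

  ! Version 1: component references inside the loop.
  call system_clock(t0, rate)
  do rep = 1, nrepeat
    do i = 1, region%nCells
      state%time(i) = state%timeOld(i) + state%dt(i)
    end do
  end do
  call system_clock(t1)
  reference = state%time          ! keep a copy of the result for checking
  state%time = 0.0_rfreal

  ! Version 2: local pointers set once, outside the loop.
  time => state%time
  timeOld => state%timeOld
  dt => state%dt
  do rep = 1, nrepeat
    do i = 1, region%nCells
      time(i) = timeOld(i) + dt(i)
    end do
  end do
  call system_clock(t2)

  ! Both versions must produce identical values.
  if (any(time /= reference)) error stop "results differ"

  if (rate > 0) then
    print '(a, f10.4, a)', "component references: ", &
      real(t1 - t0, rfreal) / real(rate, rfreal), " s"
    print '(a, f10.4, a)', "local pointers:       ", &
      real(t2 - t1, rfreal) / real(rate, rfreal), " s"
  end if

  deallocate(state%time, state%timeOld, state%dt)
  deallocate(region%state)
  deallocate(region)
end program indirection_sketch
```

The second interval also includes the copy into `reference` and the reset of `state%time`, which is small next to the repeated loops but worth moving outside the timed region for careful measurements.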
