Intel® Fortran Compiler

Optimizing 5d look up table interpolation



Hi, I have written code that builds a lookup table which, through linear interpolation, returns the real value y as a function of the independent variable vector x(5). In other words, the code performs linear interpolation in 5-D.
I'm trying to optimize the code. The main interpolation operations are carried out in 'lookup_table_5d_mod.f90', subroutine GetColAtLoc, line 203. I'm wondering how to minimize the access time to the table stored in memory during the interpolation. In your opinion, would it be useful to prefetch the data used during interpolation into the L2 cache?
You will find comments in the source code. Thank you.

1 Reply

Too little information is available to give a definitive assessment of what to do. That said, my assumption is as follows.

The computational latency is associated with these statements:

          !Trilinear interpolation in (/x(1),x(2),x(3),x4Low,x5Low/)
          y2d(1,1) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m,l))
          !Trilinear interpolation in (/x(1),x(2),x(3),x4High,x5Low/)
          y2d(2,1) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m+1,l))
          !Trilinear interpolation in (/x(1),x(2),x(3),x4Low,x5High/)
          y2d(1,2) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m,l+1))
          !Trilinear interpolation in (/x(1),x(2),x(3),x4High,x5High/)
          y2d(2,2) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m+1,l+1))

The latency (I have not examined the disassembly) is likely due to the construction of temporary arrays of shape (2,2,2) from This%m_Y in four places, one for each of the four sum(w*This%m_Y(...)) expressions.
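To illustrate why each (2,2,2) section has to be gathered, here is a minimal sketch that lists where the eight corner values sit in a column-major 5-D array. The module name corner_offsets_sketch and the function corner_offsets are hypothetical and not part of the posted source; default integers are assumed to be large enough for the table size.

```fortran
module corner_offsets_sketch
   implicit none
   private
   public :: corner_offsets
contains
   ! Zero-based element offsets of the 8 values in Y(i:i+1, j:j+1, k:k+1, m, l)
   ! for a column-major array with extents n(1:5).
   ! Hypothetical helper, for illustration only.
   pure function corner_offsets(i, j, k, m, l, n) result(off)
      integer, intent(in) :: i, j, k, m, l, n(5)
      integer :: off(8)
      integer :: di, dj, dk, c
      c = 0
      do dk = 0, 1
         do dj = 0, 1
            do di = 0, 1
               c = c + 1
               ! Standard column-major linearization of (i+di, j+dj, k+dk, m, l)
               off(c) = (i - 1 + di) + n(1)*((j - 1 + dj) + n(2)*((k - 1 + dk) &
                        + n(3)*((m - 1) + n(4)*(l - 1))))
            end do
         end do
      end do
   end function corner_offsets
end module corner_offsets_sketch
```

With an assumed 100 points per axis and i=j=k=m=l=1, the offsets are 0, 1, 100, 101, 10000, 10001, 10100, 10101. Only the pairs along the first index are adjacent, so the eight doubles of one section touch at least four distinct 64-byte cache lines, and there are four such sections per lookup.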

Provided that the contents of This%m_Y are used/reused many times after being defined in LT5d_CreateFromData (NewLT%m_Y => SpaceData), it may be beneficial for latency to redefine m_Y as a linear array whose elements are a derived type holding a double precision block of shape (2,2,2), and to construct the linear index from i,j,k,m,l.

type a222
   double precision :: a(2,2,2)
end type a222

type , extends(LookupTableClass) :: LookupTable5dClass
!  double precision, pointer     :: m_y(:,:,:,:,:)
   type(a222), allocatable :: m_y(:,:,:) ! first index constructed from i, j, k in GetColAtLoc

While this increases the storage size, it should eliminate both the creation of the temporary and the gather into it.
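A minimal sketch of how such a packed table could be built and used, under stated assumptions: the module packed_lut5d_sketch and the routines cell_index, pack_table and interp3 are hypothetical names, the first index is linearized from i, j, k in column-major order, and w is the same (2,2,2) weight array as in the original expressions. This is not the code from lookup_table_5d_mod.f90.

```fortran
module packed_lut5d_sketch
   implicit none
   private
   public :: a222, cell_index, pack_table, interp3

   ! One interpolation cell: the 8 corner values stored contiguously.
   type a222
      double precision :: a(2,2,2)
   end type a222

contains

   ! Linear index of cell (i,j,k); there are (n1-1)*(n2-1)*(n3-1) cells.
   ! Assumed column-major linearization, i varying fastest.
   pure integer function cell_index(i, j, k, n1, n2) result(ijk)
      integer, intent(in) :: i, j, k, n1, n2
      ijk = i + (n1 - 1)*((j - 1) + (n2 - 1)*(k - 1))
   end function cell_index

   ! One-time repacking of the 5-D table; the cost is paid at creation.
   subroutine pack_table(y, packed)
      double precision, intent(in) :: y(:,:,:,:,:)
      type(a222), allocatable, intent(out) :: packed(:,:,:)
      integer :: n1, n2, n3, i, j, k, m, l
      n1 = size(y, 1); n2 = size(y, 2); n3 = size(y, 3)
      allocate(packed((n1-1)*(n2-1)*(n3-1), size(y, 4), size(y, 5)))
      do l = 1, size(y, 5)
         do m = 1, size(y, 4)
            do k = 1, n3 - 1
               do j = 1, n2 - 1
                  do i = 1, n1 - 1
                     packed(cell_index(i, j, k, n1, n2), m, l)%a = &
                        y(i:i+1, j:j+1, k:k+1, m, l)
                  end do
               end do
            end do
         end do
      end do
   end subroutine pack_table

   ! Weighted sum over one contiguous cell; same arithmetic as
   ! sum(w*Y(i:i+1,j:j+1,k:k+1,m,l)) but with no strided section to gather.
   pure function interp3(cell, w) result(v)
      type(a222), intent(in) :: cell
      double precision, intent(in) :: w(2,2,2)
      double precision :: v
      v = sum(w*cell%a)
   end function interp3

end module packed_lut5d_sketch
```

In a lookup, each of the four expressions would then read something like y2d(1,1) = interp3(packed(ijk, m, l), w) with ijk computed once per call. The price is that most table values are stored up to eight times, once for every cell they are a corner of.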

(This pays off only if the original m_Y values are reused a sufficient number of times after creation.)

Jim Dempsey