Hi, I have written code that builds a lookup table which, through linear interpolation, returns the real value y as a function of the independent variable vector x(5). In other words, the code performs linear interpolation in 5D.
I'm trying to optimize the code. The main interpolation operations are carried out in 'lookup_table_5d_mod.f90', subroutine GetColAtLoc, line 203. I'm wondering how to minimize the access time to the table stored in memory during interpolation. In your opinion, would it be useful to prefetch the data used during interpolation into the L2 cache?
You will find comments in the source code. Thank you.
Too little information is available to give a definitive assessment of what to do. That said, my assumption is as follows.
The computational latency lies in:
    !Trilinear interpolation in (/x(1),x(2),x(3),x4Low,x5Low)
    y2d(1,1) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m,l))
    !Trilinear interpolation in (/x(1),x(2),x(3),x4High,x5Low)
    y2d(2,1) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m+1,l))
    !Trilinear interpolation in (/x(1),x(2),x(3),x4Low,x5High)
    y2d(1,2) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m,l+1))
    !Trilinear interpolation in (/x(1),x(2),x(3),x4High,x5High)
    y2d(2,2) = sum(w*This%m_Y(i:i+1,j:j+1,k:k+1,m+1,l+1))
and the latency (I have not examined the disassembly) is likely due to the construction of temporary arrays of shape (2,2,2) from This%m_Y in four places, one for each of the four expressions sum(w*This%m_Y(...
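For readers who want to see what one of those four expressions computes, here is a minimal sketch of a single trilinear evaluation. The table contents, the cell indices i, j, k and the fractional offsets tx, ty, tz are made up for illustration and are not taken from the posted source; only the sum(w*slice) pattern is.

```fortran
program trilinear_sketch
  implicit none
  double precision :: y(3,3,3), w(2,2,2)
  double precision :: wx(2), wy(2), wz(2)
  double precision :: tx, ty, tz, val, expected
  integer :: a, b, c, i, j, k

  ! Hypothetical table on an integer grid: y = ix + 10*iy + 100*iz.
  ! The function is linear, so trilinear interpolation reproduces it exactly.
  do c = 1, 3
    do b = 1, 3
      do a = 1, 3
        y(a,b,c) = dble(a) + 10.0d0*dble(b) + 100.0d0*dble(c)
      end do
    end do
  end do

  ! Lower corner of the enclosing cell and fractional position inside it
  ! (assumed values, chosen only for the demonstration).
  i = 1; j = 2; k = 1
  tx = 0.25d0; ty = 0.5d0; tz = 0.75d0

  ! One-dimensional weights: (1 - t) for the low node, t for the high node.
  wx = [1.0d0 - tx, tx]
  wy = [1.0d0 - ty, ty]
  wz = [1.0d0 - tz, tz]

  ! The (2,2,2) weight array is the outer product of the 1D weights.
  do c = 1, 2
    do b = 1, 2
      do a = 1, 2
        w(a,b,c) = wx(a)*wy(b)*wz(c)
      end do
    end do
  end do

  ! Same pattern as in GetColAtLoc: weighted sum over a (2,2,2) slice.
  val = sum(w*y(i:i+1, j:j+1, k:k+1))

  ! Evaluate the linear function directly at the interpolation point.
  expected = (dble(i) + tx) + 10.0d0*(dble(j) + ty) + 100.0d0*(dble(k) + tz)
  if (abs(val - expected) > 1.0d-12) error stop "interpolation mismatch"
  print *, "interpolated value:", val
end program trilinear_sketch
```

The slice y(i:i+1, j:j+1, k:k+1) is not contiguous in memory, which is why the compiler may gather it into a temporary before the multiply and sum.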
Provided that the contents of This%m_Y are used/reused many times after being defined in LT5d_CreateFromData (NewLT%m_Y => SpaceData), it may be beneficial for latency to redefine m_Y as an array of a derived type that holds a double precision a(2,2,2) block, and to construct the index into that array from i,j,k,m,l.
    type a222
      double precision :: a(2,2,2)
    end type a222

    !Class
    type , extends(LookupTableClass) :: LookupTable5dClass
    ...
      ! double precision, pointer :: m_y(:,:,:,:,:)
      type(a222), allocatable :: m_y(:,:,:) ! first index constructed from i, j, k
    GetColAtLoc
While this increases the storage size (each block duplicates values that it shares with neighbouring cells), it should eliminate the creation of the temporary and the gather into it.
(This works provided that the original m_Y values are reused a sufficient number of times after creation.)
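A minimal sketch of how the repacking could look, under my own assumptions: the helper names cell_index and pack_blocks, the table sizes and the random test data are all hypothetical and do not come from the posted source. The first index of the packed array is built from i, j, k and the other two are m and l, as in the declaration above.

```fortran
module packed_table_sketch
  implicit none
  type a222
    double precision :: a(2,2,2)
  end type a222
contains
  ! Hypothetical helper: linear cell index built from i, j, k,
  ! where n1 and n2 are the cell counts in the first two dimensions.
  pure integer function cell_index(i, j, k, n1, n2)
    integer, intent(in) :: i, j, k, n1, n2
    cell_index = i + n1*((j - 1) + n2*(k - 1))
  end function cell_index

  ! Hypothetical helper: copy every (2,2,2) corner block of y into
  ! contiguous storage, once, when the table is created.
  subroutine pack_blocks(y, blocks)
    double precision, intent(in) :: y(:,:,:,:,:)
    type(a222), allocatable, intent(out) :: blocks(:,:,:)
    integer :: n1, n2, n3, i, j, k, m, l
    n1 = size(y,1) - 1
    n2 = size(y,2) - 1
    n3 = size(y,3) - 1
    allocate(blocks(n1*n2*n3, size(y,4), size(y,5)))
    do l = 1, size(y,5)
      do m = 1, size(y,4)
        do k = 1, n3
          do j = 1, n2
            do i = 1, n1
              blocks(cell_index(i,j,k,n1,n2), m, l)%a = y(i:i+1, j:j+1, k:k+1, m, l)
            end do
          end do
        end do
      end do
    end do
  end subroutine pack_blocks
end module packed_table_sketch

program packed_demo
  use packed_table_sketch
  implicit none
  double precision :: y(4,5,3,2,2), w(2,2,2), direct, packed
  type(a222), allocatable :: blocks(:,:,:)
  integer :: i, j, k, m, l, idx

  call random_number(y)   ! stand-in for the real table data
  call random_number(w)   ! stand-in for the real interpolation weights
  call pack_blocks(y, blocks)

  i = 2; j = 3; k = 1; m = 2; l = 1
  idx = cell_index(i, j, k, size(y,1) - 1, size(y,2) - 1)

  direct = sum(w*y(i:i+1, j:j+1, k:k+1, m, l))   ! original strided slice
  packed = sum(w*blocks(idx, m, l)%a)            ! contiguous block
  if (abs(direct - packed) > 1.0d-12) error stop "packed lookup mismatch"
  print *, "direct and packed results agree:", packed
end program packed_demo
```

Whether this actually pays off depends on how often the table is read relative to how often it is rebuilt, so it is worth timing both variants on the real data.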