I need some help speeding up a small portion of my Fortran code.
The following code sits inside several loops.
The variables ix1 and iy1 are assigned in the outer loops.
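    ij = 0
    do i = 1,4
      ix1pi = ix1+i
      do j = 1,4
        ij = ij+1
        q(ij,:) = dcube(:,ix1pi,iy1+j)   ! copy one column of dcube into row ij of q
        uxuy(ij,:) = ux(i)*uy(j)         ! broadcast the scalar weight across row ij
      end do
    end do
    sig2 = sum(q*uxuy,dim=1)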
"dcube" is an allocatable array.
My question is how to avoid copying the columns, or whether there is any other way to speed up this portion of the code.
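For one, I'd suggest reversing the dimensions of q and uxuy (if you can), i.e. using q(:,ij) rather than q(ij,:). Fortran stores arrays in column-major order, so the array sections you are assigning to, q(ij,:) and uxuy(ij,:), are discontiguous, and each assignment has to walk through memory with a stride instead of filling one contiguous block.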
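To make the suggestion concrete, here is a minimal sketch that compares the original layout with two alternatives: (a) the reversed layout, where each column of dcube is copied into a contiguous column of a transposed buffer and the weights are stored once in a 1-D array, and (b) accumulating the weighted columns straight into sig2, which avoids the copies altogether. The array sizes nz, nx, ny, the values of ix1 and iy1, the random test data, and the names qt, w, sig2_ref, sig2_a and sig2_b are all assumptions made up for this illustration; they are not from the original code. Whether either variant is actually faster depends on the compiler and the real array sizes, so it is worth timing both.

```fortran
program sig2_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes, chosen only so the example runs.
  integer, parameter :: nz = 5, nx = 8, ny = 8
  real(dp), allocatable :: dcube(:,:,:)
  real(dp) :: ux(4), uy(4)
  real(dp) :: q(16,nz), uxuy(16,nz)   ! original layout (strided row sections)
  real(dp) :: qt(nz,16), w(16)        ! reversed layout (contiguous columns)
  real(dp) :: sig2_ref(nz), sig2_a(nz), sig2_b(nz)
  integer :: i, j, ij, ix1, iy1

  allocate(dcube(nz,nx,ny))
  call random_number(dcube)
  call random_number(ux)
  call random_number(uy)
  ix1 = 2    ! stand-ins for the values set by the outer loops
  iy1 = 3

  ! Reference: the code from the question.
  ij = 0
  do i = 1, 4
    do j = 1, 4
      ij = ij + 1
      q(ij,:) = dcube(:,ix1+i,iy1+j)
      uxuy(ij,:) = ux(i)*uy(j)
    end do
  end do
  sig2_ref = sum(q*uxuy, dim=1)

  ! (a) Reversed dimensions: qt(:,ij) is a contiguous section, and the
  !     weight for each ij is stored once instead of being replicated.
  ij = 0
  do i = 1, 4
    do j = 1, 4
      ij = ij + 1
      qt(:,ij) = dcube(:,ix1+i,iy1+j)
      w(ij) = ux(i)*uy(j)
    end do
  end do
  sig2_a = matmul(qt, w)

  ! (b) No temporaries: add each weighted column directly to the result.
  !     j is the outer loop so that the second index of dcube varies fastest.
  sig2_b = 0.0_dp
  do j = 1, 4
    do i = 1, 4
      sig2_b = sig2_b + ux(i)*uy(j)*dcube(:,ix1+i,iy1+j)
    end do
  end do

  ! All three results should agree up to rounding.
  if (maxval(abs(sig2_a - sig2_ref)) > 1.0e-12_dp) error stop "variant (a) differs"
  if (maxval(abs(sig2_b - sig2_ref)) > 1.0e-12_dp) error stop "variant (b) differs"
  print *, "all variants agree"
end program sig2_demo
```

Variant (b) changes the order in which the 16 terms are summed, so its result can differ from the original in the last few bits; the tolerance in the check allows for that.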