Those loops are a load of DOT

JohnNichols · ‎03-26-2016

    dm11=tmp(1,1)
    dm12=tmp(1,2)
    dm13=tmp(1,3)
    dm21=tmp(2,1)
    dm22=tmp(2,2)
    dm23 = tmp(2,3)
    dm31 =tmp(3,1)
    dm32 = tmp(3,2)
    dm33 = tmp(3,3)
    do j = 1,mn_9
        l = ls(j)
        s1=dm11*p(j,1)+dm12*p(j,2)+dm13*p(j,3)
        s2=dm12*p(j,1)+dm22*p(j,2)+dm23*p(j,3)
        s3=dm13*p(j,1)+dm23*p(j,2)+dm33*p(j,3)
        do i= 1,j
            k=ls(i)
            sm(k,l)=sm(k,l)+s1*p(i,1)+s2*p(i,2)+s3*p(i,3)
            sm(l,k)=sm(k,l)
        end do
    end do

This is from some 80's code that ran on a Sun, does anyone see any good reason for the dm variables that are lost on me?

jimdempseyatthecove · ‎03-26-2016

The optimization capabilities of and 80's based compiler (likely) did not move the loop invariant code (data) out of the loop, It also would likely not have either registerized the tmp, nor copied the loop invariant code to (stack) local storage. Todays compilers are capable of placing the array base tmp into a register, then using a hard coded offset for the index pair. Therefore the dm.. temporaries can be eliminated.

It would be easy enough to VTune measure the code both ways, then make the determination as to if it is worth it to hunt out all such use of temps for code replacement.

How large is mn_9?

You also might want to experiment with replacing the inner loop with

        do i= 1,j
            sm_tmp(i)=s1*p(i,1)+s2*p(i,2)+s3*p(i,3)
        end do
        do i= 1,j
            k=ls(i)
            sm(k,l)=sm(k,l) + sm_tmp(i)
            sm(l,k)=sm(k,l)
        end do

The former code likely would not have vectorized, The above code would likely vectorize the first loop, then scalar the second loop. Multiplication operation is expensive relative to addition. Depending on the iteration count of i (average is half of mn_9) this may or may not show an improvement.

Jim Dempsey

andrew_4619 · ‎03-26-2016

Those loops are a load of DOT_PRODUCTs which hopefully the compiler will do efficiently if you use the intrinsic function....

JohnNichols · ‎03-26-2016

Thanks for the answers - interesting - yes this is just matrix manipulation to enter the stiffness K matrix for a FEM model.

I am loath to play to much till I have it running and then go in and butcher it a lot. I like correct answers first then speed.

Also helps to work out real math - which is scattered across about 20 academic papers over 30 years

Interesting idea -- take different plate elements and mix them in one stiffness matrix that is 18 dof for a triangular element - some of the code is Fortran and some Matlab -- I am trying to connect the plates to beam element nodes with 6 dof - - it is so much easier using PARDISO and FEAST then commercial code. But one has to write the front end.

Be completely impossible with this forum of course - best place in the world after the Old Man at Coniston.

John

Temporary Variables