dm11=tmp(1,1) dm12=tmp(1,2) dm13=tmp(1,3) dm21=tmp(2,1) dm22=tmp(2,2) dm23 = tmp(2,3) dm31 =tmp(3,1) dm32 = tmp(3,2) dm33 = tmp(3,3) do j = 1,mn_9 l = ls(j) s1=dm11*p(j,1)+dm12*p(j,2)+dm13*p(j,3) s2=dm12*p(j,1)+dm22*p(j,2)+dm23*p(j,3) s3=dm13*p(j,1)+dm23*p(j,2)+dm33*p(j,3) do i= 1,j k=ls(i) sm(k,l)=sm(k,l)+s1*p(i,1)+s2*p(i,2)+s3*p(i,3) sm(l,k)=sm(k,l) end do end do
This is from some 80's code that ran on a Sun, does anyone see any good reason for the dm variables that are lost on me?
The optimization capabilities of and 80's based compiler (likely) did not move the loop invariant code (data) out of the loop, It also would likely not have either registerized the tmp, nor copied the loop invariant code to (stack) local storage. Todays compilers are capable of placing the array base tmp into a register, then using a hard coded offset for the index pair. Therefore the dm.. temporaries can be eliminated.
It would be easy enough to VTune measure the code both ways, then make the determination as to if it is worth it to hunt out all such use of temps for code replacement.
How large is mn_9?
You also might want to experiment with replacing the inner loop with
do i= 1,j sm_tmp(i)=s1*p(i,1)+s2*p(i,2)+s3*p(i,3) end do do i= 1,j k=ls(i) sm(k,l)=sm(k,l) + sm_tmp(i) sm(l,k)=sm(k,l) end do
The former code likely would not have vectorized, The above code would likely vectorize the first loop, then scalar the second loop. Multiplication operation is expensive relative to addition. Depending on the iteration count of i (average is half of mn_9) this may or may not show an improvement.
Thanks for the answers - interesting - yes this is just matrix manipulation to enter the stiffness K matrix for a FEM model.
I am loath to play to much till I have it running and then go in and butcher it a lot. I like correct answers first then speed.
Also helps to work out real math - which is scattered across about 20 academic papers over 30 years
Interesting idea -- take different plate elements and mix them in one stiffness matrix that is 18 dof for a triangular element - some of the code is Fortran and some Matlab -- I am trying to connect the plates to beam element nodes with 6 dof - - it is so much easier using PARDISO and FEAST then commercial code. But one has to write the front end.
Be completely impossible with this forum of course - best place in the world after the Old Man at Coniston.