Intel® Fortran Compiler

Matrix Operation

JohnNichols
Valued Contributor III
! Accumulate the symmetric 3x3 matrix kth over k:
! kth = kth + transpose(qm(:,:,k)) * c * qm(:,:,k),
! filling only the upper triangle and mirroring it into the lower one.
do k = 1, 3
    do j = 1, 3
        ! d = c * qm(:,j,k)
        d(1) = c(1,1)*qm(1,j,k) + c(1,2)*qm(2,j,k) + c(1,3)*qm(3,j,k)
        d(2) = c(2,1)*qm(1,j,k) + c(2,2)*qm(2,j,k) + c(2,3)*qm(3,j,k)
        d(3) = c(3,1)*qm(1,j,k) + c(3,2)*qm(2,j,k) + c(3,3)*qm(3,j,k)
        do i = 1, j
            kth(i,j) = kth(i,j) + qm(1,i,k)*d(1) + qm(2,i,k)*d(2) + qm(3,i,k)*d(3)
            kth(j,i) = kth(i,j)
        end do
    end do
end do

This has got to be a matrix operation, does it not?
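
For reference, a minimal sketch of the same accumulation written with MATMUL, assuming c and kth are 3x3 and qm is 3x3x3 as in the loop above (the work array d drops out):

    do k = 1, 3
        kth = kth + matmul(transpose(qm(:,:,k)), matmul(c, qm(:,:,k)))
    end do

This forms the full matrix rather than only the upper triangle, so it does roughly twice the arithmetic of the hand-unrolled loop, but it makes the underlying operation explicit.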

 

jimdempseyatthecove
Honored Contributor III

Additional suggestion:

Have each of your triangle objects contain area2 (area squared), built when you construct/initialize the object. The initialization code can perform the squaring and verification of the input data (and do it once). Then you can remove the construction and verification of area2 in all of the subroutines that perform this test. This is another reason for passing an object (Triangle) reference rather than a list of args.
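
A minimal sketch of that idea, assuming the dp kind parameter mentioned below; the Triangle type, its fields and make_triangle are illustrative names, not taken from the attached code:

    type :: Triangle
        real(kind=dp) :: x(3), y(3)   ! node coordinates
        real(kind=dp) :: area2        ! squared area, computed once at construction
    end type Triangle

    function make_triangle(x, y) result(t)
        real(kind=dp), intent(in) :: x(3), y(3)
        type(Triangle) :: t
        real(kind=dp) :: twice_area
        t%x = x
        t%y = y
        ! twice the signed area, from the cross product of two edge vectors
        twice_area = (x(2)-x(1))*(y(3)-y(1)) - (x(3)-x(1))*(y(2)-y(1))
        if (twice_area == 0.0_dp) stop 'make_triangle: degenerate triangle'
        t%area2 = (0.5_dp*twice_area)**2
    end function make_triangle

Every routine that receives a Triangle can then use t%area2 directly, with the check already done once.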

Keep in mind that you are (principally) manipulating double precision numbers, for which you have set up the parameter dp.
Literal floating-point numbers are by default single precision.
Therefore, you should be suffixing your literals with _dp:

wfac = 0.75_dp*f*area
...

area43=0.666666666666666_dp*area2

Though in this case you might want to consider

real(kind = dp), parameter :: TwoThirds = 2.0_dp / 3.0_dp
...
area43 = TwoThirds * area2

Doing so (suffixing literals with _dp) will make your code faster and more accurate.
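
For completeness, a minimal sketch of how dp and such named constants might be collected in one module (the module name is illustrative, not from the attached code):

    module kinds_mod
        implicit none
        integer, parameter :: dp = selected_real_kind(15, 307)   ! IEEE double precision
        real(kind=dp), parameter :: TwoThirds = 2.0_dp / 3.0_dp
    end module kinds_mod

A "use kinds_mod" then makes dp and the constants available wherever they are needed.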

Jim Dempsey

JohnNichols
Valued Contributor III

Jim:

The sample input file is in the zip files as at.inp.

The data is a two-element model: the first element is the one supplied by Felippa with his code, and the second is a made-up one to give me more data for assembling the stiffness matrix. I check the first mode on the first element to make sure I have not stuffed up the program.

I have now added PARDISO to the solution and it appears to run. I have no idea at this stage if the answers are correct - one would be lucky if they are - so I will make up the standard file that Felippa's student published with the Mathematica code, and then we can start to check answers against a problem with known answers.

SM3MHFF - I am not using this element at the moment. The basic element of interest is SM3SHELL, which Felippa created in an interesting way: he actually took 4 elements to make a plate, each following a different mechanics theory and each with fewer than the 6 degrees of freedom at each node. He adds the output from the four elements together into one 18 by 18 stiffness matrix for three nodes, each with 6 dof. Quite simple really, but elegant. SM3SHELL calls the routines

    call SM3SHMEMBB(xlp,ylp, dm, alphab, f, le, esm, sm)
    sm = esm
    call SM3SHMEMBH(xlp,ylp, dm, betab, le, esm, sm)
    sm = esm
    call SM3SHBENDB(xlp,ylp, db, f, zero, f, lb, esm, sm)
    sm = esm
    call SM3SHBENDH(xlp, ylp, db, f, lb, esm, sm)
    esm = sm

to put it all together into one 18 by 18 array. In the extensive printout I now give myself you can see the development of the esm matrix. The other elements in the file have fewer than 18 dof and are of no help in my specific problem, but I will look at them when I have a minute.

I am looking at bridges with multiple beams spanning from one pier to another - any standard highway bridge. The beams are connected with a deck, really just a thick plate across multiple beams, and the deck is continuous over the supports, so it is a weird hybrid arrangement. I now have data from this type of bridge, and I need an analysis method that allows me to play with the stiffness matrix directly; this certainly does. I like to avoid the GUIs that separate you from the model - a simple data text file is so much quicker and easier for these simple models.

I will make the changes you suggest, clean up the code a little more (it is still dirty), and make a good test file. My original intention was to add the plate element to the beams program, but I think it will be easier the other way around: a beam is just a special plate with two nodes.

In terms of the time analysis, I need to do 2000 time steps per second for at least 8 minutes to match our data files. That is some heavy-duty number crunching, something like 960,000 time steps (2000 per second for 480 seconds). So I am a ways away from that step.

John

 

JohnNichols
Valued Contributor III

Jim:

I think I have fixed the code for all the dp numbers; I actually created the constants in a BASE module - like ONE = 1.0_dp - and then used ONE, etc. It seemed a bit easier, and now I can use the BASE file elsewhere, in much the same way as Andrew suggested (I believe I am correct) with mn_6 etc., which is also useful.

I think I have made all of the arrays allocatable that need to be allocatable. I was hoping to keep the input file as simple as possible - the ones we used in Water Supply would drive you mad because of the separation of everything into different blocks. The stiffness matrix will be sparse, but not packed at the moment, until I am sure it is working. A small sacrifice, as I will not be over 100 nodes for the moment.

consider making a subroutine out of your inner code (post move), then call with the appropriate <<<<<<<<<<<<<<<<<<<<  I am lost here.

___________________________

Latest code is attached -- the input file is still AT.INP

This is a lot of fun.

Thanks for the help.

John

 

jimdempseyatthecove
Honored Contributor III

>> I am lost here

Ignore the comment about the code change. I think my comments were with respect to the pre-Borr(4).zip code.

Your code is shaping up. Have you been able to run tests to assure the correctness of the code changes (in other words, reproduce previously run or published results)?

>> I need to do 2000 time steps per second and for 8 minutes at least to match our data files

Is the computation at time T=n dependent on time T=n-1?

If yes (I expect yes), then parallelization may be difficult. Bear in mind that you can get parallelization by using asynchronous stages (i.e. pipelining).

For example, at T=0, the normal vector (out of the plane of the 3 points of the node), and related computations, can be computed for all nodes in parallel. If you structure your code properly, you can be computing different nodes in different time phases at the same time. See this:

http://lotsofcores.com/PlesiochronousPhasingBarrierVideo
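
As an illustration of the first point only, a minimal sketch of computing the element normals independently with an OpenMP parallel loop; the arrays xyz, elem and normal and their layout are assumptions, not from the attached code:

    ! assumed layout: xyz(3, num_nodes) node coordinates,
    ! elem(3, num_elements) node numbers per triangle, normal(3, num_elements)
    !$omp parallel do private(e, v1, v2, n)
    do e = 1, num_elements
        v1 = xyz(:, elem(2,e)) - xyz(:, elem(1,e))
        v2 = xyz(:, elem(3,e)) - xyz(:, elem(1,e))
        n(1) = v1(2)*v2(3) - v1(3)*v2(2)
        n(2) = v1(3)*v2(1) - v1(1)*v2(3)
        n(3) = v1(1)*v2(2) - v1(2)*v2(1)
        normal(:, e) = n / norm2(n)
    end do
    !$omp end parallel do

The pipelined (plesiochronous phasing) scheme in the video is a more involved pattern for overlapping different time phases; the loop above only shows the embarrassingly parallel per-element part.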

Jim Dempsey

JohnNichols
Valued Contributor III

Jim:

Thanks for the comments. At the moment I have only run the problem given by Felippa with his original code, to see if I get the same values as output. This problem is really weird, and one would not use it in reality.

I am now at a stage - as of last night - where I can look at a fully published problem, the one used by Golmon for the Mathematica code (page 3 of the GLL attachment). I also found another chapter by Felippa about the use of edge beams, which is pretty much the problem I have. His ideas are excellent; the real math issue is the connection of the plate to the attached beam. AXISHELL had a similar problem with edge restraint beams, so there is a reasonably well-established method.

I will put up the sample problem as soon as I can work it out - Golmon does not give all of the numbers, so I am going to have to back-calculate some.

The other standard test is called the patch test: essentially all elements are fixed except a middle one that is free. It is an interesting problem; it is considered a standard test by the academic community, but I gather it is more about the warm fuzzy feeling it gives you when it works than an absolute assurance that you are correct.

The real issue is that the team I am working with at the moment always wants to jump to full brick FEM elements, and this makes the output both voluminous and a challenge to understand. Developing brick FEM models is a real challenge; having tried cathedral-sized structures, the building's variations from rectilinear drive you insane.

In terms of parallel processing:

Step 1 is to read in the data - each element is independent, and with the method you suggested each plate element can be developed separately all the way to its inclusion in the stiffness matrix (see the sketch after this list). I need to work on the assembly step, as this is a pain because of the possible 64 variations in boundary conditions.

Step 2 is the static analysis - simply send it to PARDISO.

Step 3 is the eigensolutions - FEAST

Step 4 is the time-based method. Setting the absolute minimum run at 16384 time steps (2000 time steps per second, and 2^14 records for a decent FFT analysis), and allowing 1 second of processor time per time step, we need more than 4 hours on a single processor; even with 8 processors we are still measuring in 30-minute intervals.
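
A minimal sketch of the Step 1 assembly idea, with each element stiffness matrix formed independently and scattered into the global matrix; element_stiffness, dof_map, K and num_elements are hypothetical names, and the atomic update is just one simple way to avoid races where elements share a dof:

    !$omp parallel do private(e, esm, map, i, j)
    do e = 1, num_elements
        call element_stiffness(e, esm)   ! hypothetical: build the 18x18 element matrix
        call dof_map(e, map)             ! hypothetical: global dof number of each of the 18 local dofs
        do j = 1, 18
            do i = 1, 18
                !$omp atomic
                K(map(i), map(j)) = K(map(i), map(j)) + esm(i, j)
            end do
        end do
    end do
    !$omp end parallel do

With a dense, unpacked K this is straightforward; a packed or sparse K would need a column/row lookup inside the inner loop instead.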

The real issue is designing the time analysis to be parallel from the start, rather than trying to retrofit the parallelism afterwards - ideas?

John

 

JohnNichols
Valued Contributor III

Jim:

Here is the start of the sample problem - interesting how people publish results and then leave out the problem details that would allow you to check those results.

This is only two plates so I can check the results completely and then move to their full problem.

John

JohnNichols
Valued Contributor III

Jim:

Of course, the problem with engineering is that no one believes the modeler except the modeler, and everyone believes the experimental engineer except the experimental engineer, who has seen all the fudges that went into the data recording.

I took the Golmon example of a cantilever beam with a fixed end moment and set up the model in both the original Fortran code by Felippa and the newish code copied from the different sources and somewhat updated. It took me a while to figure out the missing data and then to get the two sets of code to agree on the stiffness matrix. I found several mistakes I had made in the coding that only became obvious once I had two sets of test data running. I now have the sums of the numbers in the two SM matrices - mine and Felippa's - agreeing to 8 significant figures, and I cannot observe any significant differences in the two output files using ExamDiff - an excellent comparison program.

Golmon published a single result for the cantilever from her measurements as if it were a 1-D beam; in reality it is a large steel plate, 0.6 m wide, 1 m high and 4 metres long. Capture.PNG shows her published results. My results from PARDISO are shown in Capture 2; nodes 5 and 6 are at the end of the cantilever and are nicely in the region of the Golmon beam result. There are other tests I need to apply, but I think I am at least close enough to continue with the coding.

Thanks for the help this is neat.

John

JohnNichols
Valued Contributor III

Steve:

I created an allocatable array in a TYPE to hold some DP variables.

Allocated the array -

forgot to zero it ---  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

If I ran in debug it was zeroed on entry to the subroutine; if not in debug it had garbage in it. Because I have been developing the code I mainly set breakpoints, and then the array is zeroed?

Am I seeing normality or weirdness?

Once I worked out that it was not zeroed, I zeroed it myself and it worked.

John

JohnNichols
Valued Contributor III

Jim:

I think I am at least in the right ball park, maybe not at the goal posts.  I can do PARDISO and FEAST and the answers do not look stupid.

John

Steven_L_Intel1
Employee

By default, no variables get zeroed, including allocatables. You can use /Qinit to specify an initial value for allocatable (and other) variables. If built in a debug configuration, allocatables will tend to be filled with CCCCCCCC, at least on first allocation. Otherwise you get whatever was in the memory before.
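
For reference, a minimal sketch of zeroing such an allocatable component explicitly after allocation (the variable and component names are illustrative, not from John's code):

    ! hypothetical component: real(kind=dp), allocatable :: vals(:)
    allocate(r%vals(n))
    r%vals = 0.0_dp      ! explicit zeroing; do not rely on the debug heap fill pattern

    ! or, equivalently, in one step with Fortran 2008 sourced allocation:
    ! allocate(r%vals(n), source=0.0_dp)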

JohnNichols
Valued Contributor III

Steve:

Thanks - that nearly drove me mad.

J
