Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Interesting Observations

JohnNichols
Valued Contributor III
6,689 Views

I have had a lot of fun and some heart ache playing with the MKL Library and the PARDISO solver.

The PARDISO solver is very fast.  The problem I am solving is the hydraulic network problem, in this case using the Gradient Method. The PARDISO solver gives a much better first estimate of the solution vector, which can be measured using the error on the sum of the hydraulic heads. It's first pass gives a 4 fold better estimate than the other solvers (all that I have tried).

But the solver then converges very slowly to the "best estimate" of the answer, taking 8000 iterations to do what the others do in 20. So the fact that the iterations are fast, is lost in the slow convergence.

I have played with passing the A array into the PARDISO solver in different formats, both the sparse and the dense matrix, it makes no difference.  Rum really. I look step by step at the solution vector, and it is a long way from the other inversion solution vector even on the first iteration for the fully dense solvers.

Interestingly if I run as a PHASE 33 - I get errors, but it works, if I use 13 - it also works slowly, but I do not get the errors.

It does solve, but it is not faster in the end than the other solvers, interesting really.

Question: Should I try another MKL solver?

JMN

 

 

 

 

0 Kudos
75 Replies
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

Thanks - I will do a complete review of all those things and set to the minimum size.

Sometimes you can look till you are blue in the face. Before I became an academic we called it QA.

JMN

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

You are brilliant.  It took a few minutes to find the four or five bugs that were crashing the code - all related to bounds.  It now runs 24 hours.

Much faster than the other inversion program.

Yes, I must write it in Fortran just to see the differences.

JMN

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

WSAF.f90 
!
!  FUNCTIONS:
!  WSAF - Entry point of console application.
!

!****************************************************************************
!
!  PROGRAM: WSAF
!
!  PURPOSE:  Entry point for the console application.
!
!****************************************************************************

    program WSAF

    implicit none
    character*8 A,B,C,D
    character*4 AF
    double precision F
    double precision H
    integer i,j
    integer Nnodes
    double precision dumX,dumY,dumE,dumC
    double precision dumIndD,dumResD,dumParkD,dumLeakD
    
    structure /SNODE/
        integer ID
        integer Tank
        double precision X
        double precision Y
        double precision E
        double precision C0
        double precision Qbase
        double precision Cs
        double precision H
        double precision C
        double precision indDemand
        double precision parkDemand
        double precision leakDemand
        double precision resDemand
        double precision dumC
    end structure
        
    type(SNODE) snodes(100)
    ! Variables

    ! Body of WSAF
    print *, 'Hello World'
    open(1, FILE="FileM.in",STATUS ="UNKNOWN")
    
     Read(1,*)A,B,C,D
     Write(*,*)A,B,C,D
     Read(1,*)A,B,C,F,H
     Write(*,*)A,B,C,F,H
     i= 0
     Nnodes = 0
     do while(i .ne. 1)
         Read(1,*)AF
         if(AF .eq. "NODE") then
             backspace(1)
             Read(1,*)AF,j,dumX,dumY,dumE,dumC,dumResD,dumParkD,dumIndD,dumLeakD
            if(AF .eq. "NODE") then
                Print *,AF
                Nnodes = Nnodes+1
                snodes(NNodes)%ID = j
                snodes(NNodes)%X = dumX            
                snodes(NNodes)%Y = dumY
                snodes(NNodes)%E = dumE
                snodes(NNodes)%C = dumC
                snodes(NNodes)%resDemand = dumResD
                snodes(NNodes)%parkDemand = dumParkD
                snodes(NNodes)%indDemand = dumIndD
                snodes(NNodes)%leakDemand = dumLeakD
            endif
         else
                i = 1
        endif
    end do
         
         
    Close(1)

    end program WSAF

We have to start somewhere

JMN

0 Kudos
mecej4
Honored Contributor III
1,089 Views

John,

It is good that you started to do the port to Fortran. I think that you will see not only slightly faster execution but, as I said earlier, the code will be easier to debug and verify.

List directed input such as 

Read(1,*)AF,j,dumX,dumY,dumE,dumC,dumResD,dumParkD,dumIndD,dumLeakD

carries the risk that it will pull data from subsequent input lines when the line just read does not have enough data to fill the list, i.e., ten items. It would do this kind of unwanted action when reading, for example, a PIPE input line, which has only eight items, and abort with errors since the input data field corresponding to the ninth item, viz., 'PIPE', is not correct for a numeric type.

Here is another idea for you to ponder. So far, you have used Newton-Raphson for solving the system of nonlinear equations, albeit with different linear solvers, dense and sparse. MKL, however, offers another solver: a trust-region solver, with or without bound constraints on the unknowns, called TRNLSP/TRNLSPBC. Far from the solution (i.e., in the earlier iterations without a very good trial solution), the expense of computing the Jacobian may not be justified by the results, and trust region and other methods for nonlinear equations may be worth trying, particularly if you wish to analyze huge systems, say, the crumbling water/sewer systems of Chicago.

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

I already encountered the problem of reading the pipe line by mistake, which is why I introduced the old trick of looking at the first character group and then deciding what to do.  I need to put end and error statements in as well, but this was just to get me started. I really want to stick to one form of intermediate file taken straight from the DXF drawing, if I have different file types then it starts to become as difficult as  occurs in trying to transfer models now.  I have wasted a lot of time writing code to transfer models into different systems, I stopped doing that as soon as I saw Rossman's 91 system. The system Rossman created in 1991 of having a single entity on a single line is better than anything else that I have tried - just for reviewing and assembling the models. He moved away from that and I do not think it got better, just more complicated.  The 2008 version of EPANET is much more complicated than the 91 version, but it really does not improve the base methods, of course it adds the GUI - but why write your own when you have AutoCAD and LISP.

 There is a Professor in the UK screaming for a revised program for accelerometer analysis - I have put it to one side whilst I dealt with this interesting issue. That code is all in C# because that is what the drivers were written in.

 

I am not going into the Chicago Sewers one suspects there are crocodiles and a few bodies in those sewers.

Thanks

JMN

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

I was able to code the Water Supply program in Fortran, made me realize how much I had forgotten in 2 years of C#. 

I go to the stage of calling PARDISO and I keep getting a -1 error, so my input data is incorrect.

I spent the last two days comparing the C# and Fortran results to make sure the data is the same, one interesting problem is the Fortran does not initialize the variables to zero, took a while to remember that point.

Here is the code, the PARDISO solver is fast, but painful to code.  I know it is a simple error, but I need to get a full printout and work through all the stages.  I fixed for the 1 index and the a(I,j) elements - I had trouble getting the search to work in FINDNODE so I did a quick and dirty and need to go back to it.

 

JMN

 

0 Kudos
mecej4
Honored Contributor III
1,089 Views

JMN: Initialization of local variables follows slightly different conventions in Fortran as compared to C/C++. In Fortran, a declaration such as

     INTEGER :: NNZ = 0

(where NNZ is a local variable) sets the specified value only at the start of program execution. If the subprogram containing this declaration is entered a second time, at that time NNZ will have the last assigned value during the previous execution of the subprogram, and not 0. This is the main reason that the binary search code failed.

You have errors in the part of the code that forms the IA, JA, VA arrays for the CSR representation of the matrix. Examine the printed IA, JA, VA values to see.

If the integer type arguments that you pass to PARDISO are 8-byte integers, you should either call PARDISO_64 or provide an interface so that the correct subroutine call can be made. I don't think that you will need 8-byte integers as indices for pipes, tanks, etc., unless you aim to analyze networks with billions of nodes.

Here is corrected source code. The code for computing the F() array is missing. Obviously, you wanted to fix existing bugs before proceeding.

When you post updated source code, please do not revert to using a dot (=period) to refer to components of derived type variables, even though the Intel compiler lets you do so and that may be your personal stylistic preference. Instead, please use the standard % operator. I use other compilers in addition to IFort to debug programs, and non-standard code (i.e., with '.' used in place of '%') cannot then be checked with other compilers which do not accept a period with this meaning. It is fairly easy to convert '%' to '.', but the reverse is not, since operators such as .EQ., etc., contain dots, as do real numbers.

 

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

I will fix all the % - apologies, I should have done that it already.

Thanks for the comments.  I am starting to think I need to write a separate routine that takes an square array and turns it into the Pardiso solver.

In terms of the F array I have been doing this at a conference and just working on it slowly.

 

JMN

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear Intel:

I am now getting a -12 error on the Pardiso solver. Which means the licence file is incorrect.

I run fine in C# calling Fortran, but using the pure Fortran program the -12 pops up. 

I also have the PARDISO 5.0.0 installed on the computer, which has not caused problems before, when I ran Fortran code.

Any ideas?

Dear mecej4:

I worked out that there are two places in the code where about 6 lines are identical - is a carry over from the original C code, this first section of the code sets up the F array, and the second section immediately after it sets up the J array, without realizing it at the weekend I left out the F section and proceeded onto the J section.

I also missed taking the absolute value of the head before I took the 0.54 root. Translating from C# to Fortran is not without its quirks. Nan pops up all over the place.

I got the code running, but now have a -12 error.

JMN

 

0 Kudos
mecej4
Honored Contributor III
1,089 Views

I would recommend that you postpone worrying about error messages from Pardiso until you are able to pass the same argument values from Fortran as you did from C#, because the error messages from Pardiso are spurious at this point.

Note what I said about initialization of local automatic variables in #28, and the comment about 8-byte integers. The program has a number of errors of this type. It is only after they are all fixed that it is appropriate to call Pardiso.

The code that I posted in #28 runs with no error messages from Pardiso, but the results are wrong, probably because the input argument values are not correct.

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

Capture1.PNGThis is the Fortran version of the program running. 

it is running on the PARDISO solver 5.0.0 using 64 bit.  I remembered that one needed to set the output to 64 bit in the VS program and it creates an X64 folder, then it all works,

I get the error code -12 if I use the MKL solver. 

It is faster than C# - although this does not include the write the output to the file for the network solution.

117 ms to 177 ms.

JMN

 

Well done mecej4.

0 Kudos
mecej4
Honored Contributor III
1,089 Views

JMN wrote:
I get the error code -12 if I use the MKL solver. 

That is expected when you call pardiso_64 from 32-bit code. if you go back to 4-byte integers as I suggested, you will not need to call pardiso_64, and your code will run in 32-bit mode or 64-bit mode.

Secondly, the Basel/Lugano Pardiso-5 takes only 4-byte integers for those of its arguments that are of type integer, and 8-byte integers are used only for pointer/handle arguments. It also uses 8-byte integers internally, but a typical user need not be concerned with that aspect.

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

I have been reading the Reid Textbook on 2003 Fortran explained.  Interestingly it is not as good at its explanations as yours are.  The book talks about the best practices and then sometimes uses them in the examples, but sometimes not.

As someone who basically wrote code for my small group of workmates and who never used libraries, except NAG, which one has the code for the difference in integer and reals was not really an issue. I can remember being peeved when I started using C# and the values did not return from the code, one had to use the ref element or return the value.  Now I am unlearning that point and picking up the Fortran style again. Luckily there is this site and you.

Now that I have the code running albeit in 64 bit with the 5.0.0 solver I need to solve the interesting elements of the types, and make the code easier to understand and so it uses the MKL 32 bit solver, but that can wait for the weekend..  I am actually chuffed that I can get a 64 bit program running at all and remember now how to do it. 

You were of course correct and as shown above the Fortran code runs a lot faster. Interestingly the 64 bit code took few iterations to arrive at the answer, the solver in 64 bit gives slightly different inverse matrix when compared to the 32 bit.  This is interesting.

 

Today I have to solve the problem of the time it takes for two boxes connected by a pipe to equalize pressure.

Thanks for the help, you are certainly a world class programmer.

JMN

 

0 Kudos
mecej4
Honored Contributor III
1,089 Views

Today I have to solve the problem of the time it takes for two boxes connected by a pipe to equalize pressure.

Such an analysis is a standard part of studying the step response of a pressure transducer connected by tubing to pressure ports.

If you do not specify a tolerance on "equal", the answer is "infinity", since after the flow rate tapers off to the laminar range the decay of the pressure difference and the flowrate are both exponential.

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

The problem is not the problem, the problem is the solution has to be explained to a bunch of building inspectors who will not be to keen.

One side wants the minimum pipe size and the other side will say the model is not realistic.

 

JMN

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

I spent an enjoyable weekend buried inside the pipe in a wind problem.  Interesting issues with the use of the software programs. The joys of other people's programs.

I finished the main part of the Fortran version of the Water Supply Program. It is about twice as fast to run as the C# program, all things being as equal as I can make them. The fastest I could get for a 12 node problem was 114 ms. This is using the Intel DOS Window stuff and the writing the output to a list file, otherwise it is 200 ms from inside VS.  The C# consistently runs about 300 ms. I have not tried the Intel DOS Window stuff with C#.  You are correct Fortran is a more elegant solution as the Pure Maths people would say.

I enclose the code FYI.  I am going to look at the chemical concentration stuff next, but thought I might try the other Intel Routine you suggested.

But I am going to leave this version as the 64 bit PARSISO - it works and why stuff with things that work.

There are a couple of algorithms that need to be optimized, been a while since I looked at that sort of stuff.  The pump curve is one area, I do a very bad search for the result, and the FINDNODE needs to be fixed.

Thanks again.

JMN

 

0 Kudos
mecej4
Honored Contributor III
1,089 Views

There is probably an error in getqc() that may be triggered when you try the program on a bigger problem. The range of the subscript of array stanks() is 1:Ntanks. The code, however, references stanks(0) when i3=0, which would be an array bounds violation.

It is probably not worthwhile to time runs on small size problems. Do you have a data file ready for a large problem (with, say, a few hundred nodes, at least)?

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

I am about to make up such a file, should be ready tomorrow.

JMN

0 Kudos
JohnNichols
Valued Contributor III
1,089 Views

Dear mecej4:

I have made a simple Fortran program to generate a larger data file. This has 180 nodes and 267 pipes, laid out in a long grid. It represents a terrible water supply system as all of the supply is at one end and the demand is scattered. I will fix it - but it does work. Takes about 877 ms for a 24 hour analysis of this problem.  This is pretty fast for the size of the problem.  I need to fix the other programs to read this input files as it has more explicit demands - much better for the poor modeller who needs to measure things they can count not try and work out gph.

The compile spits the pacifier into the dust at 400 elements in the arrays, I need to use the /heaps_arrays on the command line.  If I set maxElements to say 4000 it runs real slow.

I looked at the i3 issue.

I can trace that back to the original code, Rossman used 1 based indexing in his C code, without thinking I used 0 based indexing in the C# code. Rossman had set the TANK element of the class to 0 if it was not a tank and the tank number otherwise. I had to fix this in the C# code to use a -1 on TANK to be a non-tank (so to speak). The code in getpc sets the start flow in the pipe, but i3 cannot be zero in any code - it is set at -1 and then must be a nonzero positive integer. This code is never touched. I have put a stop in this code in case it ever runs, but it is not used. Set the heads somewhere else and never pulled the code out.  It just lives on.

I tried looking for temp arrays being passed but did not see the error message.

It compiles to 15750 elements in the vector arrays. My problem is that I am creating the Jacobian matrix and then creating the PARDISO arrays, I need to get rid of the Jacobian and store directly.  It uses a lot of storage.

JMN

 

 

0 Kudos
mecej4
Honored Contributor III
1,061 Views

The program contains several unnecessary simplifications that, for a first knock-together, are fine, but some readjustment is necessary before moving on to really large problems.

For example, J(:,:) is dimensioned (Maxelements, Maxelements), even though it would be sufficient to use J(Nodes, Nnodes). If you defer allocation of the main arrays until after the data file has been read and you have tallied Nnodes, Npipes, etc., you could probably solve larger problems than if you allocated large arrays of which only a small part would get populated and used. 

I checked the sparse structure of the Jacobian, and for the run with File3.in, the nonzero elements are always the same set for every call of of subroutine Simulation. If you can predict that this finding will remain true for all the problems that you will give to the program to solve, I think that there are possibilities for storing J solely as a sparse matrix and do away with the big square array that we have at present.

0 Kudos
JohnNichols
Valued Contributor III
1,061 Views

J is set by the structure of the problem which does not change at all. The structure contains the topology, really  the only change in the topology may occur with a funny hydraulic device in the future - where you are forced to recalculate elements of J.

J should go straight into the PARDISO format - it only needs to be as long as the topological requirements which for these problems are quite sparse.  Almost always centred close to the diagonal, because of the way we humans number nodes, ie 2 is usually connected to 1 etc.

I will look to change to allocation late - first

The penny dropped when I noticed the later hours took only a few iterations as all we are doing is perturbing the head at a reservoir and then recalculating flows that are close.

Assuming you want a 5 million person model, 3 people per house and 8 houses per node, one needs 210,000 nodes, and given that one really should not require more than 750,000 elements in the Pardiso arrays - at the moment J is 180 million elements

There is no hassle in generating a model this big, except for the algorithm to select a starting pipe size for each pipe. The matrix is a bit sensitive to errors in diameter.

JMN

0 Kudos
Reply