I am using LAPACK to solve a linear sytem of equations A(n,n) * X(n,1) = B(n,1). A, X and B are real*8 (double precision).
First of all, I create A and B, initialize X to 0, and I then make a copy of A and B to A0 and B0, because A and B will be overwritten.
First of all, I use dgetrf to calculate LU factorization:
call dgetrf ( n, n, A, n, iPiv, info )
with integer*4 array iPiv(n). Matrix A gets overwritten by L and U, with unit diagonal elements not stored. This is clear.
I then continue with dgetrs to solve my system:
call dgetrs ( 'N', n, 1, A, n, iPiv, B, n, info )
where B contains the solution matrix X. Is the solution matrix a solution to the original system A * X = B or does it need to be reordered using iPiv? Right now, I simply copy B to X, hoping that no reordering is needed.
Finally, I am refining the solution X using dgerfs:
call dgerfs ( 'N', n, 1, A0, n, A, n, iPiv, B0, n, X, n, ferr, berr, work, iWork, info )
with ferr(1), berr(1), work(n) and iWork(n). For the final step, my questions are:
- Is any reordering needed for any of the variables used in the call to dgerfs?
- Documentation states that iWork must be (n), but it does not say anything about work for double precision flavors. It says (3*n) for real flavors; is that OK or should it rather be just (n)?
As long as you use the Lapack routines in one of the recommended sequences, you need not use the pivot array yourself. All that is necessary is that you pass the pivot array delivered by the decomposition routine (?GETRF) to the other routines that need it.
Have you tried the Lapack example dgerfsx.f and the associated data file? These are provided in the MKL examples Zip file(s), which when extracted create and populate the source, data and lib directories with the example files.
The work arrays WORK and IWORK/RWORK, as appropriate, are distinct. Their sizes are stated in units of INTEGER, REAL or DOUBLE PRECISION for your convenience, but what matters is that they provide enough bytes of memory. The MKL routines may put any mix of integers, reals, etc. into those work spaces, and you should not worry about their types.
I suggest that you use the Lapack-95 interfaces, if feasible. With them, you need not worry about providing work arrays.
It seems everything works well without me pivoting any arrays myself; thank you for confirming this.
I noticed the possibility to use Lapack-95 interfaces, but I have not tried using them yet. I am going to do so later and compare results with what I use now.
Regarding the Lapack example dgerfsx.f, I could not find any file with that name in the installation directory "IntelSWTools" and its subdirectories.
I would like to ask one more question. After specifying /Qmkl:sequential in my project's options, the size of my application's executable grew from about 6 MB to 12 MB. Is there any way to reduce this size in case I use only dgetrf, dgetrs and dgerfs?
There will be no differences in results because of using the Lapack-95 interfaces. These are merely wrapper routines that enable checking the arguments and allocating work arrays before calling the computational routines (which are merely the F77 routines that you presently are calling directly).
The examples are provided in the form of Zip files with names containing "example" in the MKL installation directory. You have to extract the Zip files to see the individual files, or use a facility to mount a Zip file as a directory.
Your EXE files are large because you specified static libraries in your project. I have seen EXEs generated from a simple 50-line source file (with calls to MKL) jump from 20 kB to 70 MB when you request static libraries instead of dynamic libraries. The size has very little to do with whether you use the sequential or parallel MKL libraries.
Could somebody answer the OP's question about the size of workspace work for ?gerfs. Its given as (*) in the documentation. I am getting stack overflow if I use the lapack95 interface (gerfs) so want to use dgerfs with provided workspace to see if that fixes the stack overflow.