Odd dgesv results [Now with pardiso]

Patrik_O_ · ‎08-26-2015

Hello,

I'm trying to refactor a piece of code that solves a simple Ax=b linear problem using MKL. From the manual, I saw that dgesv is the function I should call.

A is an n-by-n Jacobian matrix (with just 5 non-zero diagonals), while b and x are arrays of size n.

To solve the problem I use this snippet of code

MKL_INT n = matrix_size;
MKL_INT nrhs = 1;
MKL_INT info;
MKL_INT *pivot = calloc(matrix_size, sizeof(MKL_INT));
dgesv(&n, &nrhs, A, &n, pivot, b, &n, &info);

My problem is that the solution (stored in b) is completely different from my reference result as computed by the old code (a custom and unmaintainable solver). At some point I thought that I introduced an error in the computation of A or b, but if I store these two guys on disk, load them up into MATLAB and solve the problem with A\b, the result is consistent with the old solver.

Am I missing something?

mecej4 · ‎08-26-2015

You need to provide more details. Specifically, how is the matrix A declared and defined? It would be best if you could provide complete source code with which the claimed bug could be reproduced.

You are calling the Fortran subroutine dgesv() from C. This can be done, but the arguments have to be Fortran-compatible. An alternative would be to use the Lapack-e interface, as in the MKL example lapacke_dgesv_col.c.

Patrik_O_ · ‎08-26-2015

mecej4 wrote:

You need to provide more details. Specifically, how is the matrix A declared and defined?It would be best if you could provide complete source code with which the claimed bug could be reproduced.

A is declared as

double *A = calloc(matrix_size * matrix_size, sizeof(double));

then there is a function that fills the appropriate elements of A. The hole thing is part of a big physics simulator, so I cannot upload the whole thing. A itself is pretty big, but I'll try to find if the problems happen with a smaller matrix and to write a small piece of code that shows the issue

You are calling the Fortran subroutine dgesv() from C. This can be done, but the arguments have to be Fortran-compatible. An alternative would be to use the Lapack-e interface, as in the MKL example lapacke_dgesv_col.c.

Any specific things I should be aware about? Beside storing A columnwise

mecej4 · ‎08-26-2015

If the matrix is large and banded, you are probably going to do better if you use the banded matrix solver dgbsv() instead of the dense matrix solver dgesv().

Patrik_O_ · ‎08-28-2015

mecej4 wrote:

If the matrix is large and banded, you are probably going to do better if you use the banded matrix solver dgbsv() instead of the dense matrix solver dgesv().

Thanks. I know about that function, but first I wanted to solve using the dense solver, just to see if I built the matrix A correctly (and not messed up the band storage scheme). In any case, it didn't help. Dbgsv didn't produce the correct result. Furthermore, even matlab has some problems if I treat A as full, while it work correctly if it treat A as sparse.

Just to be clear, I have to solve a set of non-linear equations b(X) that is, a 2D non-linear Poisson equation, and to to this I use this iterative scheme:
X_i+1 = X_i - delta_i
where A(X_i) * delta_i = b(X_i)

Maybe I should try pardiso or dtrnlsp_solve

mecej4 · ‎08-28-2015

A common and good procedure to follow is to start out with a coarse discretization and to obtain nearly identical results with different solvers (dense, banded, sparse) and iteration schemes. Then, you can make the mesh finer and use the more efficient solver. As the mesh becomes finer, however, the solution time is going to increase considerably.

You can split off the linear equation solution as a test problem from your larger program. All that you have to do in the large program is to write the A and b arrays to a file just before the first call to GESV/GBSV, and then stop.

Next, write a stand-alone program that reads the file and solves the equations. Test out different solvers in the stand-alone program and, when you have found one that works well, plough that back into the large program.

There is a nonlinear Poisson example in the MKL cookbook sample set, please see https://software.intel.com/en-us/node/507039 . I am interested in finding out details of the nonlinearity in your application, since I have seen in the past that some nonlinearities may be removed by transforming the dependent variable, and doing so greatly speeds up the solution.

Patrik_O_ · ‎08-29-2015

mecej4 wrote:
A common and good procedure to follow is to start out with a coarse discretization and to obtain nearly identical results with different solvers (dense, banded, sparse) and iteration schemes. Then, you can make the mesh finer and use the more efficient solver. As the mesh becomes finer, however, the solution time is going to increase considerably.

This might be tricky. Data for computing b come from previous steps of the program, which require a thin enough discretization. If the discretization is too coarse, they may fail

You can split off the linear equation solution as a test problem from your larger program. All that you have to do in the large program is to write the A and b arrays to a file just before the first call to GESV/GBSV, and then stop.

Next, write a stand-alone program that reads the file and solves the equations. Test out different solvers in the stand-alone program and, when you have found one that works well, plough that back into the large program.

I could, but so far both GESV and GBSV failed. Maybe I should check again the code that compute A and b

There is a nonlinear Poisson example in the MKL cookbook sample set, please see https://software.intel.com/en-us/node/507039 .

Thanks, I'll check it out

I am interested in finding out details of the nonlinearity in your application, since I have seen in the past that some nonlinearities may be removed by transforming the dependent variable, and doing so greatly speeds up the solution.

Simply put, the equation I have to solve is

d^2 X / d x^2 + d^2 X / d y^2 = k1 * N * exp( (X-M)/ k2)

where d is the differential operator, x and y are two directions, k1 and k2 are two scalar constants and N and M are two arrays of the same size of X (they are defined on every mesh point). To solve, i compute b = d^2 X / d x^2 + d^2 X / d y^2 = k1 * N * exp( (X-M)/ k2) and I want B = 0. A is the Jacobian matrix computed analytically from b

mecej4 · ‎08-29-2015

Patrik O. wrote:
I could, but so far both GESV and GBSV failed. Maybe I should check again the code that compute A and b

It is far more likely that the failure occurred because the routine was called incorrectly. It is fairly trivial to check if the linear equation solver is working correctly: compute the norm of A.x - b after solving A.x = b. It is also possible to call GESVX instead of GESV, and obtain error bounds on the solution.

---

Do you have mathematical expressions for M(x,y) and N(x,y)?

Patrik_O_ · ‎08-29-2015

mecej4 wrote:
It is far more likely that the failure occurred because the routine was called incorrectly. It is fairly trivial to check if the linear equation solver is working correctly: compute the norm of A.x - b after solving A.x = b. It is also possible to call GESVX instead of GESV, and obtain error bounds on the solution.

I'll check

Do you have mathematical expressions for M(x,y) and N(x,y)

No, they are the outputs of a monte carlo method. I could try to scale the equations, since both lhs and rhs are in the order of 10^18

Patrik_O_ · ‎08-31-2015

mecej4 wrote:
It is far more likely that the failure occurred because the routine was called incorrectly. It is fairly trivial to check if the linear equation solver is working correctly: compute the norm of A.x - b after solving A.x = b. It is also possible to call GESVX instead of GESV, and obtain error bounds on the solution.

Ok, I did this check and the norm is 10^21, way too big. The code I'm using now is

int *pivot = calloc(n, sizeof(int));
memcpy(b_copy, b, sizeof(double) * n);
int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, 1, A, n, pivot, b, n);                                            
cblas_dgemv(CblasColMajor, CblasNoTrans, n, n, 1.0, A, n, b, 1, -1.0, b_copy, 1);

Patrik_O_ · ‎08-31-2015

OK, let's drop the traditional methods. I've tried to use pardiso and thanks to the CRS scheme, I can provide you a test case.

In this case, the relative residual is 1.5, but the maximum value of the solution (which is what I'm interested in) is 1.6, an acceptable value.

You can download the test case from https://dl.dropboxusercontent.com/u/9287376/testcase.tar.xz

mecej4 · ‎08-31-2015

For the test case posted at Dropbox, the largest absolute value of b in A.x = b is 6.46e+020; the largest absolute value of the coefficients in A is 2.94e+021; therefore a residual of 1.5e+000 is, in comparison, the same as zero.

Perhaps you need to re-scale the variables to make this large range of values smaller. You could also investigate whether those large elements of A and b were correctly computed. Or, there is nothing wrong, and you just have to accept "large" residuals as being normal in your application.

Here is another argument to highlight the fact that residual values should not be compared to solution values. Suppose the equations come from a FEA of a stress-strain problem. A is the stiffness matrix, the x are displacements and the b are forces. It would make no sense to compare a residual force of 1.5 newton to a displacement of 1.6 mm.

Patrik_O_ · ‎08-31-2015

mecej4 wrote:

For the test case, the largest absolute value of b in A.x = b is 6.46e+020, therefore a residual of 1.5e+000 is, in comparison, the same as zero. Perhaps you need to rescale the variables to make this large range of values smaller. You could also investigate whether those large elements of b were correctly computed.

Yes, that is the next thing I will try. b values should be in the range of 1e18 1e20 and the same holds for the corresponding values in A. There are a few rows of A where I have only one nonzero element (equal to 1) and the corresponding element in b should be in the range of [-2, 2]

Patrik_O_ · ‎09-02-2015

I noticed an error in my previous post. While the elements of A will always be in the range of [1e18,1e20] for my kind of problems (except where Dirichlet conditions are applied), the elements from b will tend to 0 when I'm approaching the final solution. Scaling by 1e18 might help with the factorization, but in the end it didn't change a thing and the solutions I find are good for my purposes.

Lesson I learned: use pardiso for solving finite differences differential problems