Unexpected behavior of cluster_sparse_solver

Ivan_K_6 · ‎09-18-2015

Hello.

I am trying to solve a system with `cluster_sparse_solver`, and am getting unexpected results. https://gist.github.com/ivan-krukov/157372d9a55db244c4b4

In my case, given a sparse matrix `A`, we want to solve `Ax=e`, where `e` is the first column of the identity matrix with the appropriate size (A column where the first entry is 1, rest are zeros).

With small systems, there are no issues, but large systems end up returning unreasonable results. I suspect this might have to do with numerical stability of the solver depending on the number of zeros in the right hand side. For example, a 60 x 60 system returns the expected result, while a 70 x 70 system returns something that looks like overflow. I would also like to note that for different systems, where the right hand side has many non-zero entries, the solutions generated are quite reasonable.

Running the code linked above with: `mpirun -n 10 ./solver_bugs 60` gives the expected answer, while `mpirun -n 10 ./solver_bugs 70` provides something completely bogus.

In the long run, we were hoping to use this for _very_ large systems (100,000 x 100,000).

I am not certain how this problem can be diagnosed. While I hate to post "my code does not work" kind of questions, it seems like the only option here. Any help will be much appreciated.

The version of MKL in question is mkl 11.2u3, which was bundled with composer_xe 2015 3.187.

mecej4 · ‎09-19-2015

I suggest that you prepare a completely serial version of your program and run the 60 X 60 and 70 X 70 problems with that, in order to separate threading issues from issues of algorithm correctness and stability.

An easy way of doing this may be to dump the matrix to a file in your MPI program just prior to calling the cluster solver. Then, using one of the sample programs provided with MKL, you can read the matrix from the file and obtain the solution with a unit vector on the right hand side by calling Pardiso.

I suspect this might have to do with numerical stability of the solver depending on the number of zeros in the right hand side.

I think that this is unlikely.

Ivan_K_6 · ‎09-22-2015

Looks like there was a bug somewhere in my code. Using a very slightly modified version of the `MKL` example works. Thank you!