Pardiso inaccurate in MKL 2017, but not in 2016

Essex_E_ · ‎07-17-2017

I have recently updates from MKL 2016 to MKL 2017 (from INTEL_MKL_VERSION 110304 to 20170003). I quickly found that Pardiso is completely inaccurate on some simple problems. Please find attached a matrix in plain text format (matrix_explosion.txt) as well as some logging info from solving it with MKL 2016 (matrix_explosion_mkl2016.txt) and MKL 2017 (matrix_explosion_mkl2017.txt).

The matrix is symmetric, real, indefinite, has 1563 rows. It's small enough to analyze with dense LAPACK, so I loaded it into Octave and found that it has condition number 5.8e7, which seems totally reasonable.

Solving with MKL 2016, the inf-norm relative residual is 6e-4. Solving with MKL 2017, the inf-norm relative residual is 1e14 (!!!). Solving with Octave, the inf-norm relative residual is 4e-10. Even with MKL 2016, Pardiso is not nearly as accurate as whatever Octave is doing.

My iparm inputs are:

iparm[0] = 1; // use my settings
iparm[1] = 2; // METIS
iparm[9] = 8; // pivot perturbation of 1.0E-8
iparm[10] = 1; // scaling
iparm[12] = 1; // weighted matching
iparm[17] = -1; // enable reporting
iparm[20] = 1;  // enable reporting
iparm[26] = 1;  // input matrix checking

I'm running Windows 10 with a Intel Core i7 5960X @ 3.00GHz. I'm compiling and linking with MSVC using toolset Visual Studio 2015 (v140). My compile line looks (roughly) like:

/Yu"stdafx.h" /MP /GS /W3 /Gy /Zc:wchar_t /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\include" /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\include" /Zi /Gm- /O2 /sdl /Fd"x64\Develop\vc140.pdb" /Zc:inline /fp:precise ... more defines ... /D "NDEBUG" /D "_CONSOLE" /D "WIN32_LEAN_AND_MEAN" /D "NOMINMAX" /D "_USE_MATH_DEFINES" /D "_SCL_SECURE_NO_WARNINGS" /D "_CRT_SECURE_NO_WARNINGS" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /openmp- /Fa"x64\Develop\" /EHsc /nologo /Fo"x64\Develop\" /Fp"x64\Develop\solveLinearSystem.pch"

My link line looks like:

/OUT:"x64\Develop\solveLinearSystem.exe" /MANIFEST /NXCOMPAT /PDB:"x64\Develop\solveLinearSystem.pdb" /DYNAMICBASE ... some libs ... "tbb.lib" ... some libs ... "mkl_core.lib" "mkl_tbb_thread.lib" "mkl_intel_lp64.lib" ... some libs ... "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /DEBUG /MACHINE:X64 /OPT:NOREF /INCREMENTAL /PGD:"x64\Develop\solveLinearSystem.pgd" /SUBSYSTEM:CONSOLE /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /ManifestFile:"x64\Develop\solveLinearSystem.exe.intermediate.manifest" /OPT:NOICF /ERRORREPORT:PROMPT /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\tbb\lib\intel64\vc14" /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\lib\intel64_win" /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\compiler\lib\intel64_win" /TLBID:1

I don't think it's connected, but I have another active post about another Pardiso problem here: https://software.intel.com/en-us/comment/1909006

Paul_N_2 · ‎07-18-2017

I am seeing a similar issue with the Paradiso solver in MKL 2017. In my case, the large residual norm goes away if the weighted matching is turned off. See https://software.intel.com/en-us/node/738375 for further details.

Best,

Paul

Essex_E_ · ‎07-25-2017

Since this has received no response from Intel or any other solution, and it's not even appearing in the correct forum, I've submitted a ticket (#02969321) for the issue.

Spencer_P_Intel · ‎07-27-2017

Essex,

I am attempting to resolve both of your problems but starting with this one. I have created a .c file that imports your matrix and then runs pardiso stage 11, 22 and 33 and then computes res = Ax-b. I have run it on windows and Linux with 0 base and 1 base indexing but I am having trouble recreating the issue you are describing here.

I based my code recreater on the example file: examples/solversc/source/pardiso_sym_c.c which solves a similar problem with a real, symmetric, indefinite matrix. I then imported your matrix into the csr format (with column sorted on each row and only the upper triangular part) and changed the iparm parameters to match the ones you suggested. I also had to adjust from 0 base. Since you did not provide a rhs, I use the ones vector. I added the residual norm calculation and on linux and windows the infinity norm residual remains the same: ||Ax-b|| ~ 0.007 for both libraries mkl2016 version 11.3.4 and mkl2017 update 3 (I think this is what your 2017 test is using) but also for a range of other releases between those.

Would you be able to provide me (attach it to this discussion) with more details of what you are doing, possibly a minimal example code that produces this bug?

Best,

Spencer

Essex_E_ · ‎07-27-2017

Thanks for looking into it. Your inability to reproduce the problem suggests that I may be doing something wrong. I hope it's as simple as that. I'll make a stripped-down stand-alone example that shows reproduces the problem. Should I post that here, or use the ticketing system?

Spencer_P_Intel · ‎07-27-2017

Either one is fine, but it will be easier for me to find it here. Keep everything localized. :)

Spencer

Essex_E_ · ‎07-28-2017

Attached: a small MSVC 2015 project with a single 200-line cpp file that loads the included matrix from disk, solves it with Pardiso, and prints the residual norm. On my computer, it's reproducing the problem: residual are large with MKL 2017 and small with MKL 2016. I look forward to learning what silly mistake I'm making :P

Essex_E_ · ‎07-28-2017

I've just run this test on another computer here. Turns out, the problem doesn't happen on that computer. Perhaps something has gone wrong with my installation of MKL on my local machine.

Essex_E_ · ‎07-28-2017

The mystery deepens. I've got a handful of computers here. I've compiled this test program, copied it around the network, and run the exact same binary on a bunch of computers. Results are below. Some of them fail (larger residual) some of them succeed (small residual). Is this a CPU-specific problem? I'm not sure what might be common across the failing machines, but different on the succeeding machines.

Machine info, and success/fail of my Pardiso test program:
Intel Core i7 5960X  3.0GHz, Windows 10, desktop -- fails
Intel Core i7 5960X  3.0GHz, Windows  7, desktop -- fails
Intel Core i7 6900K  3.2GHz, Windows 10, desktop -- fails
Intel Core i7 4790K  4.0GHz, Windows 10, desktop -- works
Intel Core i7 4790K  4.0GHz, Windows 10, desktop -- works
Intel Core i7 4790   3.6GHz, Windows 10, desktop -- works
Intel Core i7 4870HQ 2.5GHz, Windows  8, laptop  -- works

How else can I help sort this out?

Spencer_P_Intel · ‎07-31-2017

Essex,

Here are some more details and hopefully a resolution for now. When I looked more closely at your matrix sparsity pattern, I observed that it looks similar to the structure of some saddle point problems where we often have a block system with one or more of the main diagonal blocs being zero blocks. For example:

M = [ A B]
[B^T 0]

You are treating your matrix as (mtype= -2) a real symmetric indefinite matrix (which it is) but weighted matching with scaling is known to have some trouble with this type of matrix. The goal is to get large elements close to the diagonal but sometimes it works, sometimes it doesn't, when the matrix has these zero blocks. Now the work around is to instead treat the matrix as real unsymmetric (mtype=11) matrix. This means you need to read in all the elements on each row for CSR format instead of just the upper triangular part.

Otherwise, use the same parameters as before and it should handle the zero block without issue:

  iparm[0] = 1; // user user-provided values
  iparm[1] = 2; // METIS reordering
  iparm[9] = 8; // 1e-8 pivoting perturbation
  iparm[10] = 1; // enable scaling
  iparm[12] = 1; // enable weighted matching
  iparm[17] = -1; // output nnz in factors
  iparm[20] = 1; // use Bunch-Kaufman pivoting
  iparm[26] = 1; // check the input matrix

You can observe the number of perturbed pivots (see iparm(10) description )from this weighted matching in iparm(14) (iparm[13] for c code) . The unsymmetric algorithm can handle the zeros on the diagonal and so I observe the number of perturbed pivots to be ~500 with mtype=-2 but =0 for mtype=11. You residual will now be in the range desired. Mine is around ~1e-8.

I hope this helps!

Spencer

Essex_E_ · ‎08-03-2017

I confirm that I get a good result when using mtype=11, with all other settings the same. I remain undecided about whether or not to adopt this strategy or simply keep using MKL 2016. I'll need to do a lot more performance and robustness testing if I'm to switch our product to using mtype=11.

Yes, my matrix is a saddle-point / KKT-matrix, as you observed. I wasn't aware that weighted-matching has trouble on these sorts of problems. I'm surprised by that, since the manual specifically recommends using weighted matching for highly indefinite problems. Can you point me to some references about this, and about its failure on KKT matrices? Also, what does "highly" indefinite mean? Does that mean many negative eigenvalues, or large negative eigenvalues, or large negative diagonals, or something else?

Per Ying H.'s suggestion, I have tried using mkl_sequential.lib and mkl_intel_thread.lib instead of mkl_tbb_thread.lib. Both of these have much better behaviour. Relative-residual norm is 4e-3 instead of 3e14. I doubt that this is expected behaviour. Unfortunately, I can't use either of these options because of my broader functional and non-functional requirements.

mkl_intel_thread.lib
  number of perturbed pivots = 583
  number of refinement steps = 2
  Input and solution norms:
  ||A|| = 29.2318
  ||b|| = 39.5348
  ||x|| = 9.91907e+08
  ||Ax-b|| = 0.169567
  ||Ax-b||/||b|| = 0.00428906

mkl_tbb_thread.lib
  number of perturbed pivots = 583
  number of refinement steps = 2
  Input and solution norms:
  ||A|| = 29.2318
  ||b|| = 39.5348
  ||x|| = 4.60102e+16
  ||Ax-b|| = 1.25659e+16
  ||Ax-b||/||b|| = 3.17844e+14

mkl_sequential.lib
  number of perturbed pivots = 600
  number of refinement steps = 2
  ||A|| = 29.2318
  ||b|| = 39.5348
  ||x|| = 9.91593e+08
  ||Ax-b|| = 0.179451
  ||Ax-b||/||b|| = 0.00453906

Spencer_P_Intel · ‎08-04-2017

Essex,

Here is what I have learned about weighted matching permutation algorithms for matrix systems. If needed, I hope it will give you a jump start on your own understanding and help you find the right algorithm for your system.

They were initially designed for dense matrices to get large elements on diagonal and reduce the effort for pivoting in the factorization. Not long afterwards they were extended to handle non-symmetric sparse matrices. It is still commonly used for non-symmetric systems and seems to be especially good for non-symmetric indefinite matrices typically from KKT or saddle-point type problems. Using the maximal weighted matching algorithm to permute largest elements to the diagonal reduces the need for perturbed partial pivoting and when combined with fill-in minimizing pivoting this deals with 0 diagonal blocks quite well and does reduce the amount of work a factorization algorithm must do while keeping the accuracy high. Otherwise, if a full pivoting factorization algorithm is used, then it reduces the amount of work as well and handles 0 diagonal blocks well by removing them.

One thing to note is that finding the exact weighted perfect matching is an NP-complete problem and so we use (some pretty good) heuristics to try to get a perfect matching but it is not always guaranteed. This is the first place that makes this a difficult problem that doesn't always work perfectly for every indefinite matrix.

A second difficulty comes when handling symmetric matrices. Note that if Pm is the maximal weighted matching permutation matrix, then (Pm A) is in general, non-symmetric whether A is symmetric or not. In order to handle the symmetric indefinite matrices and stay symmetric, a modification is required. The maximal(hopefully perfect) matching permutation matrix, Pm, is written as a product of cycles in the permutation and then modified somewhat so that it is composed of only 1 and 2 cycle permutations, represented by the new permutation matrix, Pc. Then Pardiso uses the resulting symmetric matrix (Pc A Pc^T) to do the Cholesky factorization with 1x1 and 2x2 Supernode-Bunch-Kaufman partial pivots and the other fill-in reducing algorithms... Notice that this is not longer the original weighted matching permutation and may still require perturbed pivots which can compound to break the numerical accuracy and stability of the resulting factorization. A good reference for the algorithms for symmetric indefinite matrices is "On Fast Factorization Partial Pivoting Methods for Sparse Symmetric Indefinite Systems" by Olaf Schenk (one of the original authors of Pardiso) and Klaus G\"{a}rtner. They make the argument that this approach will not always work for symmetric indefinite matrices but that it does work for a wide class of matrices and does well for many of them. In addition, they mention that the result of the factorization with these perturbed pivots is in general not accurate and iterative refinement will be needed. They later recommend in their results section that at least 2 steps of iterative refinement be used when perturbed pivoting happens to get more numerically accurate results... (This is another idea for you to try. Use symmetric indefinite with scaling and weighted matching, but also enable some iterative refinements. See iparm[8] and iparm[7] for more details)

It appears that you have found a symmetric indefinite matrix which does not necessarily perform well with this symmetric weighted matching algorithm with partial pivoting factorization, but it does work with the non-symmetric algorithm. I hope this helps! (I learned a bit in the process of researching this issue as well)

Spencer

p.s. As per your questions about what highly indefinite means, I believe that KKT and saddle-point type matrices that are symmetric or Hermitian are definitely indefinite (0 and positive eigenvalues). I have no idea what the addition of (highly) to indefinite is supposed to mean. The zero diagonal block elements pose problems for pivoting. Having both positive and negative eigenvalues is also tricky... Indefinite problems are hard in general. :) However, hopefully the above helped for your specific case and I agree the wording in the current documentation is a little confusing.

Essex_E_ · ‎08-04-2017

Thank you very much for the detailed reply. That was helpful. I'll also check out the reference you provided.

I'll need to do some more investigations, but it's sounding like I should probably switch to non-symmetric algorithm for robustness and accuracy reasons, even if it's a performance hit.

I did experiment a bit with using more refinement steps. When I was getting relative residuals around 0.1 or 0.001 with default settings, then adding more refinement brought that down to 1e-10 or better. More refinement didn't help at all on this problem where the residual is 1e14.

Jaromir_K_ · ‎01-26-2018

Hello,

I also ran into accuracy problems using matrices of the form

M = [ A B]
[B^T 0]

and mtype=1.

It occurs for me in MKL 2016 Update 4. I didn't go into enough detail in order to be able to understand what changed between MKL 2016 and MKL 2017, but I'm supposing the change that caused this was probably already included in 2016 Update 4 (?).
The performance hit by switching to mtype=11 is about 30% in my case, which is quite significant for my application, so I'm hesitant about whether to downgrade to 2016 (which would make me have to deal with other bugs in MKL which have meanwhile been fixed) or use mtype=11.

I'm wondering whether there are any plans for this to be fixed in MKL 2018?

Then I could use mtype=11 temporarily and switch back to mtype=1 once MKL 2018 comes out.

Thank you

Essex_E_ · ‎01-26-2018

Jaromir,

For what it's worth, I can tell you my experience here.

I never switched to MKL 2017. I stayed with MKL 2016 and mtype=-2 (real symmetric indefinite). When MKL 2018 (2018.0.124) came out, I switched to it. On my matrices, it doesn't have the bad behaviour of MKL 2017. I never did find the time to thoroughly evaluate the suggestion of switching to mtype=11 (real unsymm).

Jaromir_K_ · ‎01-29-2018

Thanks very much for sharing!

What puzzles me is that for me the problem also occurs in version 2016 Update 4, which you are saying to have worked fine for you.

I guess I will use mtype=11 for now and will try switching back to mtype=1 once I've updated to 2018, as it seems the behaviour might have changed between the two versions.

I will try to report back here for future visitors.

Spencer_P_Intel · ‎01-31-2018

Jaromir,

Two comments:

First, Would you mind sharing your test matrix and a simple test case where you observe the accuracy issues so we can work through and test if it is still happening with the latest code base?

Second, Can I assume that your matrix M is symmetric and has 0 or negative valued eigenvalues? That is, A is symmetric and positive definite(or possibly indefinite) and B is not necessarily of full rank. (This is often the case for saddle point problems or interior point optimizations or physical problems like velocity and pressure Navier-Stokes fluid solvers, etc... )

If M has 0 or negative valued eigenvalues then the weighted matching, scaling and perturbed pivoting become important for stability of the algorithm. Note that Gaussian elimination without pivoting is stable for symmetric positive definite and diagonally dominant matrices but is actually unstable for many if not most other matrix types without some type of pivoting. Sparse indefinite matrices are one class that is known to be unstable without pivoting and so we have come up with a lot of approaches to resolve that instability. Some work better than others, but there is not a one size fits all except for complete pivoting which is computationally very expensive. So mostly we try other partial pivoting approaches. Weighted matching and scaling can provide a nice reordering to help to put largest elements on the diagonal (making it more like the diagonally dominant matrices) and perturbed pivots(iparm[21-1]): diagonal pivoting or Bunch-Kaufman pivoting provide more options for removing the instabilities of the indefiniteness when a diagonal has a value of 0 on it. Sometimes the use of iterative refinements (iparm[8-1]) will help minimize any accuracy issues introduced by these perturbed pivots.

One of the big differences between mtype=1 (structurally symmetric) and mtype=11 (unsymmetric) is that mtype=11 performs an M+M^T operation to obtain a structurally symmetric sparsity pattern/matrix for factorization where as mtype 1 does not need to do this since it is already structurally symmetric. In general we recommend to use mtype=11 over mtype=1 since the operation of M+M^T is quite small (<5% total time). Now, mtype=-2 (symmetric indefinite) is another option in this case since the matrix is truly symmetric and has 0 or negative eigenvalues. We have recently improved the performance for mtype=-2,mtype=2 and mtype=11 matrices using the two level factorization algorithm (iparm[24-1]=1) for Intel MKL 2018 (Update 1+) so it is worth trying those if you have 2018 version.

Now mtype=11 by default has weighted matching ( iparm[13-1]=1 ) and scaling (iparm[11-1]=1) turned on where as mtype=1 has them turned off. This likely accounts for your 30% increase in time, as weighted matching is a somewhat heavy operation (but improves stability of solution quite a lot). Did mtype=11 give accurate results when mtype=1 was not accurate? Using mtype=-2 also defaults to no weighted matching or scaling but does use Bunch-Kaufman pivoting (iparm[21-1]=1) in the factorization. This allows for more stable results as well.

When you run your algorithm again, turn on message level information (set msglvl=1 where you are setting your iparm array ) to print out to the screen more details about what exactly is happening when you run Intel MKL Pardiso. This will tell you all sorts of things like whether or not pivoting or matching etc are being used and some convergence details. Likewise if you use things like iterative refinement (iparm[8-1]=# allowed steps) to increase accuracy, then you can read off some off the iparm variables about what was actually performed as well. (see https://software.intel.com/en-us/articles/pardiso-parameter-table for some details about IN and OUT parameters in iparm array or https://software.intel.com/en-us/mkl-developer-reference-fortran-pardiso-iparm-parameter for the complete set of current iparm options)

Anyway, I hope some of that helps you make your design decisions! Please let us know if we can be of further help!

Best,
Spencer