METIS fails with non-diagonal Identity matrix

Dinesh_S_ · ‎10-31-2017

The numerical factorization stage seems to break in multi-processor runs on sparse identity matrices when METIS or parallel METIS reordering is used.

The details are given here

https://software.intel.com/en-us/node/742812

Regards

Dinesh

Gennady_F_Intel · ‎10-31-2017

Have you checked the problem with MKL 2017 u4 or 2018?

Dinesh_S_ · ‎11-01-2017

What are the differences between MKL 2017 u4 and 2018 as far as PARDISO is concerned?

Dinesh_S_ · ‎11-01-2017

fails with 2017 u4

Dinesh_S_ · ‎11-01-2017

and fails with 2018 too

Dinesh_S_ · ‎11-02-2017

Gennady F. (Intel) wrote:

Have you checked the problem with MKL 2017 u4 or 2018?

Hi, it fails under both updates. Any insight would be appreciated.

Gennady_F_Intel · ‎11-02-2017

OK, thanks, we will check this case. Have you checked whether this case works with the minimum degree algorithm?

Dinesh_S_ · ‎11-03-2017

Yes, as stated in the link, the error appears only for METIS under multi-processor runs. The minimum degree algorithm is significantly slower than METIS.

Gennady_F_Intel · ‎11-04-2017

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see the -1 message, which means the input matrix is inconsistent. Could you then give us the reproducer you use?
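For anyone reproducing the conversion step, here is a minimal sketch of a COO-to-CSR conversion followed by a basic structural check. Everything in it is an assumption rather than something from the thread: the `Csr` struct and both function names are hypothetical, and the checks are only a rough stand-in for the kind of inconsistency a matrix checker might flag, not PARDISO's actual implementation. The `base` argument is there because triplets written out for MATLAB are typically one-based, while the solver input in this thread is zero-based.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical zero-based CSR container (names are illustrative, not from the thread).
struct Csr {
    int n = 0;
    std::vector<int> rowPtr;  // n + 1 row offsets
    std::vector<int> colIdx;  // column index of each stored entry
    std::vector<double> val;  // value of each stored entry
};

// Convert COO triplets to zero-based CSR. `base` is 0 or 1 depending on how
// the triplets were written. Entries keep their input order within each row.
Csr cooToCsr(int n, const std::vector<int>& row, const std::vector<int>& col,
             const std::vector<double>& v, int base) {
    assert(n >= 0 && row.size() == col.size() && col.size() == v.size());
    Csr a;
    a.n = n;
    a.rowPtr.assign(static_cast<std::size_t>(n) + 1, 0);
    a.colIdx.resize(row.size());
    a.val.resize(row.size());
    for (std::size_t k = 0; k < row.size(); ++k) {
        const int r = row[k] - base;
        const int c = col[k] - base;
        if (r < 0 || r >= n || c < 0 || c >= n)
            throw std::out_of_range("COO index outside matrix (wrong index base?)");
        ++a.rowPtr[r + 1];  // count entries per row
    }
    for (int i = 0; i < n; ++i) a.rowPtr[i + 1] += a.rowPtr[i];  // prefix sum
    std::vector<int> next(a.rowPtr.begin(), a.rowPtr.end() - 1);  // next free slot per row
    for (std::size_t k = 0; k < row.size(); ++k) {
        const int dst = next[row[k] - base]++;
        a.colIdx[dst] = col[k] - base;
        a.val[dst] = v[k];
    }
    return a;
}

// Structural checks: offsets start at 0, end at nnz, never decrease, and the
// column indices in each row are in range and strictly increasing.
bool isConsistent(const Csr& a) {
    if (a.n < 0 || a.rowPtr.size() != static_cast<std::size_t>(a.n) + 1) return false;
    if (a.colIdx.size() != a.val.size()) return false;
    if (a.rowPtr.front() != 0 ||
        static_cast<std::size_t>(a.rowPtr.back()) != a.colIdx.size()) return false;
    for (int i = 0; i < a.n; ++i)
        if (a.rowPtr[i] > a.rowPtr[i + 1]) return false;
    for (int i = 0; i < a.n; ++i)
        for (int k = a.rowPtr[i]; k < a.rowPtr[i + 1]; ++k) {
            if (a.colIdx[k] < 0 || a.colIdx[k] >= a.n) return false;
            if (k > a.rowPtr[i] && a.colIdx[k - 1] >= a.colIdx[k]) return false;
        }
    return true;
}
```

For a 3x3 permuted identity written one-based as rows {1, 2, 3} and columns {3, 1, 2}, `cooToCsr(3, rows, cols, vals, 1)` yields row offsets {0, 1, 2, 3} and column indices {2, 0, 1}. Passing `base = 0` for the same data throws, which is one way an index-base mismatch can show up before the solver is ever called.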

Dinesh_S_ · ‎11-06-2017

Gennady F. (Intel) wrote:

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see the -1 message, which means the input matrix is inconsistent. Could you then give us the reproducer you use?

Hi

I do not have a converter. In fact I work in CSR format, and wrote this matrix out in COO to check for mistakes using MATLAB. I am attaching another case where I have the matrix in CSR format. Hopefully that helps.

The first file has the column indices and the values, and the other file has the row offsets (these files are quite simple, since the matrix is essentially a diagonal identity matrix under some permutation).

Dinesh

Dinesh_S_ · ‎11-06-2017

It may not depend on the matrix specific to my problem. If you create any non-diagonal identity matrix and run with METIS, it might fail.
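A "non-diagonal identity" of this kind can be generated without any input files. The following is a minimal sketch, assuming a random row permutation is representative of the failing case (the thread does not say which permutation the attached matrix uses); the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical zero-based CSR container for a permuted identity matrix.
struct PermutedIdentity {
    int n = 0;
    std::vector<int> rowPtr;  // n + 1 offsets: 0, 1, ..., n
    std::vector<int> colIdx;  // column of the single 1.0 in each row
    std::vector<double> val;  // all ones
};

// Build an n x n identity with shuffled columns: row i holds a single 1.0
// in column perm[i], so every row and every column has exactly one entry.
PermutedIdentity makePermutedIdentity(int n, unsigned seed) {
    assert(n >= 0);
    PermutedIdentity m;
    m.n = n;
    m.rowPtr.resize(static_cast<std::size_t>(n) + 1);
    std::iota(m.rowPtr.begin(), m.rowPtr.end(), 0);  // one entry per row
    m.colIdx.resize(n);
    std::iota(m.colIdx.begin(), m.colIdx.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(m.colIdx.begin(), m.colIdx.end(), gen);  // random permutation
    m.val.assign(n, 1.0);
    return m;
}
```

Calling `makePermutedIdentity(1035, seed)` gives a matrix with the same size and non-zero count as the one in the logs in this thread. The three arrays map onto the `ia`, `ja`, and `a` arguments of a zero-based PARDISO call.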

Dinesh_S_ · ‎11-08-2017

Gennady F. (Intel) wrote:

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see the -1 message, which means the input matrix is inconsistent. Could you then give us the reproducer you use?

Do you need anything else beyond what has been provided?

Thanks

Dinesh

Alexander_K_Intel2 · ‎11-09-2017

We quickly checked your matrix on a Linux machine and I don't see any issues there. Can I ask you to provide the iparm set that you use for this test? And of course we will run this matrix on Windows.

=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000067 s
Time spent in reordering of the initial matrix (reorder)         : 0.000003 s
Time spent in symbolic factorization (symbfct)                   : 0.013133 s
Time spent in data preparations for factorization (parlist)      : 0.000007 s
Time spent in allocation of internal data structures (malloc)    : 0.011820 s
Time spent in additional calculations                            : 0.005717 s
Total time spent                                                 : 0.030747 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs: 0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
time_reorder 0.0550621
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
1 % 4 % 5 % 7 % 8 % 9 % 10 % 11 % 13 % 14 % 15 % 16 % 17 % 19 % 20 % 21 % 22 % 23 % 25 % 26 % 27 % 28 % 29 % 30 % 32 % 33 % 34 % 35 % 36 % 38 % 39 % 40 % 41 % 42 % 43 % 59 % 60 % 62 % 63 % 64 % 65 % 67 % 68 % 69 % 71 % 72 % 73 % 74 % 75 % 77 % 78 % 79 % 80 % 82 % 83 % 84 % 85 % 86 % 88 % 89 % 90 % 91 % 93 % 94 % 95 % 96 % 98 % 100 %

=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.028661 s
Time spent in allocation of internal data structures (malloc)    : 0.000029 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 0.028692 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs: 0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
             gflop   for the numerical factorization: 0.000000

gflop/s for the numerical factorization: 0.000000

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.005805 s
Time spent in additional calculations                            : 0.000017 s
Total time spent                                                 : 0.005822 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs: 0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
             gflop   for the numerical factorization: 0.000000

gflop/s for the numerical factorization: 0.000000

0: 1013 10.00 1.00 1.00 1.00
Residual 0.000e+00

Dinesh_S_ · ‎11-09-2017

StartThread_ = mkl_get_max_threads();
mkl_set_dynamic(0);
mkl_set_num_threads(SparseConfig_.OpenMpThreads());

maxfct = 1; /* Maximum number of numerical factorizations. */
mnum = 1; /* Which factorization to use. */
msglvl = 0; /* Do not print statistical information */
error = 0;

for (auto i = 0; i < 64; i++) iparm[i] = 0;

iparm[0] = 1; /* No solver default */
iparm[1] = SparseConfig_.PardisoRO(); /* 0: The minimum degree algorithm */
                                      /* 2: The nested dissection algorithm from METIS package */
iparm[2] = 0; /* Numbers of processors, value of OMP_NUM_THREADS */
iparm[3] = 0; /* No iterative-direct algorithm */
iparm[4] = 0; /* No user fill-in reducing permutation */
iparm[5] = 0; /* Write solution into x */
iparm[6] = 0; /* Not in use */
iparm[7] = SparseConfig_.NumberOfIterativeRefinements(); /* Max numbers of iterative refinement steps */
iparm[8] = 0; /* Not in use */
iparm[9] = SparseConfig_.PivotShift(); /* Perturb the pivot elements with 1E-13 */
iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0; /* Conjugate transposed/transpose solve */
iparm[12] = 1; /* Maximum weighted matching algorithm is switched-on (default for non-symmetric) */
iparm[13] = 0; /* Output: Number of perturbed pivots */
iparm[14] = 0; /* Not in use */
iparm[15] = 0; /* Not in use */
iparm[16] = 0; /* Not in use */
iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1; /* Output: Mflops for LU factorization */
iparm[19] = 0; /* Output: Numbers of CG Iterations */
iparm[26] = 1; /* Matrix checker is switched on */
iparm[34] = 1; /* zero based index */

for (auto i = 0; i < 64; i++) pt[i] = 0;

phase = 11;
mtype = 11;
nrhs = NRhs;

Note: I get the crash consistently in debug-mode runs on MS-VS2012.
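Since the exact iparm set was requested, a small helper that lists only the non-zero entries can make two configurations easier to compare side by side. This is a sketch under assumptions: the function name is hypothetical, and it takes plain `int`, which matches `MKL_INT` only for the LP64 interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Format the non-zero entries of an iparm array, one "iparm[i] = v" per line.
// Assumption: LP64 interface, so MKL_INT is a plain int.
std::string formatNonZeroIparm(const int* iparm, int count = 64) {
    assert(iparm != nullptr && count >= 0);
    std::ostringstream out;
    for (int i = 0; i < count; ++i) {
        if (iparm[i] != 0) {
            out << "iparm[" << i << "] = " << iparm[i] << '\n';
        }
    }
    return out.str();
}
```

Printing `formatNonZeroIparm(iparm)` just before the first `pardiso` call records the values actually passed, including those that come from configuration getters such as `SparseConfig_.PardisoRO()`.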

Alexander_K_Intel2 · ‎11-09-2017

Same on Windows. Can you check that you set iparm[34] to 1 (zero-based CSR matrix)?

Thanks,

Alex

Dinesh_S_ · ‎11-10-2017

Yes it is ;

iparm[34] = 1; /* zero based index */

Are you running it under debug mode, VS2012 x64? (Release mode does not always catch this bug.)

If your setup is light, you can share it with me to test.

Regards

Dinesh

Dinesh_S_ · ‎11-27-2017

Hi,

Is there any resolution on this issue?

Regards

Dinesh

Dinesh_S_ · ‎01-02-2018

Developers

Any updates on this issue?

Regards

Dinesh

Ying_H_Intel · ‎01-02-2018

Hi Dinesh,

Could you please try MKL 2018 Update 1? I built a small test case based on the sparse matrix you attached in https://software.intel.com/en-us/node/742812. It runs OK in MSVS 2017 with multiple threads.

I'm linking the libraries below:

Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_lp64.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_thread.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_core.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\compiler\lib\intel64_win\libiomp5md.lib:

MKL 2018, minor 0, update 1, version 20180001, build date 20171007

Best Regards,

Ying

non-zero iparm values:
iparm[0] = 1
iparm[1] = 2
iparm[7] = 2
iparm[9] = 13
iparm[10] = 1
iparm[12] = 1
iparm[17] = -1
iparm[18] = -1
iparm[26] = 1
iparm[34] = 1

=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000018 s
Time spent in reordering of the initial matrix (reorder)         : 0.000007 s
Time spent in symbolic factorization (symbfct)                   : 0.000588 s
Time spent in data preparations for factorization (parlist)      : 0.000002 s
Time spent in allocation of internal data structures (malloc)    : 0.002990 s
Time spent in additional calculations                            : 0.000125 s
Total time spent                                                 : 0.003729 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.034930 s
Time spent in allocation of internal data structures (malloc)    : 0.001086 s
Time spent in additional calculations                            : 0.000005 s
Total time spent                                                 : 0.036021 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
             gflop   for the numerical factorization: 0.000000

gflop/s for the numerical factorization: 0.000000

=== PARDISO: solving a real nonsymmetric system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.000082 s
Time spent in additional calculations                            : 0.000833 s
Total time spent                                                 : 0.000915 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
             gflop   for the numerical factorization: 0.000000

gflop/s for the numerical factorization: 0.000000

Input and solution norms:
||A|| = 24.2281
||b|| = 24.2281
||x|| = 24.2281
||Ax-b|| = 0

Press any key to continue . . .
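The norms printed above can be reproduced independently of the solver. The sketch below computes `||Ax-b||` for a zero-based CSR matrix; the function names are hypothetical, and it assumes plain 2-norms, since the thread does not say which norms the test program uses. As a sanity check, sqrt(587) ≈ 24.2281, which is consistent with the Frobenius norm of a 587-row permuted identity and with the 2-norm of an all-ones right-hand side.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// y = A * x for a zero-based CSR matrix.
std::vector<double> csrMatVec(const std::vector<int>& rowPtr,
                              const std::vector<int>& colIdx,
                              const std::vector<double>& val,
                              const std::vector<double>& x) {
    assert(!rowPtr.empty() && colIdx.size() == val.size());
    const std::size_t n = rowPtr.size() - 1;
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            y[i] += val[k] * x[colIdx[k]];
    return y;
}

// Euclidean norm of a vector.
double norm2(const std::vector<double>& v) {
    double sum = 0.0;
    for (double e : v) sum += e * e;
    return std::sqrt(sum);
}

// ||A x - b||_2 for a zero-based CSR matrix A.
double residualNorm(const std::vector<int>& rowPtr,
                    const std::vector<int>& colIdx,
                    const std::vector<double>& val,
                    const std::vector<double>& x,
                    const std::vector<double>& b) {
    std::vector<double> r = csrMatVec(rowPtr, colIdx, val, x);
    assert(r.size() == b.size());
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= b[i];
    return norm2(r);
}
```

For a permuted identity the exact solution is just a reordering of `b`, so a correct factorization should give a residual of exactly zero, as in the output above.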

Dinesh_S_ · ‎01-03-2018

Hello Ying H,

I have already tried 2018 and 2017 Update 4, as stated earlier in the thread. The case always fails when tested in debug mode (in release mode the failure is somewhat rarer).

Regards

Dinesh

Ying_H_Intel · ‎01-03-2018

Hi Dinesh,

I mean the latest version, MKL 2018 Update 1 (not 2018 or 2017 Update 4). I do seem to be able to see the crash with the earlier versions.

Best Regards,

Ying