Dinesh_S_
Beginner
136 Views

METIS fails with non-diagonal Identity matrix

The numerical factorization stage seems to break in multi-processor runs for sparse identity matrices when METIS or parallel METIS reordering is used.

The details are given here

https://software.intel.com/en-us/node/742812

Regards

Dinesh

 

26 Replies
Gennady_F_Intel
Moderator
103 Views

Have you checked the problem with MKL 2017 u4 or 2018?

Dinesh_S_
Beginner
103 Views

What are the differences between MKL 2017 u4 and 2018 as far as PARDISO is concerned?

Dinesh_S_
Beginner
103 Views

It fails with 2017 u4.

Dinesh_S_
Beginner
103 Views

It fails with 2018 too.

Dinesh_S_
Beginner
103 Views

Gennady F. (Intel) wrote:

Have you checked the problem with MKL 2017 u4 or 2018?

Hi, it fails under both versions. Any insight would be appreciated.

Gennady_F_Intel
Moderator
103 Views

OK, thanks, we will check this case. Have you checked whether this case works with the minimum degree algorithm?

Dinesh_S_
Beginner
103 Views

Yes, as stated in the link, the error appears only for METIS in multi-processor runs. The minimum degree algorithm is significantly slower than METIS.

Gennady_F_Intel
Moderator
103 Views

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I get error -1, which means the input matrix is inconsistent. Could you give us the reproducer you use?
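For anyone repeating the COO-to-CSR step, here is a minimal sketch of a conversion plus a structural sanity check. The names `Csr`, `Triplet`, `coo_to_csr` and `csr_looks_consistent` are hypothetical and not part of MKL, and the check is only a hand-written approximation of the kind of conditions a matrix checker looks at (zero-based offsets, in-range column indices, increasing column order within each row). It is not the PARDISO matrix checker itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Zero-based CSR storage: row_ptr has n + 1 entries, col and val have nnz entries.
struct Csr {
    int n = 0;
    std::vector<int> row_ptr;
    std::vector<int> col;
    std::vector<double> val;
};

// One COO entry: (row, column, value), indices zero-based.
using Triplet = std::tuple<int, int, double>;

// Convert COO triplets to CSR. Sorting the tuples orders them by row, then
// by column, so column indices increase within each row.
Csr coo_to_csr(int n, std::vector<Triplet> coo) {
    std::sort(coo.begin(), coo.end());
    Csr m;
    m.n = n;
    m.row_ptr.assign(static_cast<std::size_t>(n) + 1, 0);
    for (const Triplet& t : coo) {
        assert(std::get<0>(t) >= 0 && std::get<0>(t) < n);
        m.row_ptr[std::get<0>(t) + 1] += 1;  // count entries per row
        m.col.push_back(std::get<1>(t));
        m.val.push_back(std::get<2>(t));
    }
    for (int i = 0; i < n; ++i) m.row_ptr[i + 1] += m.row_ptr[i];  // prefix sum
    return m;
}

// Hand-written structural check (an assumption about what "consistent" means,
// not the MKL implementation).
bool csr_looks_consistent(const Csr& m) {
    if (m.n < 0 || m.row_ptr.size() != static_cast<std::size_t>(m.n) + 1) return false;
    if (m.col.size() != m.val.size()) return false;
    const int nnz = static_cast<int>(m.col.size());
    if (m.row_ptr.front() != 0 || m.row_ptr.back() != nnz) return false;
    for (int i = 0; i < m.n; ++i) {
        if (m.row_ptr[i] > m.row_ptr[i + 1] || m.row_ptr[i + 1] > nnz) return false;
        for (int k = m.row_ptr[i]; k < m.row_ptr[i + 1]; ++k) {
            if (m.col[k] < 0 || m.col[k] >= m.n) return false;               // column out of range
            if (k > m.row_ptr[i] && m.col[k - 1] >= m.col[k]) return false;  // unsorted or duplicate
        }
    }
    return true;
}
```

If a hand-converted matrix fails a check like this, the -1 result is more likely to come from the conversion (for example one-based indices passed with zero-based indexing enabled) than from the solver.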

Dinesh_S_
Beginner
103 Views

Gennady F. (Intel) wrote:

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I get error -1, which means the input matrix is inconsistent. Could you give us the reproducer you use?

Hi

I do not have a converter. I actually work in CSR format and wrote this matrix out in COO to check for mistakes using MATLAB. I am attaching another case where the matrix is in CSR format. Hopefully that helps.

The first file has the column indices and the values, and the other file has the row offsets. (These files are quite simple, since the matrix is essentially an identity matrix under some permutation.)

Dinesh

Dinesh_S_
Beginner
103 Views

It may not depend on the matrix specific to my problem. If you create any non-diagonal identity matrix (a permuted identity) and run it with METIS, it might fail.
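A minimal sketch of how such a test matrix could be generated, assuming zero-based CSR with one unit entry per row. The names `PermutedIdentity`, `make_permuted_identity`, `random_permutation` and `reference_solution` are hypothetical helpers for illustration; nothing here calls MKL, and the permutation used in the original failing case is not known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Zero-based CSR arrays of a permuted identity: row i holds a single 1.0
// in column perm[i].
struct PermutedIdentity {
    std::vector<int> row_ptr;  // n + 1 offsets: 0, 1, ..., n
    std::vector<int> col;      // col[i] = perm[i]
    std::vector<double> val;   // all ones
};

// perm must be a permutation of 0..n-1 (checked with assert).
PermutedIdentity make_permuted_identity(const std::vector<int>& perm) {
    const std::size_t n = perm.size();
    std::vector<bool> seen(n, false);
    for (int p : perm) {
        assert(p >= 0 && static_cast<std::size_t>(p) < n && !seen[p]);
        seen[p] = true;
    }
    PermutedIdentity m;
    m.row_ptr.resize(n + 1);
    std::iota(m.row_ptr.begin(), m.row_ptr.end(), 0);  // one entry per row
    m.col = perm;
    m.val.assign(n, 1.0);
    return m;
}

// Reproducible random permutation of 0..n-1 (the seed is an arbitrary choice).
std::vector<int> random_permutation(int n, unsigned seed) {
    std::vector<int> perm(static_cast<std::size_t>(n));
    std::iota(perm.begin(), perm.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(perm.begin(), perm.end(), gen);
    return perm;
}

// Exact solution of A x = b for this matrix: row i reads x[perm[i]] = b[i].
std::vector<double> reference_solution(const std::vector<int>& perm,
                                       const std::vector<double>& b) {
    assert(perm.size() == b.size());
    std::vector<double> x(b.size());
    for (std::size_t i = 0; i < perm.size(); ++i) x[perm[i]] = b[i];
    return x;
}
```

The three arrays can be passed to the solver as `ia`, `ja` and `a`, and the solver's `x` compared against `reference_solution` to detect a wrong or corrupted result.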

Dinesh_S_
Beginner
103 Views

Gennady F. (Intel) wrote:

OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I get error -1, which means the input matrix is inconsistent. Could you give us the reproducer you use?

Do you need anything else beyond what has been provided?

Thanks

Dinesh

103 Views

We quickly checked your matrix on a Linux machine and I don't see any issues there. Could you provide the iparm settings that you use for this test? And of course we will also run this matrix on Windows.

=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000067 s
Time spent in reordering of the initial matrix (reorder)         : 0.000003 s
Time spent in symbolic factorization (symbfct)                   : 0.013133 s
Time spent in data preparations for factorization (parlist)      : 0.000007 s
Time spent in allocation of internal data structures (malloc)    : 0.011820 s
Time spent in additional calculations                            : 0.005717 s
Total time spent                                                 : 0.030747 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
time_reorder 0.0550621
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  4 %  5 %  7 %  8 %  9 %  10 %  11 %  13 %  14 %  15 %  16 %  17 %  19 %  20 %  21 %  22 %  23 %  25 %  26 %  27 %  28 %  29 %  30 %  32 %  33 %  34 %  35 %  36 %  38 %  39 %  40 %  41 %  42 %  43 %  59 %  60 %  62 %  63 %  64 %  65 %  67 %  68 %  69 %  71 %  72 %  73 %  74 %  75 %  77 %  78 %  79 %  80 %  82 %  83 %  84 %  85 %  86 %  88 %  89 %  90 %  91 %  93 %  94 %  95 %  96 %  98 %  100 %

=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.028661 s
Time spent in allocation of internal data structures (malloc)    : 0.000029 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 0.028692 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000


=== PARDISO: solving a real nonsymmetric system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.005805 s
Time spent in additional calculations                            : 0.000017 s
Total time spent                                                 : 0.005822 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           1035
             number of non-zeros in A:      1035
             number of non-zeros in A (%): 0.096618

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
             number of supernodes:                    1035
             size of largest supernode:               1
             number of non-zeros in L:                1035
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1036
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000

0: 1013 10.00 1.00 1.00 1.00
Residual 0.000e+00
 

Dinesh_S_
Beginner
103 Views

StartThread_ = mkl_get_max_threads();                  /* Record the current maximum thread count */
mkl_set_dynamic(0);                                    /* Disable dynamic adjustment of the thread count */
mkl_set_num_threads(SparseConfig_.OpenMpThreads());

maxfct = 1;                                            /* Maximum number of numerical factorizations */
mnum = 1;                                              /* Which factorization to use */
msglvl = 0;                                            /* Do not print statistical information */
error = 0;                                             /* Initialize error flag */
for (auto i = 0; i < 64; i++) iparm[i] = 0;

iparm[0] = 1;                                          /* No solver default */
iparm[1] = SparseConfig_.PardisoRO();                  /* Fill-in reducing ordering: */
                                                       /* 0: The minimum degree algorithm */
                                                       /* 2: The nested dissection algorithm from METIS package */
                                                       /* 3: The parallel (OpenMP) nested dissection algorithm */
iparm[2] = 0;                                          /* Reserved in MKL PARDISO, set to zero */
iparm[3] = 0;                                          /* No iterative-direct algorithm */
iparm[4] = 0;                                          /* No user fill-in reducing permutation */
iparm[5] = 0;                                          /* Write solution into x */
iparm[6] = 0;                                          /* Not in use */
iparm[7] = SparseConfig_.NumberOfIterativeRefinements(); /* Max number of iterative refinement steps */
iparm[8] = 0;                                          /* Not in use */
iparm[9] = SparseConfig_.PivotShift();                 /* Pivot perturbation exponent, e.g. 13 perturbs with 1E-13 */
iparm[10] = 1;                                         /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0;                                         /* Solve with A (no transposed or conjugate transposed solve) */
iparm[12] = 1;                                         /* Maximum weighted matching algorithm is switched on (default for nonsymmetric) */
iparm[13] = 0;                                         /* Output: Number of perturbed pivots */
iparm[14] = 0;                                         /* Not in use */
iparm[15] = 0;                                         /* Not in use */
iparm[16] = 0;                                         /* Not in use */
iparm[17] = -1;                                        /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1;                                        /* Output: Mflops for LU factorization */
iparm[19] = 0;                                         /* Output: Number of CG iterations */

iparm[26] = 1;                                         /* Matrix checker on */

iparm[34] = 1;                                         /* Zero-based indexing */
for (auto i = 0; i < 64; i++) pt[i] = 0;               /* Clear the internal solver memory pointer */

phase = 11;                                            /* Analysis (reordering and symbolic factorization) */
mtype = 11;                                            /* Real nonsymmetric matrix */
nrhs = NRhs;                                           /* Number of right-hand sides */
 
Note: I get the crash consistently in debug-mode runs under MS VS2012.
103 Views

Same result on Windows. Can you check that you set iparm[34] to 1 (zero-based CSR matrix)?

Thanks,

Alex

Dinesh_S_
Beginner
103 Views

Yes, it is:

iparm[34] = 1; /* zero based index */

Are you running it in debug mode with VS2012 x64? (Release mode does not always trigger this bug.)

If your setup is lightweight, could you share it with me to test?

Regards

Dinesh

Dinesh_S_
Beginner
103 Views

Hi,

Is there any resolution on this issue?

Regards

Dinesh

Dinesh_S_
Beginner
103 Views

Developers

Any updates on this issue?

Regards

Dinesh

Ying_H_Intel
Employee
103 Views

Hi Dinesh,

Could you please try MKL 2018 Update 1? I built a small test case based on the sparse matrix you attached in https://software.intel.com/en-us/node/742812. It runs fine in MSVS 2017 with multiple threads.

I'm linking the libraries below:

Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_lp64.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_thread.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_core.lib:

1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\compiler\lib\intel64_win\libiomp5md.lib:

MKL 2018, minor 0, update 1, version 20180001, build date 20171007

Best Regards,

Ying
non-zero iparm values:
iparm[0] = 1
iparm[1] = 2
iparm[7] = 2
iparm[9] = 13
iparm[10] = 1
iparm[12] = 1
iparm[17] = -1
iparm[18] = -1
iparm[26] = 1
iparm[34] = 1
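A listing like the one above makes it easier to compare settings between two test programs. A minimal sketch of how it could be produced, assuming `iparm` is held as 64 plain `int` values (as with `MKL_INT` under the LP64 interface); `nonzero_iparm_report` is a hypothetical helper, not an MKL function.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Format every non-zero iparm entry as "iparm[i] = value", one per line.
// Assumption: iparm is stored as 64 plain ints.
std::string nonzero_iparm_report(const std::array<int, 64>& iparm) {
    std::ostringstream out;
    for (std::size_t i = 0; i < iparm.size(); ++i) {
        if (iparm[i] != 0) out << "iparm[" << i << "] = " << iparm[i] << "\n";
    }
    return out.str();
}
```

Printing the report before the first solver call shows the inputs only, since PARDISO overwrites the output entries (for example iparm[17] and iparm[18]) during the analysis and factorization phases.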

=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000018 s
Time spent in reordering of the initial matrix (reorder)         : 0.000007 s
Time spent in symbolic factorization (symbfct)                   : 0.000588 s
Time spent in data preparations for factorization (parlist)      : 0.000002 s
Time spent in allocation of internal data structures (malloc)    : 0.002990 s
Time spent in additional calculations                            : 0.000125 s
Total time spent                                                 : 0.003729 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  79 %  80 %  81 %  82 %  83 %  84 %  85 %  86 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %


=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.034930 s
Time spent in allocation of internal data structures (malloc)    : 0.001086 s
Time spent in additional calculations                            : 0.000005 s
Total time spent                                                 : 0.036021 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000


=== PARDISO: solving a real nonsymmetric system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.000082 s
Time spent in additional calculations                            : 0.000833 s
Total time spent                                                 : 0.000915 s

Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP

< Linear system Ax = b >
             number of equations:           587
             number of non-zeros in A:      587
             number of non-zeros in A (%): 0.170358

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    587
             size of largest supernode:               1
             number of non-zeros in L:                587
             number of non-zeros in U:                1
             number of non-zeros in L+U:              588
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000000


Input and solution norms:
||A|| = 24.2281
||b|| = 24.2281
||x|| = 24.2281
||Ax-b|| = 0


Press any key to continue . . .
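The three equal norms in this output (24.2281, which is approximately the square root of 587) are consistent with 587 unit entries in A and unit-magnitude entries in b and x, assuming Euclidean and Frobenius norms; the test program itself was not posted, so that is an inference. A minimal sketch of an independent residual check on zero-based CSR arrays, with hypothetical helper names `norm2`, `csr_matvec` and `residual_norm`:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of a vector. Applied to the CSR value array, it gives the
// Frobenius norm of the matrix.
double norm2(const std::vector<double>& v) {
    double sum = 0.0;
    for (double e : v) sum += e * e;
    return std::sqrt(sum);
}

// y = A x for a square matrix in zero-based CSR format.
std::vector<double> csr_matvec(const std::vector<int>& row_ptr,
                               const std::vector<int>& col,
                               const std::vector<double>& val,
                               const std::vector<double>& x) {
    const std::size_t n = row_ptr.size() - 1;
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            y[i] += val[k] * x[col[k]];
        }
    }
    return y;
}

// ||A x - b|| in the Euclidean norm.
double residual_norm(const std::vector<int>& row_ptr,
                     const std::vector<int>& col,
                     const std::vector<double>& val,
                     const std::vector<double>& x,
                     const std::vector<double>& b) {
    std::vector<double> r = csr_matvec(row_ptr, col, val, x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= b[i];
    return norm2(r);
}
```

Running this check after the solve phase in both debug and release builds would show whether a failure corrupts the solution or only crashes the process.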

Dinesh_S_
Beginner
103 Views

Hello Ying H,

I have already tried 2018 and 2017 Update 4, as stated earlier in this thread. The case always fails when tested in debug mode. (In release mode the failure becomes rarer.)

Regards

Dinesh
