The numerical factorization stage seems to break in multi-processor runs on sparse identity matrices when METIS or parallel METIS reordering is used.
The details are given here:
https://software.intel.com/en-us/node/742812
Regards
Dinesh
Have you checked the problem with MKL 2017 u4 or 2018?
What are the differences between MKL 2017 u4 and 2018 as far as PARDISO is concerned?
It fails with 2017 u4, and it fails with 2018 too.
Gennady F. (Intel) wrote:
Have you checked the problem with MKL 2017 u4 or 2018?
Hi, it fails under both updates. Any insight would be appreciated.
OK, thanks, we will check this case. Have you checked whether this case works with the minimum degree algorithm?
Yes, as stated in the link, the error appears only for METIS in multi-processor runs. The minimum degree algorithm is significantly slower than METIS.
OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see a -1 message, which means the input matrix is inconsistent. Could you give us the reproducer you use?
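For readers following the COO-to-CSR step mentioned above, here is a minimal sketch of such a conversion in C. It is not the converter used by anyone in this thread; the function name `coo_to_csr` and its argument layout are hypothetical. It assumes 0-based indices and sorts the column indices within each row, since, as far as I know, PARDISO expects increasing column indices per row and the matrix checker reports unsorted rows as inconsistent input.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert a 0-based COO matrix (row[k], col[k], val[k])
 * with nnz entries into 0-based CSR arrays.
 * ia must hold n + 1 entries; ja and a must hold nnz entries.
 * Duplicate (row, col) entries are NOT merged.
 * Returns 0 on success, -1 on out-of-range input or allocation failure. */
int coo_to_csr(int n, int nnz, const int *row, const int *col,
               const double *val, int *ia, int *ja, double *a)
{
    assert(row && col && val && ia && ja && a);
    if (n <= 0 || nnz < 0) return -1;
    for (int k = 0; k < nnz; ++k)
        if (row[k] < 0 || row[k] >= n || col[k] < 0 || col[k] >= n)
            return -1;

    /* Count entries per row, then prefix-sum the counts into row offsets. */
    for (int i = 0; i <= n; ++i) ia[i] = 0;
    for (int k = 0; k < nnz; ++k) ia[row[k] + 1]++;
    for (int i = 0; i < n; ++i) ia[i + 1] += ia[i];

    /* Scatter each entry into its row using a moving write cursor. */
    int *next = malloc((size_t)n * sizeof *next);
    if (!next) return -1;
    for (int i = 0; i < n; ++i) next[i] = ia[i];
    for (int k = 0; k < nnz; ++k) {
        int p = next[row[k]]++;
        ja[p] = col[k];
        a[p] = val[k];
    }
    free(next);

    /* Insertion sort of each row by column index. */
    for (int i = 0; i < n; ++i) {
        for (int p = ia[i] + 1; p < ia[i + 1]; ++p) {
            int c = ja[p];
            double v = a[p];
            int q = p - 1;
            while (q >= ia[i] && ja[q] > c) {
                ja[q + 1] = ja[q];
                a[q + 1] = a[q];
                --q;
            }
            ja[q + 1] = c;
            a[q + 1] = v;
        }
    }
    return 0;
}
```

If the COO file was written from MATLAB, its indices are 1-based, so 1 would have to be subtracted from every row and column index before calling a routine like this one.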
Gennady F. (Intel) wrote:
OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see a -1 message, which means the input matrix is inconsistent. Could you give us the reproducer you use?
Hi
I do not have a converter. I in fact work in CSR format, and wrote this matrix out in COO only to check it for mistakes using MATLAB. I am attaching another case where I have the matrix in CSR format. Hopefully that helps.
The first file has the column indices and the values, and the other file has the row offsets. These files are quite simple, since the matrix is essentially an identity matrix under some permutation.
Dinesh
It may not depend on the specific matrix from my problem. If you create any non-diagonal identity matrix (that is, a permuted identity) and run it with METIS, it might fail.
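A non-diagonal identity matrix of the kind described above can be generated directly in CSR form. The sketch below is only an illustration: the helper names are hypothetical, and the cyclic-shift permutation is an assumption, since the permutation in the attached files is not given in this thread. Indices are 0-based, matching iparm[34] = 1.

```c
#include <assert.h>

/* Hypothetical helper: fill perm with a cyclic shift, perm[i] = (i + shift) % n.
 * Any other permutation of 0..n-1 would serve equally well. */
void cyclic_shift_perm(int n, int shift, int *perm)
{
    assert(n > 0 && shift >= 0 && perm);
    for (int i = 0; i < n; ++i)
        perm[i] = (i + shift) % n;
}

/* Hypothetical helper: build the 0-based CSR arrays of the n x n matrix P
 * with P[i][perm[i]] = 1 and zeros elsewhere.
 * ia needs n + 1 entries; ja and a need n entries each. */
void permuted_identity_csr(int n, const int *perm,
                           int *ia, int *ja, double *a)
{
    assert(n > 0 && perm && ia && ja && a);
    for (int i = 0; i < n; ++i) {
        ia[i] = i;        /* exactly one stored entry per row */
        ja[i] = perm[i];  /* its column is given by the permutation */
        a[i] = 1.0;
    }
    ia[n] = n;            /* total number of stored entries */
}
```

With one entry per row, the column indices are trivially sorted within each row, so the resulting arrays should be acceptable input for the matrix checker as long as the index base matches iparm[34].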
Do you need anything else beyond what has been provided?
Thanks
Dinesh
We quickly checked your matrix on a Linux machine and I don't see any issues there. Can I ask you to provide the iparm settings that you use for this test? And of course we will run this matrix on Windows.
=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000067 s
Time spent in reordering of the initial matrix (reorder) : 0.000003 s
Time spent in symbolic factorization (symbfct) : 0.013133 s
Time spent in data preparations for factorization (parlist) : 0.000007 s
Time spent in allocation of internal data structures (malloc) : 0.011820 s
Time spent in additional calculations : 0.005717 s
Total time spent : 0.030747 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 1035
number of non-zeros in A: 1035
number of non-zeros in A (%): 0.096618
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 1035
size of largest supernode: 1
number of non-zeros in L: 1035
number of non-zeros in U: 1
number of non-zeros in L+U: 1036
time_reorder 0.0550621
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
1 % 4 % 5 % 7 % 8 % 9 % 10 % 11 % 13 % 14 % 15 % 16 % 17 % 19 % 20 % 21 % 22 % 23 % 25 % 26 % 27 % 28 % 29 % 30 % 32 % 33 % 34 % 35 % 36 % 38 % 39 % 40 % 41 % 42 % 43 % 59 % 60 % 62 % 63 % 64 % 65 % 67 % 68 % 69 % 71 % 72 % 73 % 74 % 75 % 77 % 78 % 79 % 80 % 82 % 83 % 84 % 85 % 86 % 88 % 89 % 90 % 91 % 93 % 94 % 95 % 96 % 98 % 100 %
=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 0.028661 s
Time spent in allocation of internal data structures (malloc) : 0.000029 s
Time spent in additional calculations : 0.000002 s
Total time spent : 0.028692 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 1035
number of non-zeros in A: 1035
number of non-zeros in A (%): 0.096618
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 1035
size of largest supernode: 1
number of non-zeros in L: 1035
number of non-zeros in U: 1
number of non-zeros in L+U: 1036
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
=== PARDISO: solving a real nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.005805 s
Time spent in additional calculations : 0.000017 s
Total time spent : 0.005822 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 1035
number of non-zeros in A: 1035
number of non-zeros in A (%): 0.096618
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 1035
size of largest supernode: 1
number of non-zeros in L: 1035
number of non-zeros in U: 1
number of non-zeros in L+U: 1036
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
0: 1013 10.00 1.00 1.00 1.00
Residual 0.000e+00
Same for Windows. Can you check that you set iparm[34] to 1 (zero-based CSR matrix)?
Thanks,
Alex
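One way to double-check the index base before calling the solver is to inspect the CSR arrays directly. The routine below is a hedged sketch of such a check, not the logic of the MKL matrix checker; `csr_index_base` is a hypothetical name. It returns the detected base (0 or 1), or -1 when the arrays are inconsistent with either base.

```c
#include <assert.h>

/* Hypothetical helper: detect the index base of an n x n CSR matrix.
 * Returns 0 or 1 if ia/ja are consistent with that base, and -1 otherwise.
 * "Consistent" here means: ia is non-decreasing, ia[n] - base == nnz,
 * all column indices are in range, and columns increase within each row. */
int csr_index_base(int n, int nnz, const int *ia, const int *ja)
{
    assert(ia && ja);
    if (n <= 0 || nnz < 0) return -1;

    int base = ia[0];                 /* the first row offset gives the base */
    if (base != 0 && base != 1) return -1;
    if (ia[n] - base != nnz) return -1;

    for (int i = 0; i < n; ++i) {
        /* Guard the offsets before they are used to index ja. */
        if (ia[i + 1] < ia[i] || ia[i + 1] - base > nnz) return -1;
        for (int p = ia[i] - base; p < ia[i + 1] - base; ++p) {
            int c = ja[p] - base;
            if (c < 0 || c >= n) return -1;
            if (p > ia[i] - base && ja[p] <= ja[p - 1]) return -1;
        }
    }
    return base;
}
```

A return value of 0 corresponds to iparm[34] = 1 and a return value of 1 to iparm[34] = 0. A mismatch between the two is one plausible reason for a -1 from the matrix checker, although this thread does not confirm that it was the cause.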
Yes, it is:
iparm[34] = 1; /* zero based index */
Are you running it in debug mode with VS2012 x64? (Release mode does not always reproduce this bug.)
If your setup is lightweight, you can share it with me to test.
Regards
Dinesh
Hi,
Is there any resolution on this issue?
Regards
Dinesh
Developers
Any updates on this issue?
Regards
Dinesh
Hi Dinesh,
Could you please try MKL 2018 Update 1? I built a small test case based on the sparse matrix you attached in https://software.intel.com/en-us/node/742812. It runs OK in MSVS 2017 with multiple threads.
I'm linking the libraries below:
Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_lp64.lib:
1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_thread.lib:
1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_core.lib:
1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\compiler\lib\intel64_win\libiomp5md.lib:
MKL 2018, minor 0, update 1, version 20180001, build date 20171007
Best Regards,
Ying
non-zero iparm values:
iparm[0] = 1
iparm[1] = 2
iparm[7] = 2
iparm[9] = 13
iparm[10] = 1
iparm[12] = 1
iparm[17] = -1
iparm[18] = -1
iparm[26] = 1
iparm[34] = 1
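For reference, the non-zero iparm values listed above could be set in C roughly as follows. This is a sketch, not the test program used in this post: the function name is hypothetical, plain `int` stands in for `MKL_INT` (an assumption that matches the LP64 libraries linked above), and the comments reflect my reading of the MKL PARDISO iparm documentation.

```c
#include <assert.h>
#include <string.h>

enum { IPARM_LEN = 64 };  /* PARDISO's iparm array has 64 entries */

/* Hypothetical helper: zero iparm, then set the non-zero values reported
 * in the listing above (0-based C indexing). */
void set_reported_iparm(int *iparm)
{
    assert(iparm);
    memset(iparm, 0, IPARM_LEN * sizeof iparm[0]);

    iparm[0]  = 1;   /* do not use the solver defaults */
    iparm[1]  = 2;   /* fill-in reordering: METIS nested dissection
                        (3 selects the parallel OpenMP variant,
                         0 the minimum degree algorithm) */
    iparm[7]  = 2;   /* maximum number of iterative refinement steps */
    iparm[9]  = 13;  /* pivot perturbation of 1e-13 */
    iparm[10] = 1;   /* enable scaling */
    iparm[12] = 1;   /* enable weighted matching */
    iparm[17] = -1;  /* report the number of non-zeros in the factors */
    iparm[18] = -1;  /* report the factorization flop count */
    iparm[26] = 1;   /* turn the matrix checker on */
    iparm[34] = 1;   /* zero-based indexing for ia and ja */
}
```

Changing only iparm[1] is enough to switch between the reordering algorithms discussed in this thread while keeping the rest of the setup identical.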
=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000018 s
Time spent in reordering of the initial matrix (reorder) : 0.000007 s
Time spent in symbolic factorization (symbfct) : 0.000588 s
Time spent in data preparations for factorization (parlist) : 0.000002 s
Time spent in allocation of internal data structures (malloc) : 0.002990 s
Time spent in additional calculations : 0.000125 s
Total time spent : 0.003729 s
Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP
< Linear system Ax = b >
number of equations: 587
number of non-zeros in A: 587
number of non-zeros in A (%): 0.170358
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 587
size of largest supernode: 1
number of non-zeros in L: 587
number of non-zeros in U: 1
number of non-zeros in L+U: 588
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %
=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 0.034930 s
Time spent in allocation of internal data structures (malloc) : 0.001086 s
Time spent in additional calculations : 0.000005 s
Total time spent : 0.036021 s
Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP
< Linear system Ax = b >
number of equations: 587
number of non-zeros in A: 587
number of non-zeros in A (%): 0.170358
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 587
size of largest supernode: 1
number of non-zeros in L: 587
number of non-zeros in U: 1
number of non-zeros in L+U: 588
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
=== PARDISO: solving a real nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.000082 s
Time spent in additional calculations : 0.000833 s
Total time spent : 0.000915 s
Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP
< Linear system Ax = b >
number of equations: 587
number of non-zeros in A: 587
number of non-zeros in A (%): 0.170358
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 587
size of largest supernode: 1
number of non-zeros in L: 587
number of non-zeros in U: 1
number of non-zeros in L+U: 588
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
Input and solution norms:
||A|| = 24.2281
||b|| = 24.2281
||x|| = 24.2281
||Ax-b|| = 0
Press any key to continue . . .
Hello Ying H,
I have already tried 2018 and 2017 Update 4, as stated earlier in this thread. The case always fails when tested in debug mode (in release mode the failure is somewhat rarer).
Regards
Dinesh