The numerical factorization stage of PARDISO seems to break in multi-processor runs on sparse identity matrices when METIS or parallel METIS reordering is used.
The details are given here:
https://software.intel.com/en-us/node/742812
Regards
Dinesh
Have you checked the problem with MKL 2017 u4 or 2018?
What are the differences between MKL 2017 u4 and 2018 as far as PARDISO is concerned?
It fails with 2017 u4.
And it fails with 2018 too.
Gennady F. (Intel) wrote:
Have you checked the problem with MKL 2017 u4 or 2018?
Hi, it fails under both versions. Any insight would be appreciated.
OK, thanks, we will check this case. Have you checked whether this case works with the minimum degree algorithm?
Yes, as stated in the link, the error appears only for METIS under multi-processor runs. The minimum degree algorithm is significantly slower than METIS.
OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see a -1 message, which means the input matrix is inconsistent. Could you then give us the reproducer you use?
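For readers following along, here is a minimal sketch of a COO to zero-based CSR conversion in C. It is not the converter used in this thread and not an MKL routine; the function name `coo_to_csr` and the array names are hypothetical. It sorts the column indices within each row because, as I understand the PARDISO input requirements, they are expected in increasing order. Duplicate entries are not merged.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert n-by-n COO triplets (zero-based) to
   zero-based CSR. ia needs n+1 entries; ja and a need nnz entries.
   Returns 0 on success, -1 on an out-of-range index or failed allocation. */
int coo_to_csr(int n, int nnz,
               const int *row, const int *col, const double *val,
               int *ia, int *ja, double *a)
{
    assert(n >= 0 && nnz >= 0);

    /* Count entries per row, rejecting out-of-range indices. */
    for (int i = 0; i <= n; ++i) ia[i] = 0;
    for (int k = 0; k < nnz; ++k) {
        if (row[k] < 0 || row[k] >= n || col[k] < 0 || col[k] >= n) return -1;
        ia[row[k] + 1]++;
    }
    /* A prefix sum turns the counts into row offsets. */
    for (int i = 0; i < n; ++i) ia[i + 1] += ia[i];

    /* Scatter each entry into the next free slot of its row. */
    int *next = malloc((size_t)(n > 0 ? n : 1) * sizeof *next);
    if (next == NULL) return -1;
    for (int i = 0; i < n; ++i) next[i] = ia[i];
    for (int k = 0; k < nnz; ++k) {
        int p = next[row[k]]++;
        ja[p] = col[k];
        a[p] = val[k];
    }
    free(next);

    /* Insertion sort so that column indices ascend within each row. */
    for (int i = 0; i < n; ++i) {
        for (int p = ia[i] + 1; p < ia[i + 1]; ++p) {
            int c = ja[p];
            double v = a[p];
            int q = p - 1;
            while (q >= ia[i] && ja[q] > c) {
                ja[q + 1] = ja[q];
                a[q + 1] = a[q];
                --q;
            }
            ja[q + 1] = c;
            a[q + 1] = v;
        }
    }
    return 0;
}
```

If the COO file is one-based (as a MATLAB export typically is), the indices would need to be shifted by one before calling this, or the -1 path will trigger for the last row or column.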
Gennady F. (Intel) wrote:
OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see a -1 message, which means the input matrix is inconsistent. Could you then give us the reproducer you use?
Hi,
I do not have a converter. I in fact work in CSR format, and wrote this matrix out in COO to check for any mistakes using MATLAB. I am attaching another case where I have the matrix in CSR format. Hopefully that helps.
The first file has the column indices and the values, and the other file has the row offsets (these files are quite simple, since the matrix is essentially a diagonal identity matrix under some permutation).
Dinesh
It may not depend on the matrix that is specific to my problem. If you create any non-diagonal identity matrix (a permuted identity) and run with METIS, it might fail.
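A minimal sketch of how such a "non-diagonal identity" could be generated in zero-based CSR, for anyone who wants a reproducer without the attached files. The function names and the cyclic-shift permutation are hypothetical choices of mine; the attached matrices use some other permutation, and whether this particular matrix triggers the reported failure is not confirmed.

```c
#include <assert.h>

/* Hypothetical helper: build an n-by-n permuted identity in zero-based CSR.
   perm[i] is the column holding the single 1.0 in row i.
   ia needs n+1 entries; ja and a need n entries. */
void permuted_identity_csr(int n, const int *perm,
                           int *ia, int *ja, double *a)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        assert(perm[i] >= 0 && perm[i] < n);
        ia[i] = i;        /* exactly one entry per row */
        ja[i] = perm[i];
        a[i] = 1.0;
    }
    ia[n] = n;
}

/* One simple permutation for illustration: a cyclic shift of the columns. */
void cyclic_shift(int n, int shift, int *perm)
{
    assert(n > 0 && shift >= 0);
    for (int i = 0; i < n; ++i)
        perm[i] = (i + shift) % n;
}
```

The three arrays `ia`, `ja`, and `a` correspond to the offset, column index, and value files described above, with n equations and n non-zeros.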
Gennady F. (Intel) wrote:
OK, but when I convert from COO to CSR format and pass this input to PARDISO (with the matrix checker on), I see a -1 message, which means the input matrix is inconsistent. Could you then give us the reproducer you use?
Do you need anything else beyond what has been provided?
Thanks
Dinesh
We quickly checked your matrix on a Linux machine and I don't see any issues there. Can I ask you to provide the iparm set that you use for this test? And of course we will run this matrix on Windows.
=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000067 s
Time spent in reordering of the initial matrix (reorder) : 0.000003 s
Time spent in symbolic factorization (symbfct) : 0.013133 s
Time spent in data preparations for factorization (parlist) : 0.000007 s
Time spent in allocation of internal data structures (malloc) : 0.011820 s
Time spent in additional calculations : 0.005717 s
Total time spent : 0.030747 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 1035
number of non-zeros in A: 1035
number of non-zeros in A (%): 0.096618
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 1035
size of largest supernode: 1
number of non-zeros in L: 1035
number of non-zeros in U: 1
number of non-zeros in L+U: 1036
time_reorder 0.0550621
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
1 % 4 % 5 % 7 % 8 % 9 % 10 % 11 % 13 % 14 % 15 % 16 % 17 % 19 % 20 % 21 % 22 % 23 % 25 % 26 % 27 % 28 % 29 % 30 % 32 % 33 % 34 % 35 % 36 % 38 % 39 % 40 % 41 % 42 % 43 % 59 % 60 % 62 % 63 % 64 % 65 % 67 % 68 % 69 % 71 % 72 % 73 % 74 % 75 % 77 % 78 % 79 % 80 % 82 % 83 % 84 % 85 % 86 % 88 % 89 % 90 % 91 % 93 % 94 % 95 % 96 % 98 % 100 %
=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 0.028661 s
Time spent in allocation of internal data structures (malloc) : 0.000029 s
Time spent in additional calculations : 0.000002 s
Total time spent : 0.028692 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 1035
number of non-zeros in A: 1035
number of non-zeros in A (%): 0.096618
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 1035
size of largest supernode: 1
number of non-zeros in L: 1035
number of non-zeros in U: 1
number of non-zeros in L+U: 1036
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
=== PARDISO: solving a real nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.005805 s
Time spent in additional calculations : 0.000017 s
Total time spent : 0.005822 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 1035
number of non-zeros in A: 1035
number of non-zeros in A (%): 0.096618
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
number of supernodes: 1035
size of largest supernode: 1
number of non-zeros in L: 1035
number of non-zeros in U: 1
number of non-zeros in L+U: 1036
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
0: 1013 10.00 1.00 1.00 1.00
Residual 0.000e+00
Same for Windows. Can you check that you set iparm[34] to 1 (zero-based CSR matrix)?
Thanks,
Alex
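A mismatch between the index base and the arrays is one way to get an inconsistent-input error, so a small structural check can help before calling the solver. The sketch below is my own hypothetical helper, not MKL's matrix checker; it returns -1 on inconsistency only by analogy with the error code mentioned earlier in the thread.

```c
#include <assert.h>

/* Hypothetical check of a square CSR structure for a given index base
   (0 or 1). Returns 0 if the offsets start at the base, never decrease,
   and every row has in-range, strictly increasing column indices;
   returns -1 otherwise. */
int csr_check(int n, const int *ia, const int *ja, int base)
{
    assert(base == 0 || base == 1);
    if (n < 0 || ia[0] != base) return -1;
    for (int i = 0; i < n; ++i) {
        if (ia[i + 1] < ia[i]) return -1;
        for (int p = ia[i] - base; p < ia[i + 1] - base; ++p) {
            int c = ja[p] - base;
            if (c < 0 || c >= n) return -1;
            /* Columns must strictly increase within a row. */
            if (p > ia[i] - base && ja[p] <= ja[p - 1]) return -1;
        }
    }
    return 0;
}
```

Running it with both `base = 0` and `base = 1` on the same arrays shows quickly which convention the data actually follows.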
Yes, it is:
iparm[34] = 1; /* zero based index */
Are you running it in debug mode, VS2012 x64? (Release mode does not always catch this bug.)
If your setup is light, you can share it with me to test.
Regards
Dinesh
Hi,
Is there any resolution on this issue?
Regards
Dinesh
Developers
Any updates on this issue?
Regards
Dinesh
Hi Dinesh,
Could you please try MKL 2018 Update 1? I built a small test case based on the sparse matrix you attached in https://software.intel.com/en-us/node/742812. It runs OK in MSVS 2017 with multiple threads.
I'm linking the libraries below:
Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_lp64.lib:
1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_intel_thread.lib:
1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\mkl\lib\intel64_win\mkl_core.lib:
1> Searching C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018.1.156\windows\compiler\lib\intel64_win\libiomp5md.lib:
MKL 2018, minor 0, update 1, version 20180001, build date 20171007
Best Regards,
Ying
non-zero iparm values:
iparm[0] = 1
iparm[1] = 2
iparm[7] = 2
iparm[9] = 13
iparm[10] = 1
iparm[12] = 1
iparm[17] = -1
iparm[18] = -1
iparm[26] = 1
iparm[34] = 1
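For reference, the non-zero settings listed above could be written in C roughly as follows. This is a sketch only: it fills a plain `int` array (an assumption; MKL code would normally use `MKL_INT`) and makes no MKL call. The function name is hypothetical, and the per-entry comments reflect my reading of the PARDISO `iparm` documentation together with the "turned ON" lines in the log below, so please verify them against the documentation for your MKL version.

```c
#include <assert.h>
#include <string.h>

enum { IPARM_LEN = 64 };

/* Hypothetical helper: reproduce the non-zero iparm values posted above. */
void set_iparm_from_thread(int iparm[IPARM_LEN])
{
    assert(iparm != NULL);
    memset(iparm, 0, IPARM_LEN * sizeof iparm[0]);

    iparm[0]  = 1;   /* do not use solver defaults */
    iparm[1]  = 2;   /* METIS nested dissection reordering */
    iparm[7]  = 2;   /* maximum number of iterative refinement steps */
    iparm[9]  = 13;  /* pivot perturbation of 1e-13 */
    iparm[10] = 1;   /* scaling on */
    iparm[12] = 1;   /* weighted matching on */
    iparm[17] = -1;  /* report number of non-zeros in the factors */
    iparm[18] = -1;  /* report floating-point operation count */
    iparm[26] = 1;   /* matrix checker on */
    iparm[34] = 1;   /* zero-based indexing */
}
```

The earlier Linux log reports "Parallel METIS algorithm at reorder step", which as far as I know corresponds to `iparm[1] = 3` rather than 2, so the two runs in this thread do not use identical settings.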
=== PARDISO: solving a real nonsymmetric system ===
Matrix checker is turned ON
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000018 s
Time spent in reordering of the initial matrix (reorder) : 0.000007 s
Time spent in symbolic factorization (symbfct) : 0.000588 s
Time spent in data preparations for factorization (parlist) : 0.000002 s
Time spent in allocation of internal data structures (malloc) : 0.002990 s
Time spent in additional calculations : 0.000125 s
Total time spent : 0.003729 s
Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP
< Linear system Ax = b >
number of equations: 587
number of non-zeros in A: 587
number of non-zeros in A (%): 0.170358
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 587
size of largest supernode: 1
number of non-zeros in L: 587
number of non-zeros in U: 1
number of non-zeros in L+U: 588
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 69 % 70 % 71 % 72 % 73 % 74 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 87 % 88 % 89 % 90 % 91 % 92 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %
=== PARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 0.034930 s
Time spent in allocation of internal data structures (malloc) : 0.001086 s
Time spent in additional calculations : 0.000005 s
Total time spent : 0.036021 s
Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP
< Linear system Ax = b >
number of equations: 587
number of non-zeros in A: 587
number of non-zeros in A (%): 0.170358
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 587
size of largest supernode: 1
number of non-zeros in L: 587
number of non-zeros in U: 1
number of non-zeros in L+U: 588
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
=== PARDISO: solving a real nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.000082 s
Time spent in additional calculations : 0.000833 s
Total time spent : 0.000915 s
Statistics:
===========
Parallel Direct Factorization is running on 4 OpenMP
< Linear system Ax = b >
number of equations: 587
number of non-zeros in A: 587
number of non-zeros in A (%): 0.170358
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 587
size of largest supernode: 1
number of non-zeros in L: 587
number of non-zeros in U: 1
number of non-zeros in L+U: 588
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000000
Input and solution norms:
||A|| = 24.2281
||b|| = 24.2281
||x|| = 24.2281
||Ax-b|| = 0
Press any key to continue . . .
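The reported norms are consistent with a permuted identity and a right-hand side of all ones: 587 unit entries give a squared norm of 587, and the square root of 587 is about 24.2281. That reading of the output is my assumption, not something stated in the post. A minimal sketch of such a check on zero-based CSR data is below; the function names are hypothetical, and the helpers return squared norms so the example needs nothing beyond the standard headers.

```c
#include <assert.h>

/* Sum of squares of a vector (the squared 2-norm). */
double sum_squares(int n, const double *v)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += v[i] * v[i];
    return s;
}

/* Squared 2-norm of the residual A*x - b for a zero-based CSR matrix. */
double csr_residual_sq(int n, const int *ia, const int *ja,
                       const double *a, const double *x, const double *b)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = -b[i];
        for (int p = ia[i]; p < ia[i + 1]; ++p)
            r += a[p] * x[ja[p]];   /* accumulate row i of A*x */
        s += r * r;
    }
    return s;
}
```

Applying `sum_squares` to the value array gives the squared Frobenius norm of the matrix, and a zero result from `csr_residual_sq` corresponds to the `||Ax-b|| = 0` line above.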
Hello Ying H,
I have already tried 2018 and 2017 Update 4, as stated in the posts above. The case always fails when tested in debug mode (in release mode it becomes a somewhat rare phenomenon).
Regards
Dinesh
Hi Dinesh,
I mean the latest version, MKL 2018 Update 1 (not 2018 or 2017 Update 4). I do seem to be able to see the crash with the earlier versions.
Best Regards,
Ying