PARDISO consistent crash for INCORE RUN

Kostas_S_ · ‎05-22-2015

Hi, I am trying to solve several big 3D solid FE models with PARDISO 11.2.

Although the out-of-core run is successful, I consistently get segmentation faults in the in-core runs.

This also happens with pardiso_64 and cpardiso when only 1 MPI process is used.

With more than 1 MPI process, the run is successful.

The error is reproducible and occurs for almost all big models which I have tried.

Thanks

Kostas

Gennady_F_Intel · ‎05-23-2015

Hi, have you checked the error code returned by PARDISO? Maybe it is -2 (not enough memory)?
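A minimal sketch of the check suggested here, assuming the caller captures the `error` output argument after each PARDISO phase. The helper names `describe_pardiso_error` and `check_phase` are hypothetical, and the table covers only a subset of the codes listed in the Intel MKL PARDISO documentation, so it should be verified against the installed version.

```python
# Subset of Intel MKL PARDISO `error` values; verify against your MKL reference.
PARDISO_ERRORS = {
    0: "no error",
    -1: "input inconsistent",
    -2: "not enough memory",
    -3: "reordering problem",
    -4: "zero pivot, numerical factorization or iterative refinement problem",
    -8: "32-bit integer overflow problem",
    -9: "not enough memory for out-of-core mode",
}


def describe_pardiso_error(error: int) -> str:
    """Return a readable message for a PARDISO error code (hypothetical helper)."""
    return PARDISO_ERRORS.get(error, f"unlisted error code {error}")


def check_phase(phase: int, error: int) -> None:
    """Raise if the given PARDISO phase reported a non-zero error code."""
    if error != 0:
        raise RuntimeError(
            f"PARDISO phase {phase} failed: {describe_pardiso_error(error)}"
        )
```

Calling `check_phase(11, error)` after the analysis phase and `check_phase(22, error)` after factorization would surface a -2 immediately. A segmentation fault inside the library, however, never reaches this check.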

Kostas_S_ · ‎05-24-2015

Unfortunately, no error is returned from either the symbolic phase or the factorization phase. It just crashes inside the factorization phase at about 1 or 2%. Sometimes the crash is followed by a "corrupted double-linked list" or "double free" message from malloc. Of course the required memory is huge (60 or 70 GB), but the machine has plenty.

Gennady_F_Intel · ‎05-25-2015

Kostas, could you set msglvl = 1 and send us the statistics it prints?

Kostas_S_ · ‎05-25-2015

Hi. Please find below the reported statistics for two crashing models.

3D solid model (knuckle) 600K 10-node TETRA

=== PARDISO: solving a symmetric indefinite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 2.234720 s
Time spent in reordering of the initial matrix (reorder)         : 0.014349 s
Time spent in symbolic factorization (symbfct)                   : 5.550181 s
Time spent in data preparations for factorization (parlist)      : 0.195583 s
Time spent in allocation of internal data structures (malloc)    : 40.220556 s
Time spent in additional calculations                            : 7.978722 s
Total time spent                                                 : 56.194111 s

Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP

< Linear system Ax = b >
             number of equations:           2798127
             number of non-zeros in A:      117286720
             number of non-zeros in A (%): 0.001498

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    330907
             size of largest supernode:               26616
             number of non-zeros in L:                6990935225
             number of non-zeros in U:                1
             number of non-zeros in L+U:              6990935226

*** INFORMATION # 3438
PARDISO solver requires 65123 MB for selected incore execution.
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
0 Signal 11 :: SIGSEGV

mixed solid-shell model (carbody) 1.7M 8-node QUADS, 300K 10-node TETRA

=== PARDISO: solving a symmetric indefinite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 2.548694 s
Time spent in reordering of the initial matrix (reorder)         : 0.030692 s
Time spent in symbolic factorization (symbfct)                   : 9.595601 s
Time spent in data preparations for factorization (parlist)      : 0.240214 s
Time spent in allocation of internal data structures (malloc)    : 74.102451 s
Time spent in additional calculations                            : 15.991493 s
Total time spent                                                 : 102.509145 s

Statistics:
===========
Parallel Direct Factorization is running on 8 OpenMP

< Linear system Ax = b >
             number of equations:           11495787
             number of non-zeros in A:      311214695
             number of non-zeros in A (%): 0.000235

number of right-hand sides: 1

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    1278886
             size of largest supernode:               10939
             number of non-zeros in L:                3733796105
             number of non-zeros in U:                1
             number of non-zeros in L+U:              3733796106

*** INFORMATION # 3438
PARDISO solver requires 47351 MB for selected incore execution.
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
0 1 Signal 11 :: SIGSEGV
Signal 11 :: SIGSEGV

Gennady_F_Intel · ‎05-25-2015

Ok, thanks.

Is that on Linux or Windows?

Which MKL version are you using?

How much RAM is available on your system?

Kostas_S_ · ‎05-25-2015

Hi

This is Linux. The MKL version is 11.2, but I am not sure which update because I wasn't the one who installed it.

The install directory is /opt/intel/composer_xe_2015.2.164

The system has 192 GB of RAM, but probably only about 130 GB was free at the time of the runs.
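As a rough cross-check, the in-core requirement printed in the two msglvl=1 logs above can be compared with the roughly 130 GB that was free. This is only a sketch: the helper names are hypothetical, 1 GB is taken as 1024 MB, and the figure PARDISO prints is its own estimate, which does not include memory held by the rest of the application.

```python
import re

# Lines copied from the two crashing runs reported in this thread.
LOG_LINES = [
    "PARDISO solver requires 65123 MB for selected incore execution.",
    "PARDISO solver requires 47351 MB for selected incore execution.",
]

AVAILABLE_GB = 130  # approximate free RAM at the time of the runs


def required_mb(line: str) -> int:
    """Extract the in-core memory requirement (in MB) from a PARDISO log line."""
    match = re.search(r"requires\s+(\d+)\s+MB", line)
    if match is None:
        raise ValueError(f"no memory requirement found in: {line!r}")
    return int(match.group(1))


def fits_in_core(line: str, available_gb: float) -> bool:
    """True if the reported requirement is below the available RAM."""
    return required_mb(line) < available_gb * 1024


for line in LOG_LINES:
    print(required_mb(line), "MB, fits:", fits_in_core(line, AVAILABLE_GB))
```

Both requirements are well below the free memory, which is consistent with no -2 error being returned.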

Thanks

Gennady_F_Intel · ‎05-25-2015

OK, we will try to reproduce the behavior on our side and will get back to you soon.

Gennady_F_Intel · ‎05-28-2015

Hello,

We checked how this type of task works on our side. We didn't see the problem with the in-core version while solving a symmetric indefinite system with 8*10^6 equations. This case requires ~120 GB (15*10^9 non-zeros in L+U). The task finished successfully; see the log below. The MKL version we used is 11.3 beta (the latest version, which we are working on). For your cases, we need your matrix and a reproducer for the problem. You can send us all of this via a private thread.

--Gennady

====================== below is the statistical info from our side ==============

=== PARDISO: solving a symmetric indefinite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.389905 s
Time spent in reordering of the initial matrix (reorder) : 84.442869 s
Time spent in symbolic factorization (symbfct) : 51.253236 s
Time spent in data preparations for factorization (parlist) : 1.419931 s
Time spent in allocation of internal data structures (malloc) : 3.260579 s
Time spent in additional calculations : 4.545302 s
Total time spent : 145.311822 s

Statistics:
===========
Parallel Direct Factorization is running on 40 OpenMP

< Linear system Ax = b >
number of equations: 8000000
number of non-zeros in A: 31880000
number of non-zeros in A (%): 0.000050

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 96
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5219408
size of largest supernode: 40119
number of non-zeros in L: 14765464233
number of non-zeros in U: 1
number of non-zeros in L+U: 14765464234
time(reorder)= 145.405552864075
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50 51 52 53 54 55 56 58 59 60 61 62 63 64 65 66 67 68 70 71 72 73 75 76 77 78 79 80 81 83 84 86 87 88 89 90 92 93 94 95 96 97 98 99 100

=== PARDISO: solving a symmetric indefinite system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 2579.202247 s
Time spent in allocation of internal data structures (malloc) : 0.000056 s
Time spent in additional calculations : 0.000002 s
Total time spent : 2579.202305 s

Statistics:
===========
Parallel Direct Factorization is running on 40 OpenMP

< Linear system Ax = b >
number of equations: 8000000
number of non-zeros in A: 31880000
number of non-zeros in A (%): 0.000050

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 96
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5219408
size of largest supernode: 40119
number of non-zeros in L: 14765464233
number of non-zeros in U: 1
number of non-zeros in L+U: 14765464234
gflop for the numerical factorization: 387596.095197

gflop/s for the numerical factorization: 150.277511

time(factor)= 2579.20247197151

=== PARDISO: solving a symmetric indefinite system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve) : 24.682167 s
Time spent in additional calculations : 51.274367 s
Total time spent : 75.956534 s

Statistics:
===========
Parallel Direct Factorization is running on 40 OpenMP

< Linear system Ax = b >
number of equations: 8000000
number of non-zeros in A: 31880000
number of non-zeros in A (%): 0.000050

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 96
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 5219408
size of largest supernode: 40119
number of non-zeros in L: 14765464233
number of non-zeros in U: 1
number of non-zeros in L+U: 14765464234
gflop for the numerical factorization: 387596.095197

gflop/s for the numerical factorization: 150.277511

time(solve)= 75.9567329883575
200 200 200 2800.56483387947
csr norm of residual 4.719264874719631E-016
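The ~120 GB figure quoted in this post, and the reported factorization rate, can be reproduced from the numbers in the log. This is a back-of-the-envelope sketch that assumes 8 bytes per stored factor entry (double precision, as the log header states) and ignores index arrays and other internal buffers.

```python
# Values copied from the log in this post.
NNZ_FACTORS = 14_765_464_234   # "number of non-zeros in L+U"
BYTES_PER_ENTRY = 8            # double precision (assumed storage per entry)
GFLOP = 387596.095197          # "gflop for the numerical factorization"
NUMFCT_SECONDS = 2579.202247   # "Time spent in factorization step (numfct)"

factor_bytes = NNZ_FACTORS * BYTES_PER_ENTRY
factor_gb = factor_bytes / 1e9       # decimal gigabytes
factor_gib = factor_bytes / 2**30    # binary gibibytes
rate = GFLOP / NUMFCT_SECONDS        # gflop/s

print(f"factor storage: {factor_gb:.1f} GB ({factor_gib:.1f} GiB)")
print(f"factorization rate: {rate:.2f} gflop/s")
```

The factor values alone come to roughly 118 GB, which matches the quoted ~120 GB, and the computed rate agrees with the 150.28 gflop/s printed in the log.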

Kostas_S_ · ‎05-28-2015

Thank you for looking into this.

Indeed, I have also performed successful incore runs with huge models where the requested memory was more than 110GB.

But I get this segmentation fault with many models at more or less the same point of the factorization phase.

Do you think this may be fixed in version 11.3 beta? That is not the version I am using.

I could dump the matrix and rhs to a file, but I assume it will be many GB large so transfer would take quite some time.

If that is ok with you, I can prepare the data and you can give me details where I can upload them.

Gennady_F_Intel · ‎05-28-2015

A couple of issues with the in-core version of PARDISO have been fixed in 11.3 beta, so you can check whether the problem persists with this version. For how to obtain this version, please refer to the page at the top of this forum: https://software.intel.com/en-us/forums/topic/549590

If the problem still exists with 11.3 beta, please provide us with the smallest matrix that reproduces the issue. You may use the Intel(R) Premier Support channel to submit the issue and upload the matrix there.

--Gennady

Kostas_S_ · ‎05-28-2015

Thanks

I will try this ASAP and act accordingly.