Koshkarev_A_
Beginner
119 Views

PARDISO segmentation fault

idbc reported the following after about 80% of the LL' factorization:

Program received signal SIGSEGV
mkl_blas_mc_sgem2vu_odd () in /mnt/storage/opt/intel/composer_xe_2013_sp1.0.080/mkl/lib/intel64/libmkl_mc.so

The attachment contains the matrix, the program, and a makefile to reproduce this fault.

The matrix is stored in CSR format (3-array variation, 1-based indexing, upper triangle of a Hermitian matrix) with about 22 000 000 nonzeros and 64000x64000 size.

The same program worked with smaller matrices; the largest size tested was 17280x17280.

The program was executed on: MACHTYPE=x86_64-suse-linux; HP DL580 G5 with 4x Intel Xeon 7350
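
For readers unfamiliar with the storage format described above, here is a minimal sketch of a structural check for a 1-based, upper-triangular CSR matrix with 32-bit indices. The function name `check_upper_csr_1based` and its return codes are hypothetical; this is not the MKL matrix checker (iparm[26]), only an illustration of the conventions the format implies: row pointers start at 1, and each row holds strictly increasing column indices on or above the diagonal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of MKL.
 * Returns 0 if (ia, ja) describe a valid 1-based, upper-triangular CSR
 * structure for an n x n matrix, or a nonzero code for the first failed
 * check. ia has n+1 row pointers; ja holds the column indices. */
int check_upper_csr_1based(int32_t n, const int32_t *ia, const int32_t *ja)
{
    if (n <= 0 || ia == NULL || ja == NULL) return 1;
    if (ia[0] != 1) return 2;                /* 1-based: first row starts at 1 */
    for (int32_t i = 0; i < n; ++i) {
        int32_t begin = ia[i], end = ia[i + 1];
        if (end < begin) return 3;           /* row pointers must not decrease */
        for (int32_t k = begin; k < end; ++k) {
            int32_t col = ja[k - 1];         /* shift 1-based position to C index */
            if (col < i + 1 || col > n) return 4;        /* below diagonal or out of range */
            if (k > begin && col <= ja[k - 2]) return 5; /* columns must strictly increase */
        }
    }
    return 0;
}
```

For a 3x3 Hermitian matrix with nonzeros at (1,1), (1,2), (2,2), (2,3), (3,3), the arrays would be ia = {1, 3, 5, 6} and ja = {1, 2, 2, 3, 3}; the values array is not needed for this check.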

10 Replies
Koshkarev_A_
Beginner

Edit: the matrix size is 640000x640000.

Gennady_F_Intel
Moderator

Thanks for reporting the issue. We will check it on our side.

Koshkarev_A_
Beginner

I found that these parameters run OK (mtype = -4):

    iparm[0]  = 1;            // No solver default
    iparm[1]  = 2;            // Fill-in reordering from METIS
    // Numbers of processors, value of OMP_NUM_THREADS
    iparm[2]  = 0;
    iparm[3]  = 0;            // No iterative-direct algorithm
    iparm[4]  = 0;            // No user fill-in reducing permutation
    iparm[5]  = 0;            // Write solution into x
    iparm[6]  = 0;            // Not in use
    iparm[7]  = 2;            // Max numbers of iterative refinement steps
    iparm[8]  = 0;            // Not in use
    iparm[9]  = 8;            // Perturb the pivot elements with 1E-8 (default for symmetric indefinite)
    iparm[10] = 0;            // disable scaling (default for symmetric indefinite)
    iparm[11] = 0;            // Conjugate transposed/transpose solve == non-transposed
    iparm[12] = 0;            // Maximum weighted matching algorithm is switched-off (default for symmetric indefinite)
    iparm[13] = 0;            // Output: Number of perturbed pivots
    iparm[14] = 0;            // Not in use
    iparm[15] = 0;            // Not in use
    iparm[16] = 0;            // Not in use
    iparm[17] = -1;            // Output: Number of nonzeros in the factor LU
    iparm[18] = -1;            // Output: Mflops for LU factorization
    iparm[19] = 0;            // Output: Numbers of CG Iterations
//user defined:
    iparm[20] = 1;            // 1x1 and 2x2 Bunch-Kaufman pivoting (symmetric indefinite)
    iparm[23] = 0;            // classic factorization algorithm (1 would select the two-level algorithm, which generally improves scalability for parallel factorization on more than eight threads)
    iparm[26] = 0;            // matrix checker is off
    iparm[59] = 0;            // OOC Mode is off

    maxfct = 1;            // Maximum number of numerical factorizations.
    mnum = 1;            // Which factorization to use.

 

 

But the segmentation fault appears with these (mtype = -4):

    iparm[0] = 1;            /* No solver default */
    iparm[1] = 3;            /* Fill-in reordering: parallel (OpenMP) version of nested dissection */
    /* Numbers of processors, value of OMP_NUM_THREADS */
    iparm[2] = 1;
    iparm[3] = 0;            /* No iterative-direct algorithm */
    iparm[4] = 0;            /* No user fill-in reducing permutation */
    iparm[5] = 0;            /* Write solution into x */
    iparm[6] = 0;            /* Not in use */
    iparm[7] = 2;            /* Max numbers of iterative refinement steps */
    iparm[8] = 0;            /* Not in use */
    iparm[9] = 13;            /* Perturb the pivot elements with 1E-13 */
    iparm[10] = 1;            /* Use nonsymmetric permutation and scaling MPS */
    iparm[11] = 0;            /* Conjugate transposed/transpose solve == non-transposed*/
    iparm[12] = 0;            /* Maximum weighted matching algorithm is switched-off */
    iparm[13] = 0;            /* Output: Number of perturbed pivots */
    iparm[14] = 0;            /* Not in use */
    iparm[15] = 0;            /* Not in use */
    iparm[16] = 0;            /* Not in use */
    iparm[17] = -1;            /* Output: Number of nonzeros in the factor LU */
    iparm[18] = -1;            /* Output: Mflops for LU factorization */
    iparm[19] = 0;            /* Output: Numbers of CG Iterations */
    iparm[26] = 1;            // matrix checker is on
//    iparm[59] = 2;            // turn on the OOC Mode

    maxfct = 1;            /* Maximum number of numerical factorizations.  */
    mnum = 1;            /* Which factorization to use. */

 

Gennady_F_Intel
Moderator

number of non-zeros in L:                5009690816
number of non-zeros in U:                1
number of non-zeros in L+U:              5009690817

You are using the in-core version. For the factorization you need at least sizeof(MKL_Complex16) * (number of non-zeros in L+U) bytes of available memory (5*10^9 * 16 bytes >= 80 GB).

Do you have enough RAM on that system? Please check that first.

I'd recommend trying the OOC version: iparm[59] = 2.

--Gennady
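
The estimate above can be reproduced with a few lines of C. This is a minimal sketch: the type `complex16_t` is a stand-in assumed to match the 16-byte layout of MKL_Complex16 (two doubles), and the function name `factor_bytes` is hypothetical. It only multiplies the reported number of non-zeros in L+U by the element size, so it gives a lower bound for factor storage and ignores any additional workspace the solver allocates.

```c
#include <stdint.h>

/* Stand-in for a double-precision complex value (assumption: two doubles,
 * 16 bytes, no padding, as on typical x86_64 platforms). */
typedef struct { double real; double imag; } complex16_t;

/* Hypothetical helper: bytes needed to hold `nnz` factor entries of
 * `elem_size` bytes each. Uses 64-bit arithmetic so the product does not
 * overflow for counts in the billions. */
uint64_t factor_bytes(uint64_t nnz, uint64_t elem_size)
{
    return nnz * elem_size;
}

/* Example: factor_bytes(5009690817, sizeof(complex16_t)) for the
 * L+U count reported above. */
```

With the count reported above (5009690817 non-zeros in L+U) and 16 bytes per entry, the result is 80 155 053 072 bytes, that is about 80 GB.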

 

Koshkarev_A_
Beginner

Of course, I run it on a server with enough RAM (132 GB, 2 TB), and according to the memory usage graph, consumption does not exceed 80 GB during execution.

I suspect the problem is caused by using "nonsymmetric permutation and scaling MPS" (iparm[10]=1) with big Hermitian matrices (smaller ones run OK).

Gennady_F_Intel
Moderator

But if I am not mistaken, you used the LP64 interfaces, which may not address arrays with more than 2^31-1 elements. Therefore the ILP64 interfaces have to be used for this sort of problem. Did you try that?
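
As a small illustration of the 2^31-1 limit, the sketch below tests whether an element count is representable as a 32-bit signed integer. The function name `fits_in_int32` is hypothetical. Whether this limit actually applies to PARDISO's internal factor storage is not established in this thread; the check only shows which of the reported counts exceed the range of a 32-bit index.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if `count` is representable as a 32-bit
 * signed integer (maximum 2^31-1 = 2147483647), 0 otherwise. */
int fits_in_int32(uint64_t count)
{
    return count <= (uint64_t)INT32_MAX;
}
```

The roughly 22 000 000 nonzeros of the input matrix fit comfortably, while the 5009690817 non-zeros reported for L+U do not.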

Gennady_F_Intel
Moderator

In any case, what is the output when iparm[14] = 1; iparm[15] = 1; iparm[16] = 1?

 

Koshkarev_A_
Beginner

With these settings:

    iparm[0] = 1;            /* No solver default */

    iparm[1] = 3;            /* Fill-in reordering: parallel (OpenMP) version of nested dissection */
    /* Numbers of processors, value of OMP_NUM_THREADS */
    iparm[2] = 1;
    iparm[3] = 0;            /* No iterative-direct algorithm */
    iparm[4] = 0;            /* No user fill-in reducing permutation */
    iparm[5] = 0;            /* Write solution into x */
    iparm[6] = 0;            /* Not in use */
    iparm[7] = 2;            /* Max numbers of iterative refinement steps */
    iparm[8] = 0;            /* Not in use */
    iparm[9] = 13;            /* Perturb the pivot elements with 1E-13 */
    iparm[10] = 1;            /* Use nonsymmetric permutation and scaling MPS */
    iparm[11] = 0;            /* Conjugate transposed/transpose solve == non-transposed*/
    iparm[12] = 0;            /* Maximum weighted matching algorithm is switched-off */
    iparm[13] = 0;            /* Output: Number of perturbed pivots */
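    iparm[14] = 1;            /* Set as requested (documented as output: peak memory on symbolic factorization) */
    iparm[15] = 1;            /* Set as requested (documented as output: permanent memory on symbolic factorization) */
    iparm[16] = 1;            /* Set as requested (documented as output: size of factors / peak memory on numerical factorization and solution) */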
    iparm[17] = -1;            /* Output: Number of nonzeros in the factor LU */
    iparm[18] = -1;            /* Output: Mflops for LU factorization */
    iparm[19] = 0;            /* Output: Numbers of CG Iterations */
    iparm[26] = 1;            // matrix checker is on
//    iparm[59] = 2;            // turn on the OOC Mode

    maxfct = 1;            /* Maximum number of numerical factorizations.  */
    mnum = 1;            /* Which factorization to use. */

 

Intel(R) Debugger for applications running on Intel(R) 64, Version 13.0, Build [80.483.23]
------------------
object file name: ./main.out
Reading symbols from /mnt/storage/home/aakoshkarev/anton/Magneto/_results/intel_lp64_parallel_intel64_so/main.out...done.
(idb) run
Starting program: /mnt/storage/home/aakoshkarev/anton/Magneto/_results/intel_lp64_parallel_intel64_so/main.out
[New Thread 14487 (LWP 14487)]
[New Thread 15058 (LWP 15058)]
[New Thread 15059 (LWP 15059)]
[New Thread 15060 (LWP 15060)]
[New Thread 15084 (LWP 15084)]
[New Thread 15085 (LWP 15085)]
[New Thread 15086 (LWP 15086)]
[New Thread 15087 (LWP 15087)]
[New Thread 15088 (LWP 15088)]
[New Thread 15089 (LWP 15089)]
[New Thread 15121 (LWP 15121)]
[New Thread 15122 (LWP 15122)]
[New Thread 15123 (LWP 15123)]
[New Thread 15124 (LWP 15124)]
[New Thread 15149 (LWP 15149)]
[New Thread 15150 (LWP 15150)]

 Matrix A
 (  0.98,  0.00) (  0.19,  0.09)
 (  0.19, -0.09) ( -0.98,  0.00)

 Eigenvalues
 (  1.00, -0.00) ( -1.00,  0.00)

 Right eigenvectors
 (  0.99,  0.00) (  0.09,  0.04)
 ( -0.09,  0.04) (  0.99,  0.00)


Energy1 = <psi+|H1|psi+> = (0.000033 -0.000000)

maxcurrent = 23551999

<psi+|psi-> = (0.000000, -0.000000)

<psi+|s_z|psi+> = (0.999984, 0.000000)

<psi+|s_z|psi-> = (-0.000000, 0.000000)

<psi-|s_z|psi+> = (-0.000000, -0.000000)

<psi-|s_z|psi-> = (-0.999984, 0.000000)

=== PARDISO: solving a Hermitian indefinite system ===
The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.397580 s
Time spent in reordering of the initial matrix (reorder)         : 19.229663 s
Time spent in symbolic factorization (symbfct)                   : 20.043214 s
Time spent in data preparations for factorization (parlist)      : 0.484405 s
Time spent in allocation of internal data structures (malloc)    : 2.318397 s
Time spent in additional calculations                            : 4.486993 s
Total time spent                                                 : 46.960252 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 15
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           640000
             number of non-zeros in A:      23552000
             number of non-zeros in A (%): 0.005750

             number of right-hand sides:    2

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    181459
             size of largest supernode:               46400
             number of non-zeros in L:                5035922220
             number of non-zeros in U:                1
             number of non-zeros in L+U:              5035922221

Reordering completed ...
Number of nonzeros in factors  = 740954925
Number of factorization MFLOPS = 515273800
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
 0 %  1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  %  68 %  69 %  70 %  71 %  72 %  74 %  75 %  76 %  77 %  78 %  80 %  81 %  82 %  83 %  84 %  85 %  86 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 %

=== PARDISO: solving a Hermitian indefinite system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 9888.516431 s
Time spent in allocation of internal data structures (malloc)    : 0.002765 s
Time spent in additional calculations                            : 0.000020 s
Total time spent                                                 : 9888.519216 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 15
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           640000
             number of non-zeros in A:      23552000
             number of non-zeros in A (%): 0.005750

             number of right-hand sides:    2

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    181459
             size of largest supernode:               46400
             number of non-zeros in L:                5035922220
             number of non-zeros in U:                1
             number of non-zeros in L+U:              5035922221
             gflop   for the numerical factorization: 515273.800873

             gflop/s for the numerical factorization: 52.108302


Factorization completed ...

=== PARDISO: solving a Hermitian indefinite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 145.461488 s
Time spent in additional calculations                            : 292.539900 s
Total time spent                                                 : 438.001388 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 15
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >
             number of equations:           640000
             number of non-zeros in A:      23552000
             number of non-zeros in A (%): 0.005750

             number of right-hand sides:    2

< Factors L and U >
             number of columns for each panel: 96
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    181459
             size of largest supernode:               46400
             number of non-zeros in L:                5035922220
             number of non-zeros in U:                1
             number of non-zeros in L+U:              5035922221
             gflop   for the numerical factorization: 515273.800873

             gflop/s for the numerical factorization: 52.108302


[Thread 15059 (LWP 15059) exited] with exit status 0
[Thread 15060 (LWP 15060) exited] with exit status 0
[Thread 15084 (LWP 15084) exited] with exit status 0
[Thread 15085 (LWP 15085) exited] with exit status 0
[Thread 15086 (LWP 15086) exited] with exit status 0
[Thread 15087 (LWP 15087) exited] with exit status 0
[Thread 15088 (LWP 15088) exited] with exit status 0
[Thread 15089 (LWP 15089) exited] with exit status 0
[Thread 15121 (LWP 15121) exited] with exit status 0
[Thread 15122 (LWP 15122) exited] with exit status 0
[Thread 15123 (LWP 15123) exited] with exit status 0
[Thread 15124 (LWP 15124) exited] with exit status 0
[Thread 15149 (LWP 15149) exited] with exit status 0
[Thread 15150 (LWP 15150) exited] with exit status 0
[Thread 15058 (LWP 15058) exited] with exit status 0
Program exited normally.
(idb) =>> PBS: job killed: ncpus 3891.0 exceeded limit 15 (burst)
Terminated
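
One detail in the log above is worth a quick arithmetic check: the program prints "Number of nonzeros in factors = 740954925", while the PARDISO statistics report 5035922221 non-zeros in L+U. The smaller number is exactly what remains of the larger one when only the low 32 bits are kept, which is consistent with the count being returned through a 32-bit integer in the LP64 build. The sketch below reproduces that reduction; the function name `low_32_bits` is hypothetical, and this is an observation about the numbers, not a confirmed explanation of the earlier fault.

```c
#include <stdint.h>

/* Hypothetical helper: keeps only the low 32 bits of a 64-bit count,
 * i.e. the value modulo 2^32, which is what survives when the count is
 * stored in a 32-bit integer field. */
uint32_t low_32_bits(uint64_t count)
{
    return (uint32_t)(count & 0xFFFFFFFFu);
}
```

Here 5035922221 - 4294967296 = 740954925, matching the printed value.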

Koshkarev_A_
Beginner

The ILP64 build just ran OK with the iparm values that caused the fault. I am now running the same with LP64 to make sure.

Koshkarev_A_
Beginner

LP64 runs OK too; I just can't understand why.