Intel® oneAPI Math Kernel Library

Error in PARDISO memory allocation: MATCHING_REORDERING_DATA

Stefano_B_
Beginner

Hi,
I am running PARDISO routines to solve a linear equation system with more than 40,000 parameters. I use the Intel MKL 2016.1.150 libraries and have set the variable "export MKL_PARDISO_OOC_MAX_CORE_SIZE=120000". My system has 138 GB of RAM. Below is an extract of my code, which takes the matrix A in CSR3 format as input and returns the solution vector x.

      SUBROUTINE parsol(neq, a, ja, ia)

      IMPLICIT NONE

c List of Parameters
c ------------------
        TYPE(t_neq)               :: neq
        INTEGER*4, DIMENSION(*)   :: ia
        INTEGER*4, DIMENSION(*)   :: ja
        REAL*8, DIMENSION(*)      :: a

c Local Parameters
c ----------------

c Local Variables
c ---------------
C..     Internal solver memory pointer
        INTEGER*8 pt(64)

C..     All other variables
        INTEGER*4    maxfct, mnum, mtype, phase, error, nrhs, msglvl
        INTEGER*4    iparm(64)
        REAL*8       b(neq%misc%npar)
        REAL*8       x(neq%misc%npar)

        INTEGER*4 i, j, idum, solver
        REAL*8  waltime1, waltime2, ddum, normb, normr

C.. Fill all arrays containing matrix data.

C   Number of right-hand sides to solve
      nrhs = 1
C   Other parameters
      maxfct = 1
      mnum = 1

C
C  .. Set up the PARDISO control parameters and initialize the solver's
C     internal address pointers. This is only necessary for the FIRST
C     call of the PARDISO solver.

C  mtype = ...
C       1    real and structurally symmetric
C       2    real and symmetric positive definite
C       -2    real and symmetric indefinite
C       3    complex and structurally symmetric
C       4    complex and Hermitian positive definite
C       -4    complex and Hermitian indefinite
C       6    complex and symmetric
C       11    real and nonsymmetric
C       13    complex and nonsymmetric

      mtype     = 2

C  Initialisation
      pt(:) = 0
      iparm(1) = 0      ! initializes all iparm to their default values

      CALL pardisoinit(pt, mtype, iparm)

C  .. Matrix checker and memory mode (in-core or out-of-core)
      iparm(27) = 1     ! check the CSR3 input arrays for errors
      iparm(60) = 1     ! in-core if it fits in MKL_PARDISO_OOC_MAX_CORE_SIZE,
C                         otherwise out-of-core

C..   Reordering and symbolic factorization. This step also allocates
C     all memory that is necessary for the factorization

      phase     = 11  ! only reordering and symbolic factorization
      msglvl    = 1   ! with (1) or without (0) statistical information

      WRITE(*,*) 'Starting reordering ...'

C     MKL PARDISO takes 16 arguments: there is no dparm argument
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, ddum, ddum, error)

      WRITE(*,*) 'Reordering completed ! ',
     1            max(iparm(15), iparm(16)+iparm(63))

      IF (error .NE. 0) THEN
        WRITE(*,*) 'The following ERROR was detected: ', error
        STOP
      END IF

C.. Factorization.
C  phase = ...
C       11    Analysis
C       12    Analysis, numerical factorization
C       13    Analysis, numerical factorization, solve, iterative refinement
C       22    Numerical factorization
C       23    Numerical factorization, solve, iterative refinement
C       33    Solve, iterative refinement
C       331   like phase=33, but only forward substitution
C       332   like phase=33, but only diagonal substitution (if available)
C       333   like phase=33, but only backward substitution
C       0    Release internal memory for L and U matrix number mnum
C       -1   Release all internal memory for all matrices

      phase     = 22  ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja, idum,
     2              nrhs, iparm, msglvl, ddum, ddum, error)

      WRITE(*,*) 'Factorization completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         STOP
      ENDIF

C.. Back substitution and iterative refinement
      iparm(8)  = 1   ! maximum number of iterative refinement steps
      phase     = 33  ! only solve

      b = neq%bnor

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, b, x, error)

      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         STOP
      ENDIF
      WRITE(*,*) 'Solve completed ...  '

      neq%xxx = x

C.. Memory release
      phase     = -1  ! release all internal memory

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, b, x, error)

      WRITE(*,*) 'Memory released ...  '

      END SUBROUTINE parsol
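As background for the CSR3 input used above: for a symmetric matrix type such as mtype = 2, PARDISO reads only the upper triangle, row by row, and by default with 1-based indices (the log below confirms "1-based array indexing is turned ON"). The following Python sketch builds a, ja and ia from a small dense matrix, for illustration only. The function name is hypothetical, and the diagonal is always stored because, as we read the MKL documentation, PARDISO expects the diagonal entries of symmetric matrices to be present even when zero.

```python
def dense_to_csr3_upper(A):
    """Convert a dense symmetric matrix (list of row lists) to 1-based CSR3
    arrays (a, ja, ia) holding only the upper triangle, diagonal included."""
    n = len(A)
    a, ja, ia = [], [], [1]          # ia(1) = 1 with 1-based indexing
    for i in range(n):
        for j in range(i, n):        # columns j >= i: upper triangle only
            if j == i or A[i][j] != 0.0:
                a.append(A[i][j])
                ja.append(j + 1)     # 1-based column index
        ia.append(len(a) + 1)        # 1-based position where the next row starts
    return a, ja, ia


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 2.0],
     [0.0, 2.0, 5.0]]
a, ja, ia = dense_to_csr3_upper(A)
print(a)    # [4.0, 1.0, 3.0, 2.0, 5.0]
print(ja)   # [1, 2, 2, 3, 3]
print(ia)   # [1, 3, 5, 6]
```

The number of stored entries is nnz = ia(n+1) - 1; for a completely dense matrix this equals n(n+1)/2.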

Below is the program output, which reports a memory problem. When I run the same program with the same configuration for up to around 32,000 parameters, everything works smoothly (and it's astonishingly fast and efficient!).

 Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 1
*** Error in PARDISO memory allocation: MATCHING_REORDERING_DATA, allocation of 1 bytes failed
total memory wanted here: 6388548 kbyte

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 22.780998 s
Time spent in reordering of the initial matrix (reorder)         : 0.000000 s
Time spent in symbolic factorization (symbfct)                   : 0.000000 s
Time spent in allocation of internal data structures (malloc)    : 1.213865 s
Time spent in additional calculations                            : 9.502059 s
Total time spent                                                 : 33.496922 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           40397
             number of non-zeros in A:      815979003
             number of non-zeros in A (%): 50.001238

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    0
             size of largest supernode:               0
             number of non-zeros in L:                0
             number of non-zeros in U:                0
             number of non-zeros in L+U:              0
 Reordering completed !            0
 The following ERROR was detected:           -2

Any idea what the problem is? I don't think it's a hardware limitation, since the machine has 136 GB of RAM and the requested allocation is only about 6 GB... I also tried OOC mode and fewer threads, without any luck.

Thanks for your help,

Stefano
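To put the failed request in perspective, the size of the CSR3 input itself can be computed from the log: with REAL*8 values and INTEGER*4 indices, a takes 8 × nnz bytes, ja takes 4 × nnz bytes and ia takes 4 × (n + 1) bytes. A short Python sketch, where the helper name is hypothetical and we assume the log's "kbyte" means 1024 bytes:

```python
def csr3_bytes(n, nnz, value_bytes=8, index_bytes=4):
    """Bytes held by the CSR3 arrays a (nnz values), ja (nnz) and ia (n + 1)."""
    return value_bytes * nnz + index_bytes * nnz + index_bytes * (n + 1)


n, nnz = 40397, 815979003            # from the reordering log above
input_bytes = csr3_bytes(n, nnz)
wanted_bytes = 6388548 * 1024        # "total memory wanted here: 6388548 kbyte"
print(input_bytes)                   # 9791909628  (about 9.8 GB)
print(wanted_bytes)                  # 6541873152  (about 6.5 GB)
```

Both figures are small compared with the RAM on the machine, which is consistent with the suspicion that this is not a plain out-of-memory condition.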

Gennady_F_Intel
Moderator

Stefano, are the 40,000 parameters the dimension of your system (the neq)?

Could you try setting iparm(2) = 0 and checking the problem again?

Stefano_B_
Beginner

Hi, thanks for the quick reply!

Yes, the ~40,000 parameters are the dimension of the NEQ, which is passed to the routine in CSR3 format via the a, ia and ja arrays; I use the neq structure only to pass the right-hand side and the solution vector.

I set iparm(2) = 0 as you suggested and everything worked correctly (between 22 and 59 GB of memory were needed with 16 OpenMP threads, as with iparm(2) = 1, but this time nothing crashed).

Why does the [Karypis98] (METIS nested dissection) algorithm cause this issue with my NEQ or settings? What are the pros and cons of the two algorithms? Which one (or what else) should I use for larger matrices?

Thanks a lot!
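Regarding the question about the two orderings: both are fill-reducing reorderings of the unknowns applied before the Cholesky factorization. Minimum degree (iparm(2) = 0; a later log in this thread shows MKL using a multiple minimum degree variant) greedily eliminates the unknown with the fewest couplings, while nested dissection from METIS (iparm(2) = 2) recursively splits the sparsity graph with small separators, which on large sparse problems typically gives less fill and more parallelism. The toy Python sketch below implements a plain, one-vertex-at-a-time minimum degree ordering (not MKL's implementation; all names are hypothetical) to show how the ordering changes fill-in, and that on a fully dense pattern no ordering can reduce it.

```python
def _eliminate(graph, v):
    """Remove vertex v and connect its neighbours pairwise; return the new edges (fill-in)."""
    nbrs = graph.pop(v)
    for u in nbrs:
        graph[u].discard(v)
    fill = 0
    for u in nbrs:
        for w in nbrs:
            if u < w and w not in graph[u]:
                graph[u].add(w)
                graph[w].add(u)
                fill += 1
    return fill


def fill_for_order(adj, order):
    """Total fill-in produced by eliminating the vertices in the given order."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    return sum(_eliminate(graph, v) for v in order)


def minimum_degree_order(adj):
    """Greedy minimum degree ordering: always eliminate the vertex with fewest neighbours."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while graph:
        v = min(graph, key=lambda u: (len(graph[u]), u))   # ties broken by label
        _eliminate(graph, v)
        order.append(v)
    return order


# "arrow" pattern: unknown 0 is coupled to all others, which are otherwise independent
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
md = minimum_degree_order(star)
print(md, fill_for_order(star, md))            # [1, 2, 3, 0, 4] 0
print(fill_for_order(star, [0, 1, 2, 3, 4]))   # 6: eliminating the hub first fills everything

# fully dense pattern: every order gives zero fill because nothing is left to fill
dense = {v: {u for u in range(4) if u != v} for v in range(4)}
print(fill_for_order(dense, [3, 1, 0, 2]))     # 0
```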

Stefano_B_
Beginner

Hi again,

Unfortunately, I got a similar error again when trying to solve a 240x240 NEQ (~58,000 parameters), this time "BEFORE_REORDERING", with 180 GB of RAM available (I can potentially get up to 256 GB, and my goal is to process a 300x300 NEQ).

I also notice that this time the "total memory wanted" is negative (an overflow?). Could this be a 32-bit vs. 64-bit integer problem?

Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 2
*** Error in PARDISO memory allocation: BEFORE_REORDERING, allocation of -3601676 bytes failed
total memory wanted here: -3581798 kbyte

=== PARDISO: solving a symmetric positive definite system ===


Summary: ( reordering phase )
================

Times:
======
Time spent in additional calculations                            : 2.038890 s
Total time spent                                                 : 2.038890 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           58077
             number of non-zeros in A:      1686498003
             number of non-zeros in A (%): 50.000861

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    0
             size of largest supernode:               0
             number of non-zeros in L:                0
             number of non-zeros in U:                0
             number of non-zeros in L+U:              0
 Reordering completed !            0
 The following ERROR was detected:           -2
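The 32/64-bit suspicion can be checked with simple arithmetic. A signed INTEGER*4 holds at most 2,147,483,647. For the first two systems in this thread, nnz + 1 (the last entry of ia) still fits, but byte counts derived from nnz, such as 8 × nnz for the values array, already exceed that range and wrap around to unrelated, possibly negative, values if computed in 32-bit arithmetic; for the 90,598-unknown system even nnz itself does not fit. The Python sketch below (hypothetical helper names) only shows these magnitudes; it does not establish which internal computation produced the negative number in the PARDISO log.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_int32(value):
    """True if value can be stored in a signed 32-bit integer (INTEGER*4)."""
    return INT32_MIN <= value <= INT32_MAX


def wrap_int32(value):
    """Value seen after storing it in a signed 32-bit integer (two's complement wrap)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


# (n, nnz) as reported in the logs of this thread
systems = [(40397, 815979003), (58077, 1686498003), (90598, 4104044101)]
for n, nnz in systems:
    # ia(n+1) = nnz + 1 must fit the index type; 8 * nnz is the byte size of a
    print(n, fits_int32(nnz + 1), fits_int32(8 * nnz), wrap_int32(8 * nnz))
```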

 

Gennady_F_Intel
Moderator

MKL 11.3 has a known run-time issue with METIS reordering (iparm(2) = 2). The issue has already been fixed in the upcoming MKL 11.3 Update 3, which we plan to release soon. The announcement will be published at the top of the MKL forum, and I will also keep you updated in this thread. As a temporary workaround, please use the minimum degree algorithm (iparm(2) = 0), which is a little slower than nested dissection.

Stefano_B_
Beginner

OK, thanks for the notification! I also solved the problem with larger NEQs by switching to PARDISO_64, which works great, although it is a bit annoying to convert the ia and ja arrays of the CSR3 format to INTEGER*8, which doubles the space they occupy...
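The doubling mentioned here is easy to quantify: ia has n + 1 entries and ja has nnz entries, so going from 4-byte to 8-byte integers adds 4 × (n + 1 + nnz) bytes, while the values array a is unaffected. A small Python sketch using the n and nnz reported in the log for the 240x240 NEQ; the helper name is hypothetical, and the stdlib array module stands in for the Fortran INTEGER*4/INTEGER*8 arrays.

```python
from array import array


def index_array_bytes(n, nnz, itemsize):
    """Bytes used by ia (n + 1 entries) and ja (nnz entries) with the given integer size."""
    return itemsize * ((n + 1) + nnz)


# converting small 1-based CSR3 index arrays from 32-bit to 64-bit integers
ia32 = array('i', [1, 3, 5, 6])
ja32 = array('i', [1, 2, 2, 3, 3])
ia64 = array('q', ia32.tolist())    # 'q' is a signed 64-bit C long long
ja64 = array('q', ja32.tolist())

n, nnz = 58077, 1686498003          # 240x240 NEQ from the log above
extra = index_array_bytes(n, nnz, 8) - index_array_bytes(n, nnz, 4)
print(extra)                        # 6746224324 extra bytes (about 6.7 GB)
```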

Stefano_B_
Beginner

OK, here we are with the 90,000-parameter NEQ to be solved:

 Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 4
*** Error in PARDISO memory allocation: BEFORE_INIT_PARALLEL_DATA, allocation of 32289340 bytes failed
total memory wanted here: 160960551 kbyte

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 192.547947 s
Time spent in reordering of the initial matrix (reorder)         : 43.399159 s
Time spent in symbolic factorization (symbfct)                   : 49.032472 s
Time spent in allocation of internal data structures (malloc)    : 10.008380 s
Time spent in additional calculations                            : 0.000203 s
Total time spent                                                 : 294.988161 s

Statistics:
===========
Parallel Direct Factorization is running on 5 OpenMP

< Linear system Ax = b >
             number of equations:           90598
             number of non-zeros in A:      4104044101
             number of non-zeros in A (%): 50.000552

             number of right-hand sides:    1

< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
             number of columns for each panel: 80
             number of independent subgraphs:  0
             number of supernodes:                    1133
             size of largest supernode:               90598
             number of non-zeros in L:                4107621924
             number of non-zeros in U:                1
             number of non-zeros in L+U:              4107621925
 Reordering completed !                      0
 The following ERROR was detected:                     -2

 

I have provided 210 GB of memory but still get a memory error in the reordering phase. I also chose OOC mode with 300 GB available, but I suppose this is only used after phase 11. I am using pardiso_64 compiled with ILP64 (to be able to use INTEGER*8 and avoid overflows).

Is there any way around this issue? Would it help to increase or reduce the number of cores? Or to combine different phases? Or to skip the reordering phase? Any help would be great with "conference season" approaching.

Thanks for any hints!

Gennady_F_Intel
Moderator

Your input is not sparse (number of non-zeros in A (%): 50.000552), so for such dense cases this is the expected behavior. For such matrices we strongly recommend using a dense solver instead of a sparse one.
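The logs confirm this diagnosis quantitatively. With mtype = 2 only the upper triangle is stored, and in all three systems of this thread nnz equals n(n + 1)/2 exactly, so every entry of the upper triangle is non-zero; the "50%" in the log is nnz relative to n², which for upper-triangle storage means a completely dense matrix. A small Python check (the helper name is hypothetical):

```python
def upper_triangle_entries(n):
    """Entries in the upper triangle of an n x n matrix, diagonal included."""
    return n * (n + 1) // 2


# (n, nnz, percentage) as printed in the PARDISO logs of this thread
logged = [(40397, 815979003, 50.001238),
          (58077, 1686498003, 50.000861),
          (90598, 4104044101, 50.000552)]

for n, nnz, pct in logged:
    fully_dense = nnz == upper_triangle_entries(n)
    pct_of_n2 = 100.0 * nnz / n**2      # reproduces the log's "(%)" figure: nnz relative to n*n
    print(n, fully_dense, round(pct_of_n2, 6), pct)
```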

Stefano_B_
Beginner

You are indeed right. Could you please point me to an MKL/LAPACK routine you would suggest for such a problem?

This is what I found googling around: ?gels, https://software.intel.com/en-us/node/469160#EC9BE639-8638-4AF2-A4AC-74C9E0334883

Thanks!
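For a dense symmetric positive definite system like this NEQ, the usual dense route is a Cholesky factorization rather than ?gels, which solves least-squares problems for a general (possibly rectangular) matrix via QR. In LAPACK, and hence MKL, that means ?potrf followed by ?potrs, or the one-call driver ?posv; the packed variants ?pptrf/?pptrs/?ppsv keep only one triangle, roughly halving memory. The sketch below uses NumPy, not MKL, purely to show the factor-then-two-triangular-solves structure and the storage arithmetic for n = 90,598; the function names are hypothetical.

```python
import numpy as np


def solve_spd_dense(A, b):
    """Solve A x = b for dense symmetric positive definite A via Cholesky.

    Same structure as LAPACK dpotrf (factor) followed by dpotrs (solve);
    numpy's general solver is used for the two triangular systems for brevity.
    """
    L = np.linalg.cholesky(A)        # A = L L^T; raises LinAlgError if A is not SPD
    y = np.linalg.solve(L, b)        # forward substitution: L y = b
    return np.linalg.solve(L.T, y)   # back substitution: L^T x = y


def dense_storage_gb(n, packed=False):
    """REAL*8 storage in GB (1e9 bytes): full n x n, or packed upper triangle."""
    entries = n * (n + 1) // 2 if packed else n * n
    return entries * 8 / 1e9


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = solve_spd_dense(A, b)
print(np.allclose(A @ x, b))                 # True
print(dense_storage_gb(90598))               # full storage for the 90,598-unknown NEQ
print(dense_storage_gb(90598, packed=True))  # packed storage (?pptrf/?pptrs layout)
```

With full storage the 90,598-unknown matrix needs about 66 GB and with packed storage about 33 GB, before any solver workspace.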
