Intel® oneAPI Math Kernel Library

Error in PARDISO memory allocation: MATCHING_REORDERING_DATA

Stefano_B_
Beginner

Hi,
I am running the PARDISO routines to solve a linear system with more than 40000 parameters. I use the Intel MKL 2016.1.150 libraries and set the environment variable with "export MKL_PARDISO_OOC_MAX_CORE_SIZE=120000". My machine has 138 GB of RAM. Below is an extract of my code, which takes the matrix A in CSR3 format as input and returns the solution vector x.

SUBROUTINE parsol(neq, a, ja, ia)

IMPLICIT NONE

c List of Parameters
c ------------------
        TYPE(t_neq)               :: neq
        INTEGER*4, DIMENSION(*)   :: ia
        INTEGER*4, DIMENSION(*)   :: ja
        REAL*8, DIMENSION(*)      :: a

c Local Parameters
c ----------------

c Local Variables
c ---------------
C..     Internal solver memory pointer
        INTEGER*8 pt(64)

C..     All other variables
        INTEGER*4    maxfct, mnum, mtype, phase, error, nrhs, msglvl
        INTEGER*4    iparm(64)
        REAL*8       dparm(64)
        REAL*8       b(neq%misc%npar)
        REAL*8       x(neq%misc%npar)

        INTEGER*4 i, j, idum, solver
        REAL*8  waltime1, waltime2, ddum, normb, normr

C.. Fill all arrays containing matrix data.

C   Number of right-hand-sides to solve
      nrhs = 1
C   Other parameters
      maxfct = 1
      mnum = 1

C
C  .. Set up PARDISO control parameters and initialize the solver's
C     internal address pointers. This is only necessary for the FIRST
C     call of the PARDISO solver.

C  mtype = ...
C       1    real and structurally symmetric
C       2    real and symmetric positive definite
C       -2    real and symmetric indefinite
C       3    complex and structurally symmetric
C       4    complex and Hermitian positive definite
C       -4    complex and Hermitian indefinite
C       6    complex and symmetric
C       11    real and nonsymmetric
C       13    complex and nonsymmetric

      mtype     = 2

C  Initialisation
      pt(:) = 0
      iparm(1) = 0      ! initializes all iparm to their default values

      CALL pardisoinit(pt, mtype, iparm)

C  .. Matrix checking and memory use (in-core or out-of-core)
      iparm(27) = 1     ! check the input matrix
      iparm(60) = 1     ! in-core if it fits, otherwise out-of-core

C..   Reordering and symbolic factorization. This step also allocates
C     all memory that is necessary for the factorization

      phase     = 11  ! only reordering and symbolic factorization
      msglvl    = 1   ! with (1) or without (0) statistical information

      WRITE(*,*) 'Starting reordering ...'

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, ddum, ddum, error, dparm)

      WRITE(*,*) 'Reordering completed ! ',
     1            max(iparm(15), iparm(16)+iparm(63))

      IF (error .NE. 0) THEN
        WRITE(*,*) 'The following ERROR was detected: ', error
        STOP
      END IF

C.. Factorization.
C  phase = ...
C       11    Analysis
C       12    Analysis, numerical factorization
C       13    Analysis, numerical factorization, solve, iterative refinement
C       22    Numerical factorization
C       23    Numerical factorization, solve, iterative refinement
C       33    Solve, iterative refinement
C       331   like phase=33, but only forward substitution
C       332   like phase=33, but only diagonal substitution (if available)
C       333   like phase=33, but only backward substitution
C       0    Release internal memory for L and U matrix number mnum
C       -1   Release all internal memory for all matrices

      phase     = 22  ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja, idum,
     2              nrhs, iparm, msglvl, ddum, ddum, error, dparm)

      WRITE(*,*) 'Factorization completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         STOP
      ENDIF

C.. Back substitution and iterative refinement
      iparm(8)  = 1   ! max numbers of iterative refinement steps
      phase     = 33  ! only solve

      b = neq%bnor

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, b, x, error, dparm)

      WRITE(*,*) 'Solve completed ...  '

      neq%xxx = x

C.. Memory release
      phase     = -1  ! release all internal memory

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, b, x, error, dparm)

      WRITE(*,*) 'Memory released ...  '

Below is the program output, which reports a memory problem. When I run the same program with the same configuration on up to about 32000 parameters, everything works smoothly (and it's astonishingly fast and efficient!).

 Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 1
*** Error in PARDISO memory allocation: MATCHING_REORDERING_DATA, allocation of 1 bytes failed
total memory wanted here: 6388548 kbyte

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 22.780998 s
Time spent in reordering of the initial matrix (reorder)         : 0.000000 s
Time spent in symbolic factorization (symbfct)                   : 0.000000 s
Time spent in allocation of internal data structures (malloc)    : 1.213865 s
Time spent in additional calculations                            : 9.502059 s
Total time spent                                                 : 33.496922 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           40397
             number of non-zeros in A:      815979003
             number of non-zeros in A (%): 50.001238

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    0
             size of largest supernode:               0
             number of non-zeros in L:                0
             number of non-zeros in U:                0
             number of non-zeros in L+U:              0
 Reordering completed !            0
 The following ERROR was detected:           -2

Any idea what the problem is? I do not think it is a hardware limitation, since the machine has 136 GB of RAM and the system to be solved is only about 6 GB. I also tried to solve the problem in OOC mode and with fewer threads, without any luck.

Thanks for your help,

Stefano

1 Solution
Gennady_F_Intel
Moderator

Stefano, is 40000 parameters the number of equations (neq)?

Could you try setting iparm(2) = 0 and check the problem again?

Stefano_B_
Beginner

Hi, thanks for the quick reply!

Yes, ~40000 parameters is the NEQ (passed to the routine in CSR3 format in the a, ia and ja vectors, I use the neq structure only to pass the r.h.s. and the solution vector).

I set iparm(2) = 0 as you suggested and everything worked correctly (between 22 and 59 GB of memory were needed using 16 OpenMP threads, as with iparm(2) = 1, but this time nothing crashed).

Why does the [Karypis98] algorithm cause this issue with my neq or settings? What are the pros and cons of the two algorithms? Which one should I use (or what else) when going to larger matrices?

Thanks a lot!

Stefano_B_
Beginner

Hi again,

Unfortunately, I got a similar error again when trying to solve a 240x240 NEQ with ~58000 parameters, this time at "BEFORE_REORDERING" and with 180 GB of RAM made available (I can potentially have up to 256 GB, and my goal is to process a 300x300 NEQ).

I also notice that this time the "total memory wanted" is negative (overflow?). Could this be a 32/64-bit problem?

Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 2
*** Error in PARDISO memory allocation: BEFORE_REORDERING, allocation of -3601676 bytes failed
total memory wanted here: -3581798 kbyte

=== PARDISO: solving a symmetric positive definite system ===


Summary: ( reordering phase )
================

Times:
======
Time spent in additional calculations                            : 2.038890 s
Total time spent                                                 : 2.038890 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           58077
             number of non-zeros in A:      1686498003
             number of non-zeros in A (%): 50.000861

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    0
             size of largest supernode:               0
             number of non-zeros in L:                0
             number of non-zeros in U:                0
             number of non-zeros in L+U:              0
 Reordering completed !            0
 The following ERROR was detected:           -2
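
On the 32/64-bit question, a rough check is possible from the numbers printed in the two logs so far. The sketch below uses Python purely for the arithmetic; the helper names are hypothetical, and the guess that the symmetrized pattern ("fulladj") holds about 2*nnz - n entries is an assumption, not documented PARDISO behavior.

```python
INT32_MAX = 2**31 - 1  # largest value a default INTEGER*4 can hold


def upper_triangle_entries(n):
    """Entries in the upper triangle (diagonal included) of a dense n x n matrix."""
    return n * (n + 1) // 2


def full_pattern_entries(n, nnz_upper):
    """ASSUMED size of the symmetrized pattern: both triangles, diagonal counted once."""
    return 2 * nnz_upper - n


def wrap_int32(value):
    """Value a signed 32-bit counter would hold after two's-complement wraparound."""
    return (value + 2**31) % 2**32 - 2**31


# (number of equations, non-zeros in A) as printed in the PARDISO logs above
cases = [(40397, 815979003), (58077, 1686498003)]

for n, nnz in cases:
    full = full_pattern_entries(n, nnz)
    print("n =", n)
    print("  whole upper triangle stored:", nnz == upper_triangle_entries(n))
    print("  assumed full pattern entries:", full)
    print("  exceeds INTEGER*4 range:", full > INT32_MAX)
    print("  value after 32-bit wraparound:", wrap_int32(full))
```

In both cases the stored non-zeros equal n(n+1)/2, so the whole upper triangle is present. Under the assumption above, only the 58077-equation case leaves the 32-bit range, which would be consistent with a negative size in that run but does not by itself explain the first failure.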

 

Gennady_F_Intel
Moderator

MKL 11.3 has a run-time issue with METIS reordering (iparm(2) = 2). The issue has already been fixed in the upcoming MKL 11.3 Update 3, which we plan to release soon. A notification will be posted at the top of the MKL forum, and I will also keep you updated in this thread. As a temporary workaround, please use the minimum degree algorithm, which is a little slower than nested dissection.

Stefano_B_
Beginner

OK, thanks for the notification! I also solved the problem with larger NEQs by switching to PARDISO_64, which works great, although it is a bit annoying to convert the ia and ja vectors of the CSR3 format to INTEGER*8, thus doubling the allocated space...
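
To put a number on the cost of the wider indices, here is a small Python sketch (the function name is hypothetical) that adds up the three CSR3 arrays for the 58077-equation system from the earlier log, once with 4-byte and once with 8-byte integers. It only counts the user-side arrays, not PARDISO's internal workspace.

```python
def csr_bytes(n, nnz, index_bytes, value_bytes=8):
    """Bytes needed for the a, ja and ia arrays of a CSR3 matrix."""
    a_bytes = nnz * value_bytes        # a: one REAL*8 per stored entry
    ja_bytes = nnz * index_bytes       # ja: one column index per stored entry
    ia_bytes = (n + 1) * index_bytes   # ia: one row pointer per row, plus one
    return a_bytes + ja_bytes + ia_bytes


n, nnz = 58077, 1686498003  # from the PARDISO log earlier in this thread

for index_bytes in (4, 8):
    total = csr_bytes(n, nnz, index_bytes)
    print(f"{index_bytes}-byte indices: {total / 1e9:.1f} GB")
```

Only ja and ia double in size; the values in a stay at 8 bytes each, so the total for the three arrays grows by about a third rather than by a factor of two.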

Stefano_B_
Beginner

OK, here we are with the 90000-parameter NEQ to be solved ...

 Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 4
*** Error in PARDISO memory allocation: BEFORE_INIT_PARALLEL_DATA, allocation of 32289340 bytes failed
total memory wanted here: 160960551 kbyte

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 192.547947 s
Time spent in reordering of the initial matrix (reorder)         : 43.399159 s
Time spent in symbolic factorization (symbfct)                   : 49.032472 s
Time spent in allocation of internal data structures (malloc)    : 10.008380 s
Time spent in additional calculations                            : 0.000203 s
Total time spent                                                 : 294.988161 s

Statistics:
===========
Parallel Direct Factorization is running on 5 OpenMP

< Linear system Ax = b >
             number of equations:           90598
             number of non-zeros in A:      4104044101
             number of non-zeros in A (%): 50.000552

             number of right-hand sides:    1

< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
             number of columns for each panel: 80
             number of independent subgraphs:  0
             number of supernodes:                    1133
             size of largest supernode:               90598
             number of non-zeros in L:                4107621924
             number of non-zeros in U:                1
             number of non-zeros in L+U:              4107621925
 Reordering completed !                      0
 The following ERROR was detected:                     -2

 

I have provided 210 GB of memory but still get a memory issue at the reordering phase. I also chose OOC mode with 300 GB available, but I suppose this is only used AFTER phase 11. I am using pardiso_64 compiled with ILP64 (to be able to use INTEGER*8 and avoid overflows).

Is there any way around this issue? Would increasing or reducing the number of cores help? Or combining different phases? Or skipping the reordering phase? It would be of great help with "conference season" approaching.

Thanks for any hints!

Gennady_F_Intel
Moderator

Your input is not sparse (number of non-zeros in A (%): 50.000552), so this is the expected behavior for such dense cases. For such matrices we strongly recommend using a dense solver instead of a sparse one.
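
The density figure can be reproduced from the log, and the same numbers give the footprint of plain dense storage. This is a Python sketch for the arithmetic only; the function names are hypothetical, and the byte counts cover the matrix itself, not any solver workspace.

```python
def density_percent(n, nnz):
    """Stored non-zeros as a percentage of all n*n matrix entries."""
    return 100.0 * nnz / (n * n)


def dense_bytes_full(n, value_bytes=8):
    """Bytes for a full n x n array of REAL*8 values."""
    return n * n * value_bytes


def dense_bytes_packed(n, value_bytes=8):
    """Bytes for one triangle (diagonal included) in packed storage."""
    return n * (n + 1) // 2 * value_bytes


n, nnz = 90598, 4104044101  # from the PARDISO log above

print(f"density: {density_percent(n, nnz):.6f} %")
print(f"full dense storage:   {dense_bytes_full(n) / 1e9:.1f} GB")
print(f"packed dense storage: {dense_bytes_packed(n) / 1e9:.1f} GB")
```

A symmetric matrix whose stored triangle is about 50 % of n*n is completely full, and the log reports roughly the same number of non-zeros in L as in A, so the sparse data structures bring no saving here.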

Stefano_B_
Beginner

You are indeed right. Could you please point me to an MKL/LAPACK library or routine that you would suggest for such a problem?

This is what I found by googling around: ?gels, https://software.intel.com/en-us/node/469160#EC9BE639-8638-4AF2-A4AC-74C9E0334883

Thanks!
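
For context on the routine choice: ?gels is LAPACK's least-squares driver, while the code above declares the matrix symmetric positive definite (mtype = 2), for which LAPACK provides the Cholesky routines ?potrf/?potrs and the driver ?posv. The thread does not confirm which routine was finally used, so the following is only a toy pure-Python sketch of what a Cholesky-based solve does, with hypothetical function names and a made-up 3x3 example; it is not a substitute for the LAPACK call.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L * L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(a, b):
    """Solve a x = b by factoring a = L L^T, then two triangular solves."""
    n = len(a)
    low = cholesky_lower(a)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Backward substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Made-up symmetric positive definite example (not data from this thread)
a = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
b = [8.0, 10.0, 11.0]

x = cholesky_solve(a, b)
print(x)
```

The factor-then-substitute structure mirrors the factorization (phase 22) and solve (phase 33) steps of the PARDISO code, only without reordering or symbolic analysis, which a dense matrix does not need.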
