Intel® oneAPI Math Kernel Library

## Error in PARDISO memory allocation: MATCHING_REORDERING_DATA

Beginner

Hi,
I am running PARDISO routines to solve a linear equation system with more than 40000 parameters. I use the "intel mkl 2016.1.150" libraries and have set the variable "export MKL_PARDISO_OOC_MAX_CORE_SIZE=120000". My system has 138 GB of RAM. Below is an extract of my code, which takes the matrix A in CSR3 format as input and returns the solution vector x.

```fortran
      SUBROUTINE parsol(neq, a, ja, ia)
C  (the USE statement that provides TYPE t_neq is not shown here)

      IMPLICIT NONE

c List of Parameters
c ------------------
      TYPE(t_neq)               :: neq
      INTEGER*4, DIMENSION(*)   :: ia
      INTEGER*4, DIMENSION(*)   :: ja
      REAL*8, DIMENSION(*)      :: a

c Local Parameters
c ----------------

c Local Variables
c ---------------
C..     Internal solver memory pointer
      INTEGER*8 pt(64)

C..     All other variables
C       (MKL's pardiso takes 16 arguments; the extra dparm array of the
C       original calls belongs to the standalone PARDISO package's
C       interface, so it has been dropped here)
      INTEGER*4    maxfct, mnum, mtype, phase, error, nrhs, msglvl
      INTEGER*4    iparm(64)
      REAL*8       b(neq%misc%npar)
      REAL*8       x(neq%misc%npar)

      INTEGER*4 i, j, idum, solver
      REAL*8  waltime1, waltime2, ddum, normb, normr

C.. Fill all arrays containing matrix data.

C   Number of right-hand sides to solve
      nrhs = 1
C   Other parameters
      maxfct = 1
      mnum = 1

C
C  .. Set up PARDISO control parameters and initialize the solver's
C     internal address pointers. This is only necessary for the FIRST
C     call of the PARDISO solver.

C  mtype = ...
C        1    real and structurally symmetric
C        2    real and symmetric positive definite
C       -2    real and symmetric indefinite
C        3    complex and structurally symmetric
C        4    complex and Hermitian positive definite
C       -4    complex and Hermitian indefinite
C        6    complex and symmetric
C       11    real and nonsymmetric
C       13    complex and nonsymmetric

      mtype     = 2

C  Initialisation
      pt(:) = 0
      iparm(1) = 0      ! use default values for all iparm entries

      CALL pardisoinit(pt, mtype, iparm)

C  .. Matrix checker and memory mode (in-core or out-of-core)
      iparm(27) = 1   ! check the CSR3 arrays ia and ja
      iparm(60) = 1   ! in-core, or OOC above MKL_PARDISO_OOC_MAX_CORE_SIZE

C..   Reordering and symbolic factorization. This step also allocates
C     all memory that is necessary for the factorization.

      phase     = 11  ! only reordering and symbolic factorization
      msglvl    = 1   ! with (1) or without (0) statistical information

      WRITE(*,*) 'Starting reordering ...'

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     2              idum, nrhs, iparm, msglvl, ddum, ddum, error)

      WRITE(*,*) 'Reordering completed ! ',
     1            max(iparm(15), iparm(16)+iparm(63))

      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         STOP
      END IF

C.. Factorization.
C  phase = ...
C       11    Analysis
C       12    Analysis, numerical factorization
C       13    Analysis, numerical factorization, solve, iterative refinement
C       22    Numerical factorization
C       23    Numerical factorization, solve, iterative refinement
C       33    Solve, iterative refinement
C      331    like phase=33, but only forward substitution
C      332    like phase=33, but only diagonal substitution (if available)
C      333    like phase=33, but only backward substitution
C        0    Release internal memory for L and U matrix number mnum
C       -1    Release all internal memory for all matrices

      phase     = 22  ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     2              idum, nrhs, iparm, msglvl, ddum, ddum, error)

      WRITE(*,*) 'Factorization completed ... '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         STOP
      END IF

C.. Back substitution and iterative refinement
      iparm(8)  = 1   ! max number of iterative refinement steps
      phase     = 33  ! only solve

      b = neq%bnor

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     2              idum, nrhs, iparm, msglvl, b, x, error)

      WRITE(*,*) 'Solve completed ...  '
      IF (error .NE. 0) THEN
         WRITE(*,*) 'The following ERROR was detected: ', error
         STOP
      END IF

      neq%xxx = x

C.. Memory release
      phase     = -1  ! release all internal memory

      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     2              idum, nrhs, iparm, msglvl, b, x, error)

      WRITE(*,*) 'Memory released ...  '

      END SUBROUTINE parsol
```

Below is the program output, which reports a memory problem. When I run the same program with the same configuration on systems with up to about 32000 parameters, everything works smoothly (and it's astonishingly fast and efficient!).

```
Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 1
*** Error in PARDISO memory allocation: MATCHING_REORDERING_DATA, allocation of 1 bytes failed
total memory wanted here: 6388548 kbyte

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 22.780998 s
Time spent in reordering of the initial matrix (reorder)         : 0.000000 s
Time spent in symbolic factorization (symbfct)                   : 0.000000 s
Time spent in allocation of internal data structures (malloc)    : 1.213865 s
Time spent in additional calculations                            : 9.502059 s
Total time spent                                                 : 33.496922 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
number of equations:           40397
number of non-zeros in A:      815979003
number of non-zeros in A (%): 50.001238

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    0
size of largest supernode:               0
number of non-zeros in L:                0
number of non-zeros in U:                0
number of non-zeros in L+U:              0
Reordering completed !            0
The following ERROR was detected:           -2
```

Any idea what the problem is? I don't think it's a hardware limitation, since the machine has 136 GB of RAM and the linear system is only about 6 GB. I also tried to solve the problem in OOC mode and with fewer threads, without any luck.

Stefano

Moderator

Stefano, is 40000 parameters the size of the NEQ?

Could you try setting iparm(2) = 0 and check the problem again?

Beginner

Hi, thanks for the quick reply!

Yes, ~40000 parameters is the size of the NEQ. The matrix is passed to the routine in CSR3 format in the a, ia and ja vectors; I use the neq structure only to pass the right-hand side and the solution vector.

I set iparm(2) = 0 as you suggested and everything worked correctly (between 22 and 59 GB of memory were needed with 16 OpenMP threads, as with the default METIS setting iparm(2) = 2, but this time nothing crashed).

Why does the METIS nested-dissection algorithm [Karypis98] cause this issue with my NEQ or settings? What are the pros and cons of the two algorithms? Which one should I use (or what else) when moving to larger matrices?

Thanks a lot!

Beginner

Hi again,

Unfortunately, I got a similar error again when trying to solve a 240x240 NEQ (~58000 parameters), this time at "BEFORE_REORDERING" and with 180 GB of RAM available (I can potentially have up to 256 GB, and my goal is to process a 300x300 NEQ).

I also notice that this time the "total memory wanted" is negative (an overflow?). Could this be a 32-bit vs. 64-bit integer problem?

```
Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 2
*** Error in PARDISO memory allocation: BEFORE_REORDERING, allocation of -3601676 bytes failed
total memory wanted here: -3581798 kbyte

=== PARDISO: solving a symmetric positive definite system ===

Summary: ( reordering phase )
================

Times:
======
Time spent in additional calculations                            : 2.038890 s
Total time spent                                                 : 2.038890 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
number of equations:           58077
number of non-zeros in A:      1686498003
number of non-zeros in A (%): 50.000861

number of right-hand sides:    1

< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
number of supernodes:                    0
size of largest supernode:               0
number of non-zeros in L:                0
number of non-zeros in U:                0
number of non-zeros in L+U:              0
Reordering completed !            0
The following ERROR was detected:           -2
```

Moderator

MKL 11.3 has a run-time issue with METIS reordering (iparm(2) = 2). The issue has already been fixed in the upcoming MKL 11.3 Update 3, which we plan to release soon. A notification will be published at the top of the MKL forum, and I will also keep you updated in this thread. As a temporary workaround, please use the minimum degree algorithm (iparm(2) = 0), which is a little slower than nested dissection.

Beginner

OK, thanks for the notification! I also solved the problem with larger NEQs by switching to PARDISO_64, which works great, although it is a bit annoying to convert the ia and ja vectors of the CSR3 format to INTEGER*8, which doubles the space they take up.

Beginner

OK, here we are with the 90000-parameter NEQ to be solved...

```
Starting reordering ...
*** Error in PARDISO  (     insufficient_memory) error_num= 4
*** Error in PARDISO memory allocation: BEFORE_INIT_PARALLEL_DATA, allocation of 32289340 bytes failed
total memory wanted here: 160960551 kbyte

=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 192.547947 s
Time spent in reordering of the initial matrix (reorder)         : 43.399159 s
Time spent in symbolic factorization (symbfct)                   : 49.032472 s
Time spent in allocation of internal data structures (malloc)    : 10.008380 s
Time spent in additional calculations                            : 0.000203 s
Total time spent                                                 : 294.988161 s

Statistics:
===========
Parallel Direct Factorization is running on 5 OpenMP

< Linear system Ax = b >
number of equations:           90598
number of non-zeros in A:      4104044101
number of non-zeros in A (%): 50.000552

number of right-hand sides:    1

< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
number of columns for each panel: 80
number of independent subgraphs:  0
number of supernodes:                    1133
size of largest supernode:               90598
number of non-zeros in L:                4107621924
number of non-zeros in U:                1
number of non-zeros in L+U:              4107621925
Reordering completed !                      0
The following ERROR was detected:                     -2
```

I have provided 210 GB of memory but still get a memory error in the reordering phase. I also chose OOC mode with 300 GB available, but I suppose this is only used AFTER phase 11. I am using pardiso_64, compiled with ILP64 (to be able to use INTEGER*8 and avoid overflows).

Is there any way around this issue? Would it help to increase or reduce the number of cores? Or to combine different phases? Or to skip the reordering phase? It would be a great help with "conference season" approaching.

Thanks for any hints!

Moderator

Your input is not sparse (number of non-zeros in A (%): 50.000552), so for such a dense case this is the expected behavior. For such matrices we strongly recommend using a dense solver instead of a sparse one.

Beginner

You are indeed right. Could you please provide a link to an MKL/LAPACK library or routine you would suggest for such a problem?

This is what I found searching around: ?gels, https://software.intel.com/en-us/node/469160#EC9BE639-8638-4AF2-A4AC-74C9E0334883

Thanks!