Hi,
I am running the PARDISO routines to solve a linear system with more than 40000 parameters. I use the "intel mkl 2016.1.150" libraries and set the environment variable "export MKL_PARDISO_OOC_MAX_CORE_SIZE=120000". My system has 138 GB of RAM. Below is an extract of my code, which takes the matrix A in CSR3 format as input and returns the solution vector x as output.
SUBROUTINE parsol(neq, a, ja, ia)
IMPLICIT NONE
c List of Parameters
c ------------------
TYPE(t_neq) :: neq
INTEGER*4, DIMENSION(*) :: ia
INTEGER*4, DIMENSION(*) :: ja
REAL*8, DIMENSION(*) :: a
c Local Parameters
c ----------------
c Local Variables
c ---------------
C.. Internal solver memory pointer
INTEGER*8 pt(64)
C.. All other variables
INTEGER*4 maxfct, mnum, mtype, phase, error, nrhs, msglvl
INTEGER*4 iparm(64)
REAL*8 dparm(64)
REAL*8 b(neq%misc%npar)
REAL*8 x(neq%misc%npar)
INTEGER*4 i, j, idum, solver
REAL*8 waltime1, waltime2, ddum, normb, normr
C.. Fill all arrays containing matrix data.
C Number of right-hand-sides to solve
nrhs = 1
C Other parameters
maxfct = 1
mnum = 1
C
C .. Set up PARDISO control parameters and initialize the solver's
C internal address pointers. This is only necessary for the FIRST
C call of the PARDISO solver.
C mtype = ...
C 1 real and structurally symmetric
C 2 real and symmetric positive definite
C -2 real and symmetric indefinite
C 3 complex and structurally symmetric
C 4 complex and Hermitian positive definite
C -4 complex and Hermitian indefinite
C 6 complex and symmetric
C 11 real and nonsymmetric
C 13 complex and nonsymmetric
mtype = 2
C Initialisation
pt(:) = 0
iparm(1) = 0 ! initializes all iparm to their default values
CALL pardisoinit(pt, mtype, iparm)
C .. Memory use (in or out core)
iparm(27) = 1
iparm(60) = 1
C.. Reordering and Symbolic Factorization, This step also allocates
C all memory that is necessary for the factorization
phase = 11 ! only reordering and symbolic factorization
msglvl = 1 ! with (1) or without (0) statistical information
WRITE(*,*) 'Starting reordering ...'
CALL pardiso (pt, maxfct, mnum, mtype, phase,
1 neq%misc%npar, a, ia, ja,
1 idum, nrhs, iparm, msglvl, ddum, ddum, error, dparm)
WRITE(*,*) 'Reordering completed ! ',
1 max(iparm(15), iparm(16)+iparm(63))
IF (error .NE. 0) THEN
WRITE(*,*) 'The following ERROR was detected: ', error
STOP
END IF
C.. Factorization.
C phase = ...
C 11 Analysis
C 12 Analysis, numerical factorization
C 13 Analysis, numerical factorization, solve, iterative refinement
C 22 Numerical factorization
C 23 Numerical factorization, solve, iterative refinement
C 33 Solve, iterative refinement
C 331 like phase=33, but only forward substitution
C 332 like phase=33, but only diagonal substitution (if available)
C 333 like phase=33, but only backward substitution
C 0 Release internal memory for L and U matrix number mnum
C -1 Release all internal memory for all matrices
phase = 22 ! only factorization
CALL pardiso (pt, maxfct, mnum, mtype, phase,
1 neq%misc%npar, a, ia, ja, idum,
2 nrhs, iparm, msglvl, ddum, ddum, error, dparm)
WRITE(*,*) 'Factorization completed ... '
IF (error .NE. 0) THEN
WRITE(*,*) 'The following ERROR was detected: ', error
STOP
ENDIF
C.. Back substitution and iterative refinement
iparm(8) = 1 ! max numbers of iterative refinement steps
phase = 33 ! only solve
b = neq%bnor
CALL pardiso (pt, maxfct, mnum, mtype, phase,
1 neq%misc%npar, a, ia, ja,
1 idum, nrhs, iparm, msglvl, b, x, error, dparm)
WRITE(*,*) 'Solve completed ... '
IF (error .NE. 0) THEN
WRITE(*,*) 'The following ERROR was detected: ', error
STOP
END IF
neq%xxx = x
C.. Memory release
phase = -1 ! release all internal memory
CALL pardiso (pt, maxfct, mnum, mtype, phase,
1 neq%misc%npar, a, ia, ja,
1 idum, nrhs, iparm, msglvl, b, x, error, dparm)
WRITE(*,*) 'Memory released ... '
Below is the program output, which reports a memory problem. When I run the same program with the same configuration with up to around 32000 parameters, everything works smoothly (and it is astonishingly fast and efficient!).
Starting reordering ...
*** Error in PARDISO ( insufficient_memory) error_num= 1
*** Error in PARDISO memory allocation: MATCHING_REORDERING_DATA, allocation of 1 bytes failed
total memory wanted here: 6388548 kbyte
=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 22.780998 s
Time spent in reordering of the initial matrix (reorder) : 0.000000 s
Time spent in symbolic factorization (symbfct) : 0.000000 s
Time spent in allocation of internal data structures (malloc) : 1.213865 s
Time spent in additional calculations : 9.502059 s
Total time spent : 33.496922 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 40397
number of non-zeros in A: 815979003
number of non-zeros in A (%): 50.001238
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 0
size of largest supernode: 0
number of non-zeros in L: 0
number of non-zeros in U: 0
number of non-zeros in L+U: 0
Reordering completed ! 0
The following ERROR was detected: -2
Any idea what the problem is? I do not think it is a hardware limitation, since the machine has 136 GB of RAM and the system is only 6 GB ... I also tried to solve the problem in OOC mode and with fewer threads, without any luck.
Thanks for your help,
Stefano
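The log above ends with only the bare code "-2". A minimal sketch of a decoder for the PARDISO `error` argument follows. The helper name `pardiso_error_text` is hypothetical (it is not part of MKL), and the message texts are my recollection of the MKL documentation for a subset of the codes, so they should be checked against the reference for the installed version.

```fortran
program decode_pardiso_error
  implicit none

  ! -2 is the code reported at the end of the log above.
  print '(a,i0,a,a)', 'error ', -2, ': ', trim(pardiso_error_text(-2))

contains

  ! Hypothetical helper, not an MKL routine: maps a subset of the
  ! PARDISO error codes to short texts (assumed from the MKL docs).
  function pardiso_error_text(error) result(msg)
    integer, intent(in) :: error
    character(len=80)   :: msg

    select case (error)
    case (0)
      msg = 'no error'
    case (-1)
      msg = 'input inconsistent'
    case (-2)
      msg = 'not enough memory'
    case (-3)
      msg = 'reordering problem'
    case (-4)
      msg = 'zero pivot, numerical factorization or iterative refinement problem'
    case (-8)
      msg = '32-bit integer overflow problem'
    case (-9)
      msg = 'not enough memory for OOC'
    case default
      msg = 'other error; consult the MKL PARDISO documentation'
    end select
  end function pardiso_error_text

end program decode_pardiso_error
```

In the subroutine above, such a function could replace the plain `WRITE` of the numeric code after each `pardiso` call.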
Stefano, 40000 parameters, is that the neq?
Could you try setting iparm(2) = 0 and check the problem again?
Hi, thanks for the quick reply!
Yes, ~40000 parameters is the NEQ (passed to the routine in CSR3 format in the a, ia and ja vectors; I use the neq structure only to pass the r.h.s. and the solution vector).
I set iparm(2) = 0 as you suggested and everything worked correctly (between 22 and 59 GB of memory were needed with 16 OpenMP threads, as with iparm(2) = 1, but this time nothing crashed).
Why does the [Karypis98] algorithm cause this issue with my neq or settings? What are the pros and cons of the two algorithms? Which one should I use, or what else, when going to larger matrices?
Thanks a lot!
Hi again,
unfortunately I got a similar error again when trying to solve a 240x240 neq (~58000 parameters) ... this time at "BEFORE_REORDERING" and with 180 GB of RAM made available (I can potentially have up to 256 GB, and my goal is to process a 300x300 NEQ).
I also notice that this time the "total memory wanted" is negative (overflow?) ... could this be a 32/64-bit problem?
Starting reordering ...
*** Error in PARDISO ( insufficient_memory) error_num= 2
*** Error in PARDISO memory allocation: BEFORE_REORDERING, allocation of -3601676 bytes failed
total memory wanted here: -3581798 kbyte
=== PARDISO: solving a symmetric positive definite system ===
Summary: ( reordering phase )
================
Times:
======
Time spent in additional calculations : 2.038890 s
Total time spent : 2.038890 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 58077
number of non-zeros in A: 1686498003
number of non-zeros in A (%): 50.000861
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 0
size of largest supernode: 0
number of non-zeros in L: 0
number of non-zeros in U: 0
number of non-zeros in L+U: 0
Reordering completed ! 0
The following ERROR was detected: -2
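The overflow question can be checked with plain arithmetic. The non-zero counts reported in the three logs of this thread (815979003, 1686498003 and 4104044101) are exactly n(n+1)/2 for n = 40397, 58077 and 90598, i.e. a completely filled upper triangle. The sketch below compares that count, and the n*n entries of the full symmetric pattern, with the largest INTEGER*4 value. That PARDISO forms a quantity of the second kind internally is only an assumption here (the "fulladj" line in the log hints at it); the program just shows where 32-bit counters stop being sufficient.

```fortran
program nnz_overflow_check
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none

  ! Matrix orders taken from the three logs in this thread.
  integer(int64), parameter :: n_list(3) = &
       [40397_int64, 58077_int64, 90598_int64]
  integer(int64), parameter :: int32_max = int(huge(1_int32), int64)
  integer(int64) :: n, nnz_tri, nnz_full
  integer :: k

  do k = 1, size(n_list)
    n = n_list(k)
    nnz_tri  = n*(n + 1)/2   ! stored upper triangle incl. diagonal
    nnz_full = n*n           ! full symmetric pattern (assumed internal)
    print '(a,i0)', 'n = ', n
    print '(a,i0,a,l1)', '  triangle entries: ', nnz_tri, &
         '  fits in INTEGER*4: ', nnz_tri <= int32_max
    print '(a,i0,a,l1)', '  full entries:     ', nnz_full, &
         '  fits in INTEGER*4: ', nnz_full <= int32_max
    print '(a,i0)', '  bytes for REAL*8 triangle values: ', 8*nnz_tri
  end do
end program nnz_overflow_check
```

For n = 58077 the stored triangle still fits in a 32-bit integer, but n*n and every byte count do not, which would be consistent with a negative "total memory wanted".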
MKL 11.3 has a run-time issue with METIS reordering (iparm(2) = 2). The issue has already been fixed in the upcoming MKL 11.3 Update 3, which we are planning to release soon. A notification will be published at the top of the MKL forum, and I will also keep you updated in this thread. As a temporary workaround, please use the minimum degree algorithm, which is a little slower than nested dissection.
OK, thanks for the notification! I also solved the problem with larger NEQs by switching to PARDISO_64, which works great, although it is a bit annoying to convert the ia and ja vectors of the CSR3 format to INTEGER*8, thus doubling the space allocated for them...
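A minimal sketch of the index conversion mentioned here, on a tiny 3x3 upper-triangular CSR3 example. The array names `ia64` and `ja64` are illustrative only, and the call to `pardiso_64` itself is left out so that the program runs without MKL; every other integer argument of that routine would also have to be INTEGER*8.

```fortran
program csr_to_int64
  use, intrinsic :: iso_fortran_env, only: int32, int64, real64
  implicit none

  ! Upper triangle of [[4,1,0],[1,4,1],[0,1,4]] in 1-based CSR3.
  integer(int32) :: ia(4) = [1, 3, 5, 6]
  integer(int32) :: ja(5) = [1, 2, 2, 3, 3]
  real(real64)   :: a(5)  = [4.0_real64, 1.0_real64, 4.0_real64, &
                             1.0_real64, 4.0_real64]
  integer(int64), allocatable :: ia64(:), ja64(:)

  ! Element-wise widening; the values array a is unchanged.
  ia64 = int(ia, int64)
  ja64 = int(ja, int64)

  ! Sanity checks: same index values, last row pointer is nnz + 1.
  if (any(ia64 /= ia) .or. any(ja64 /= ja)) error stop 'conversion failed'
  if (ia64(size(ia64)) /= size(a) + 1) error stop 'inconsistent CSR3'

  print '(a,i0,a,i0)', 'ja bytes: ', 4*size(ja), ' -> ', 8*size(ja64)
end program csr_to_int64
```

Only the index arrays grow; for a matrix with nnz stored entries the column index array goes from 4*nnz to 8*nnz bytes, while the REAL*8 values stay at 8*nnz bytes.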
OK, here we are with the 90000-parameter NEQ to be solved ...
Starting reordering ...
*** Error in PARDISO ( insufficient_memory) error_num= 4
*** Error in PARDISO memory allocation: BEFORE_INIT_PARALLEL_DATA, allocation of 32289340 bytes failed
total memory wanted here: 160960551 kbyte
=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 192.547947 s
Time spent in reordering of the initial matrix (reorder) : 43.399159 s
Time spent in symbolic factorization (symbfct) : 49.032472 s
Time spent in allocation of internal data structures (malloc) : 10.008380 s
Time spent in additional calculations : 0.000203 s
Total time spent : 294.988161 s
Statistics:
===========
Parallel Direct Factorization is running on 5 OpenMP
< Linear system Ax = b >
number of equations: 90598
number of non-zeros in A: 4104044101
number of non-zeros in A (%): 50.000552
number of right-hand sides: 1
< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
number of columns for each panel: 80
number of independent subgraphs: 0
number of supernodes: 1133
size of largest supernode: 90598
number of non-zeros in L: 4107621924
number of non-zeros in U: 1
number of non-zeros in L+U: 4107621925
Reordering completed ! 0
The following ERROR was detected: -2
I have provided 210 GB of memory but still get a memory error in the reordering phase. I also chose OOC mode with 300 GB available, but I suppose this is only used AFTER phase 11. I am using pardiso_64 compiled with ILP64 (to be able to use INTEGER*8 and avoid overflows).
Is there any way around this issue? Would it help to increase or reduce the number of cores? Or to combine different phases? Or to skip the reordering phase? It would be of great help with "conference season" approaching.
Thanks for any hints!
Your input is not sparse (number of non-zeros in A (%): 50.000552), so for such dense cases this is the expected behavior. For such matrices we strongly recommend using a dense solver instead of a sparse one.
You are indeed right. Could you please provide a link to the MKL/LAPACK library or routine you would suggest for such a problem?
This is what I found googling around: ?gels , https://software.intel.com/en-us/node/469160#EC9BE639-8638-4AF2-A4AC-74C9E0334883
Thanks!
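The thread does not record which routine was recommended. As one possible candidate, and only as a sketch: since the code above declares the matrix symmetric positive definite (mtype = 2), the LAPACK Cholesky driver `dposv` would fit that assumption, whereas `?gels` targets least-squares problems. The example below solves a 3x3 system and must be linked against a LAPACK implementation such as MKL.

```fortran
program dense_spd_solve
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

  integer, parameter :: n = 3, nrhs = 1
  real(real64) :: a(n, n), b(n, nrhs)
  integer :: info
  external :: dposv

  ! Only the upper triangle is filled; with uplo = 'U' the lower
  ! triangle is not referenced by dposv.
  a = 0.0_real64
  a(1, 1) = 4.0_real64
  a(1, 2) = 1.0_real64
  a(2, 2) = 4.0_real64
  a(2, 3) = 1.0_real64
  a(3, 3) = 4.0_real64

  ! Right-hand side chosen so that the exact solution is (1, 1, 1).
  b(:, 1) = [5.0_real64, 6.0_real64, 5.0_real64]

  ! Cholesky factorization and solve; b is overwritten with x.
  call dposv('U', n, nrhs, a, n, b, n, info)
  if (info /= 0) error stop 'dposv failed'

  if (maxval(abs(b(:, 1) - 1.0_real64)) > 1.0e-12_real64) &
       error stop 'unexpected solution'
  print *, 'x = ', b(:, 1)
end program dense_spd_solve
```

Full storage needs 8*n*n bytes, roughly 65.7 GB for n = 90598. The packed-storage driver `dppsv` takes only the n(n+1)/2 triangle entries, about half of that.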