Hi,
I am running PARDISO routines to solve a linear equation system with more than 40000 parameters. I use the "intel mkl 2016.1.150" libraries and have set the environment variable "export MKL_PARDISO_OOC_MAX_CORE_SIZE=120000". My system has 138Gb of RAM. Below is an extract of my code, which takes the A matrix in CSR3 format as input and returns the solution vector x as output.
      SUBROUTINE parsol(neq, a, ja, ia)
      IMPLICIT NONE

c     List of Parameters
c     ------------------
      TYPE(t_neq) :: neq
      INTEGER*4, DIMENSION(*) :: ia
      INTEGER*4, DIMENSION(*) :: ja
      REAL*8, DIMENSION(*) :: a

c     Local Parameters
c     ----------------

c     Local Variables
c     ---------------
C..   Internal solver memory pointer
      INTEGER*8 pt(64)
C..   All other variables
      INTEGER*4 maxfct, mnum, mtype, phase, error, nrhs, msglvl
      INTEGER*4 iparm(64)
      REAL*8 dparm(64)
      REAL*8 b(neq%misc%npar)
      REAL*8 x(neq%misc%npar)
      INTEGER*4 i, j, idum, solver
      REAL*8 waltime1, waltime2, ddum, normb, normr

C..   Fill all arrays containing matrix data.
C     Number of right-hand-sides to solve
      nrhs = 1
C     Other parameters
      maxfct = 1
      mnum = 1
C
C  .. Setup Pardiso control parameters and initialize the solvers
C     internal address pointers. This is only necessary for the FIRST
C     call of the PARDISO solver.
C     mtype = ...
C        1  real and structurally symmetric
C        2  real and symmetric positive definite
C       -2  real and symmetric indefinite
C        3  complex and structurally symmetric
C        4  complex and Hermitian positive definite
C       -4  complex and Hermitian indefinite
C        6  complex and symmetric
C       11  real and nonsymmetric
C       13  complex and nonsymmetric
      mtype = 2
C     Initialisation
      pt(:) = 0
      iparm(1) = 0 ! initializes all iparm to their default values
      CALL pardisoinit(pt, mtype, iparm)
C  .. Memory use (in or out core)
      iparm(27) = 1
      iparm(60) = 1

C..   Reordering and Symbolic Factorization, This step also allocates
C     all memory that is necessary for the factorization
      phase = 11 ! only reordering and symbolic factorization
      msglvl = 1 ! with (1) or without (0) statistical information
      WRITE(*,*) 'Starting reordering ...'
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, ddum, ddum, error, dparm)
      WRITE(*,*) 'Reordering completed ! ',
     1           max(iparm(15), iparm(16)+iparm(63))
      IF (error .NE. 0) THEN
        WRITE(*,*) 'The following ERROR was detected: ', error
        STOP
      END IF

C..   Factorization.
C     phase = ...
C       11  Analysis
C       12  Analysis, numerical factorization
C       13  Analysis, numerical factorization, solve, iterative refinement
C       22  Numerical factorization
C       23  Numerical factorization, solve, iterative refinement
C       33  Solve, iterative refinement
C      331  like phase=33, but only forward substitution
C      332  like phase=33, but only diagonal substitution (if available)
C      333  like phase=33, but only backward substitution
C        0  Release internal memory for L and U matrix number mnum
C       -1  Release all internal memory for all matrices
      phase = 22 ! only factorization
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja, idum,
     2              nrhs, iparm, msglvl, ddum, ddum, error, dparm)
      WRITE(*,*) 'Factorization completed ... '
      IF (error .NE. 0) THEN
        WRITE(*,*) 'The following ERROR was detected: ', error
        STOP
      ENDIF

C..   Back substitution and iterative refinement
      iparm(8) = 1 ! max numbers of iterative refinement steps
      phase = 33 ! only solve
      b = neq%bnor
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, b, x, error, dparm)
      WRITE(*,*) 'Solve completed ... '
      neq%xxx = x

C..   Memory release
      phase = -1 ! release all internal memory
      CALL pardiso (pt, maxfct, mnum, mtype, phase,
     1              neq%misc%npar, a, ia, ja,
     1              idum, nrhs, iparm, msglvl, b, x, error, dparm)
      WRITE(*,*) 'Memory released ... '
Below is the program output, which reports a memory problem. When I run the same program with the same configuration with up to around 32000 parameters, everything works smoothly (and it's astonishingly fast and efficient!).
Starting reordering ...
*** Error in PARDISO ( insufficient_memory) error_num= 1
*** Error in PARDISO memory allocation: MATCHING_REORDERING_DATA, allocation of 1 bytes failed
total memory wanted here: 6388548 kbyte
=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 22.780998 s
Time spent in reordering of the initial matrix (reorder) : 0.000000 s
Time spent in symbolic factorization (symbfct) : 0.000000 s
Time spent in allocation of internal data structures (malloc) : 1.213865 s
Time spent in additional calculations : 9.502059 s
Total time spent : 33.496922 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 40397
number of non-zeros in A: 815979003
number of non-zeros in A (%): 50.001238
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 0
size of largest supernode: 0
number of non-zeros in L: 0
number of non-zeros in U: 0
number of non-zeros in L+U: 0
Reordering completed ! 0
The following ERROR was detected: -2
Any idea what the problem is? I do not think it's a hardware limitation, since the machine has 136Gb of RAM and the linear system is only 6Gb ... I also tried to solve the problem in OOC mode and with fewer threads, without any luck.
Thanks for your help,
Stefano
Stefano, is 40000 parameters the value of neq?
Could you try setting iparm(2) = 0 and check the problem again?
Hi, thanks for the quick reply!
Yes, ~40000 parameters is the NEQ (passed to the routine in CSR3 format in the a, ia and ja vectors; I use the neq structure only to pass the r.h.s. and the solution vector).
I set iparm(2)=0 as you suggested and everything worked correctly (between 22 and 59Gb of memory were needed with 16 OpenMP threads, as with iparm(2) = 1, but this time nothing crashed).
Why does the [Karypis98] algorithm cause this issue with my NEQ or settings? What are the pros and cons of the two algorithms? Which one should I use, or what else, when going for larger matrices?
Thanks a lot!
Hi again,
Unfortunately I got a similar error again when trying to solve a 240x240 NEQ (~58000 parameters) ... this time at "BEFORE_REORDERING" and with 180Gb of RAM made available (I can potentially have up to 256Gb, and my goal is to process a 300x300 NEQ).
I also notice that this time the "total memory wanted" is negative (overflow?) ... could this be a problem of 32/64 bits ?
Starting reordering ...
*** Error in PARDISO ( insufficient_memory) error_num= 2
*** Error in PARDISO memory allocation: BEFORE_REORDERING, allocation of -3601676 bytes failed
total memory wanted here: -3581798 kbyte
=== PARDISO: solving a symmetric positive definite system ===
Summary: ( reordering phase )
================
Times:
======
Time spent in additional calculations : 2.038890 s
Total time spent : 2.038890 s
Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Linear system Ax = b >
number of equations: 58077
number of non-zeros in A: 1686498003
number of non-zeros in A (%): 50.000861
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 0
size of largest supernode: 0
number of non-zeros in L: 0
number of non-zeros in U: 0
number of non-zeros in L+U: 0
Reordering completed ! 0
The following ERROR was detected: -2
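A back-of-the-envelope check of the overflow suspicion, offered as a sketch rather than a diagnosis: the non-zero counts reported in the logs equal n(n+1)/2, i.e. a completely filled upper triangle, so a full adjacency structure would hold about n*n entries. That PARDISO's 32-bit interface counts such entries in INTEGER*4 is an assumption here, not something the log states. The free-form Fortran program below (no MKL needed; the program and variable names are illustrative) compares n*n with the largest INTEGER*4 value for the two problem sizes seen so far.

```fortran
program check_int32_overflow
  use iso_fortran_env, only: int32, int64
  implicit none
  ! Problem sizes taken from the two PARDISO logs in this thread.
  integer(int64), parameter :: n_values(2) = [40397_int64, 58077_int64]
  integer(int64) :: n, nnz_upper, nnz_full
  integer :: k

  do k = 1, size(n_values)
     n = n_values(k)
     nnz_upper = n * (n + 1) / 2   ! completely filled upper triangle
     nnz_full  = n * n             ! full symmetric pattern (assumed size of "fulladj")
     print '(a,i0,a,i0,a,i0,a,l1)', 'n=', n, '  upper=', nnz_upper, &
          '  full=', nnz_full, '  fits_in_int32=', &
          nnz_full <= int(huge(1_int32), int64)
  end do
end program check_int32_overflow
```

For n = 40397 the full count still fits in 32 bits, while for n = 58077 it does not, which is at least consistent with the negative sizes in the second log.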
MKL 11.3 has a run-time issue with METIS reordering (iparm(2) = 2). The issue has already been fixed in the upcoming MKL 11.3 Update 3, which we are planning to release soon. A notification will be published at the top of the MKL forum, and I will also keep you updated in this thread. As a temporary work-around, please use the minimum degree algorithm (iparm(2) = 0), which is a little slower than nested dissection.
OK, thanks for the notification! I also solved the problem with larger NEQs by switching to PARDISO_64, which works great, although it is a bit annoying to convert the ia and ja vectors of the CSR3 format to INTEGER*8, thus doubling the allocated space...
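The index conversion mentioned above can be sketched as follows. This is a minimal free-form Fortran illustration on a made-up 3x3 upper-triangular CSR3 pattern (the array contents and the names ia64 and ja64 are hypothetical); it only widens the index arrays and does not call MKL. As far as I know, pardiso_64 also expects its other integer arguments (iparm, maxfct, error and so on) as INTEGER*8, while the values array a stays REAL*8.

```fortran
program widen_csr_indices
  use iso_fortran_env, only: int32, int64
  implicit none
  ! Hypothetical CSR3 pattern: dense upper triangle of a 3x3 matrix, 1-based.
  integer(int32) :: ia(4) = [1, 4, 6, 7]
  integer(int32) :: ja(6) = [1, 2, 3, 2, 3, 3]
  integer(int64), allocatable :: ia64(:), ja64(:)

  ! Elementwise widening; the left-hand sides are allocated on assignment.
  ia64 = int(ia, int64)
  ja64 = int(ja, int64)

  print '(a,*(1x,i0))', 'ia64 =', ia64
  print '(a,*(1x,i0))', 'ja64 =', ja64
end program widen_csr_indices
```

The 64-bit copies take 8 bytes per entry instead of 4, which is the doubling of index storage noted above.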
OK, here we are with the 90000-parameter NEQ to be solved ...
Starting reordering ...
*** Error in PARDISO ( insufficient_memory) error_num= 4
*** Error in PARDISO memory allocation: BEFORE_INIT_PARALLEL_DATA, allocation of 32289340 bytes failed
total memory wanted here: 160960551 kbyte
=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Minimum degree algorithm at reorder step is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 192.547947 s
Time spent in reordering of the initial matrix (reorder) : 43.399159 s
Time spent in symbolic factorization (symbfct) : 49.032472 s
Time spent in allocation of internal data structures (malloc) : 10.008380 s
Time spent in additional calculations : 0.000203 s
Total time spent : 294.988161 s
Statistics:
===========
Parallel Direct Factorization is running on 5 OpenMP
< Linear system Ax = b >
number of equations: 90598
number of non-zeros in A: 4104044101
number of non-zeros in A (%): 50.000552
number of right-hand sides: 1
< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
number of columns for each panel: 80
number of independent subgraphs: 0
number of supernodes: 1133
size of largest supernode: 90598
number of non-zeros in L: 4107621924
number of non-zeros in U: 1
number of non-zeros in L+U: 4107621925
Reordering completed ! 0
The following ERROR was detected: -2
I have provided 210 Gb of memory but still get a memory error in the reordering phase. I also chose OOC mode with 300 Gb available, but I suppose this is only used AFTER phase 11. I am using pardiso_64 compiled with ILP64 (to be able to use INTEGER*8 and avoid overflows).
Is there any way around this issue? Would increasing or reducing the number of cores help? Or combining different phases? Or skipping the reordering phase? Any help would be much appreciated with "conference season" approaching.
Thanks for any hints!
Your input is not sparse (number of non-zeros in A (%): 50.000552), so for such a dense case this is the expected behavior. For such matrices we strongly recommend using a dense solver instead of a sparse one.
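To put rough numbers on the dense alternative, here is a small sketch (free-form Fortran, no MKL needed; the program name and layout are illustrative) that estimates the storage for n = 90598 in double precision, both as a packed triangle and as a full square array. It covers only the matrix itself, not solver workspace.

```fortran
program dense_storage_estimate
  use iso_fortran_env, only: int64, real64
  implicit none
  integer(int64), parameter :: n = 90598_int64   ! number of equations from the log
  real(real64), parameter :: gib = 2.0_real64**30
  integer(int64) :: packed_entries, full_entries

  packed_entries = n * (n + 1) / 2   ! one triangle including the diagonal
  full_entries   = n * n             ! full square storage

  print '(a,i0)', 'entries in one triangle: ', packed_entries
  print '(a,f8.2,a)', 'packed storage: ', &
       8.0_real64 * real(packed_entries, real64) / gib, ' GiB'
  print '(a,f8.2,a)', 'full storage:   ', &
       8.0_real64 * real(full_entries, real64) / gib, ' GiB'
end program dense_storage_estimate
```

The triangle count equals the "number of non-zeros in A" reported by PARDISO, so the upper triangle is completely filled. The packed form needs roughly 31 GiB and the full form roughly 61 GiB, both well below the 160960551 kbyte that the sparse analysis phase asked for.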
You are indeed right. Could you please provide a link to an MKL/Lapack library or routine you would suggest to use for such a problem?
This is what I found googling around: ?gels , https://software.intel.com/en-us/node/469160#EC9BE639-8638-4AF2-A4AC-74C9E0334883
Thanks!
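One hedged possibility, given that the matrix was declared symmetric positive definite (mtype = 2): a dense Cholesky driver such as LAPACK's DPOSV may fit better than the least-squares driver ?gels. The sketch below solves a made-up 3x3 SPD system (matrix and right-hand side are illustrative only) and has to be linked against a LAPACK implementation such as MKL.

```fortran
program dense_spd_solve
  implicit none
  integer, parameter :: n = 3, nrhs = 1
  double precision :: a(n, n), b(n, nrhs)
  integer :: info
  external :: dposv

  ! Illustrative SPD matrix, stored column by column.
  ! With uplo = 'U' only the upper triangle is referenced.
  a = reshape([ 4.0d0, 1.0d0, 0.0d0,   &
                1.0d0, 3.0d0, 1.0d0,   &
                0.0d0, 1.0d0, 2.0d0 ], [n, n])
  ! Right-hand side chosen so that the exact solution is (1, 1, 1).
  b(:, 1) = [ 5.0d0, 5.0d0, 3.0d0 ]

  ! Cholesky factorization and solve; a is overwritten by the factor,
  ! b by the solution.
  call dposv('U', n, nrhs, a, n, b, n, info)

  if (info /= 0) then
     print '(a,i0)', 'DPOSV failed, info = ', info
  else
     print '(a,3f10.6)', 'x =', b(:, 1)
  end if
end program dense_spd_solve
```

For the problem sizes in this thread, the packed-storage variant DPPSV may be worth a look, since it stores only one triangle. If every upper-triangle entry is present in the CSR3 array a, that row-by-row layout should coincide with LAPACK's packed lower triangle (uplo = 'L'). An ILP64 build would presumably still be needed, because the packed array has more than 2^31 entries for n = 90598.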
