Intel® oneAPI Math Kernel Library

Pardiso Out of Core

atpq2680
Is PARDISO an out-of-core solver? If so, can it access a matrix that is stored out of core?
Where can I find information on how the matrix has to be stored for an out-of-core solution?
Is there another function in MKL that can solve a symmetric linear system out of core?
Thanks
Sudha_Rangan
Sergey, thank you for your response. We actually don't have any swap space set up on the machine; we have 64 GB of RAM and that's it. I can make sure we set up enough swap space on the machine (maybe 128 GB or more). But using the minimum degree algorithm instead of METIS also did not work. Is that expected (probably, if it also needs a significant amount of memory)?

So if we purchase the latest version of MKL, will it be 10.3.0 Gold?

With METIS, I get a segmentation fault immediately after the message saying that the file ./pardiso_ooc.cfg was not opened.

With minimum degree, as posted before, the error message is shown below. I will try again as soon as we have swap space set up.

You entered matrix4096bf
Nonzero elements: 2890432512 Size (number of equations): 7077888
first value = -4422846.000000
first ia index = 1, 2nd ia index = 217
first ja index = 1
first rhs = -0.121058
a0: -4.422846e+06 a_end: -9.250865e+03
ia 0: 1 ai end: 2890432297
ja 0: 1 ja end: 7077888
b 0: -0.121058 b end: -0.004208
ooc_max_core_size got by Env = 256000
The file ./pardiso_ooc.cfg was not opened
*** Error in PARDISO ( reordering_phase) error_num= -180
*** error PARDISO: reordering, symb. factorization

================ PARDISO: solving a real struct. sym. system ================


Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 283.975595 s
Time reorder: 5.503694 s
Time symbfct: 23.955411 s
Time malloc : 269.677817 s
Time total : 602.334729 s total - sum: 19.222212 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 7077888
#non-zeros in A: 2890432511
non-zeros in A (%): 0.005770

#right-hand sides: 1

< Factors L and U >
< Preprocessing with multiple minimum degree, tree height >
< Reduction for efficient parallel factorization >
#columns for each panel: 72
#independent subgraphs: 0
#supernodes: 125067

size of largest supernode: 810
number of nonzeros in L 5062328928
number of nonzeros in U 4636596168
number of nonzeros in L+U 9698925096

ERROR during symbolic factorization: -3

Thanks,
Sudha
Sergey_Solovev__Inte

Sudha, in MKL 10.3.0 Beta the minimum degree ordering would not work either.
I think the latest available versions are MKL 10.2.6 and 10.3 Beta (http://software.intel.com/en-us/forums/intel-math-kernel-library/).

My recommendations are:
1) Use MKL 10.2.6.
2) Set up 128 GB of swap space.
3) Set MKL_PARDISO_OOC_MAX_CORE_SIZE to no more than the amount of free RAM. (As I see it, the input matrix takes about 48 GB, so only about 12000 MB of RAM is free.)
4) Print iparm(57) and iparm(64) after the reordering step and send us the log.
An additional question: could you vary the size of the problem? What is the largest problem you can solve with PARDISO ILP64?

Sudha_Rangan
Sergey, thanks again for your response. I will be sure to use 10.2.6, follow your suggestion on setting MKL_PARDISO_OOC_MAX_CORE_SIZE, and see what happens (as soon as we have our swap space set up).
The largest problem I have been able to solve so far on this hardware has:
# of rows: 3538944
# of non-zeros: 1445216255

As you can see, I am simply doubling this matrix to get the one that fails. These are real, structurally symmetric matrices.

Since I hadn't yet reached the ~2^31 limit in terms of NNZ, I was hoping that this run was not uncovering an additional problem related to int (vs. int64). Hopefully it isn't, and I will find out as soon as we can do the larger run. It is more difficult for me to produce a matrix closer to 2^31 NNZ than the larger one with > 2.8 × 10^9 non-zeros, which is simple to produce; otherwise I would try a matrix larger than the one that ran successfully but smaller than the one I'm trying to run. (Our ultimate aim is to solve a matrix with closer to 500 × 10^9 non-zeros.)

Thanks,
Sudha
Sudha_Rangan
Sergey, hello. It took a while to follow up, but we recently obtained the resources to re-run this test. On a machine with 500 GB of RAM (no swap space, but a huge amount of RAM), using iparm(2) = 2 (the nested dissection algorithm from METIS; iparm[1] here, with C-style indexing), the matrix with > 2.8 × 10^9 non-zeros and > 7 × 10^6 rows ran successfully, very quickly (under half an hour, I think), and produced seemingly correct results.
However, the same matrix still cannot be run in out-of-core (OOC) mode, regardless of which reordering algorithm is used (METIS, minimum degree, etc.). I get either the same error as reported before or a segmentation fault, as before. (Smaller matrices are fine.) Either there is some problem in OOC mode once we cross the 32-bit threshold of roughly 2.1 × 10^9 non-zeros, or something else is wrong, even with MKL_PARDISO_OOC_MAX_CORE_SIZE set to 256000 or similar; we now have enough memory to set it that high.


Thanks,
Sudha
Sergey_Solovev__Inte

Sudha, there are probably some problems with OOC behavior on large matrices in ILP64 mode. As I see it, the number of non-zero elements in the LU factors is ~9.7 × 10^9, so 80 GB of RAM should be enough to store all of the LU factors. Could you try varying MKL_PARDISO_OOC_MAX_CORE_SIZE?
Do you get the same error with MKL_PARDISO_OOC_MAX_CORE_SIZE = 20000, 40000 or 80000?
Best regards,
Sergey Solovev
