I am currently trying to solve a very large problem with MKL PARDISO. I have 512 GB of RAM and 700 GB of free disk space. I tried both the in-core and the out-of-core mode, but the same error appeared in both cases:

*** Error in PARDISO ( insufficient_memory) error_num= 8

*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 399802848 bytes failed

total memory wanted here: 431388504 kbyte

I have not been able to find the cause of this error or to resolve it.

Hi!

According to the part of the message which says "total memory wanted here: 431388504 kbyte", the amount of memory required for the factorization at that point is roughly 431 GB (431388504 KB). While that is less than the 512 GB you say you have, remember that memory is also required for storing other information, e.g. data kept from the first (reordering) phase.

My suggestion is:

Split the call to pardiso into three calls (phase = 11, phase = 22, and phase = 33) in case you currently call it with phase = 13. After the call with phase = 11, print out iparm[14]-iparm[17] (see https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-interface/pardiso-iparm-parameter.html).
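In Fortran's 1-based indexing (as in your iparm listing), iparm[14]-iparm[16] from the C documentation correspond to iparm(15)-iparm(17). A minimal sketch of the split calls, assuming the usual pardiso arguments (pt, maxfct, mnum, mtype, n, a, ia, ja, perm, nrhs, msglvl, b, x, error) are already declared and filled:

```fortran
! Phase 11: reordering and symbolic factorization only
phase = 11
call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
              perm, nrhs, iparm, msglvl, b, x, error)
if (error /= 0) stop 'PARDISO phase 11 failed'

! Memory estimates reported after phase 11 (all in KB):
print *, 'peak memory, symbolic factorization: ', iparm(15)
print *, 'permanent memory                   : ', iparm(16)
print *, 'in-core memory, factorize/solve    : ', iparm(17)

! Phase 22: numerical factorization
phase = 22
call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
              perm, nrhs, iparm, msglvl, b, x, error)

! Phase 33: solve with iterative refinement
phase = 33
call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
              perm, nrhs, iparm, msglvl, b, x, error)
```

Setting msglvl = 1 before these calls makes PARDISO print its statistical report as well.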

1) for the In-Core version: I assume that you will see that max(iparm[14], iparm[15]+iparm[16]), which is the peak in-core memory consumption in KB, is close to or above your available RAM.

2) for the Out-of-Core version: as the description of iparm[59] and the KB article say, check that you have increased MKL_PARDISO_OOC_MAX_CORE_SIZE as much as possible. The peak memory consumption in OOC mode is different: it is approximately the permanent memory required by phase 1 plus the minimal amount of RAM needed for OOC (OOC puts the majority of the data on disk, but not everything).
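For reference, these OOC settings can be supplied either as environment variables or via a pardiso_ooc.cfg file in the working directory. A sketch with placeholder values (the core/swap sizes are in megabytes; the path is an example):

```
MKL_PARDISO_OOC_PATH = ./ooc_store
MKL_PARDISO_OOC_MAX_CORE_SIZE = 400000
MKL_PARDISO_OOC_MAX_SWAP_SIZE = 0
MKL_PARDISO_OOC_KEEP_FILE = 0
```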

I assume that in your case iparm[15] is already a huge number after phase 1, so there is not enough RAM left even for the OOC factorization with its reduced memory consumption.

Also, you can set msglvl = 1 and share with us the output from phase 1, as well as the output values of iparm[14]-iparm[17] after the first phase.

I hope this helps.

Best,

Kirill

Thanks for the quick reply!

I tried your advice and printed iparm(14)-iparm(17). The results are as follows:

###########################################################

=== PARDISO: solving a complex symmetric system ===

1-based array indexing is turned ON

PARDISO double precision computation is turned ON

METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )

================

Times:

======

Time spent in calculations of symmetric matrix portrait (fulladj): 4.072709 s

Time spent in reordering of the initial matrix (reorder) : 80.376569 s

Time spent in symbolic factorization (symbfct) : 48.914574 s

Time spent in data preparations for factorization (parlist) : 0.925490 s

Time spent in allocation of internal data structures (malloc) : 2.337880 s

Time spent in additional calculations : 35.297460 s

Total time spent : 171.924682 s

Statistics:

===========

Parallel Direct Factorization is running on 52 OpenMP

< Linear system Ax = b >

number of equations: 8578350

number of non-zeros in A: 341060574

number of non-zeros in A (%): 0.000463

number of right-hand sides: 750

< Factors L and U >

number of columns for each panel: 96

number of independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

number of supernodes: 833888

size of largest supernode: 46866

number of non-zeros in L: 26367541862

number of non-zeros in U: 1

number of non-zeros in L+U: 26367541863

iparm14 0

iparm15 26143180

iparm16 22409342

iparm17 433185621

max ! 48552522

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

*** Error in PARDISO ( insufficient_memory) error_num= 8

*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 417321914 bytes failed

total memory wanted here: 439731255 kbyte

############################################

According to the above result, I think the memory required is (iparm[16]+iparm[17]), which is below 512 GB. Why did this error still happen, and how can I solve it other than by increasing the RAM or reducing the number of equations?

Hi again,

Thanks for sharing the numbers! It does seem that the peak total memory should be no more than approximately (22409342 + 433185621) KB, or about 456 GB, so it should fit into 512 GB of RAM.

I see several possibilities:

1) The memory estimate provided by PARDISO is not accurate and the actual memory consumption is more than reported.

2) Something else is eating the memory. Is there any other data in the application that consumes a significant amount of RAM?

To check 1) on our side, we would need your data so that we can set up our own experiment and see what happens inside the solver. If possible, please share your data.

I have a couple of suggestions:

1) How many rhs do you have? If you have more than one, run a loop over phase = 33 with a single rhs at a time.

2) Decrease the number of threads (say, halve it).

The rationale is that a certain amount of memory used in PARDISO is proportional to #rhs, #threads, and even #rhs * #threads, if I am not mistaken. Check whether doing 1) or 2) reduces the reported numbers and fixes the allocation failure.
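Suggestion 1) as a sketch, assuming the factorization is already done and b/x are n-by-nrhs_total arrays:

```fortran
! Solve one right-hand side at a time: the solve-phase workspace
! grows with #rhs (and #threads), so this caps that term.
nrhs  = 1
phase = 33
do k = 1, nrhs_total
   call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
                 perm, nrhs, iparm, msglvl, b(:, k), x(:, k), error)
   if (error /= 0) stop 'PARDISO solve failed'
end do
```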

3) I think you are using iparm[23] = 0, or you have scaling & matching on. Turn off matching & scaling (iparm[10] = iparm[12] = 0) and try with iparm[23] = 1. The PARDISO output should become smaller, which would implicitly indicate that a different major factorization algorithm is being used; this can potentially fix the memory issue.

4) Have you considered switching to our Cluster Sparse Solver (distributed direct sparse solver)? You could use several compute nodes to fight against the RAM limitation.

Best,

Kirill

Thanks, again!

I have moved to an HPC platform with 1 TB of memory, and now there are no errors. The output is as follows:

#############################################################################

=== PARDISO: solving a complex symmetric system ===

1-based array indexing is turned ON

PARDISO double precision computation is turned ON

METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )

================

Times:

======

Time spent in calculations of symmetric matrix portrait (fulladj): 2.184849 s

Time spent in reordering of the initial matrix (reorder) : 50.963541 s

Time spent in symbolic factorization (symbfct) : 36.010459 s

Time spent in data preparations for factorization (parlist) : 0.278686 s

Time spent in allocation of internal data structures (malloc) : 0.782705 s

Time spent in additional calculations : 17.872615 s

Total time spent : 108.092855 s

Statistics:

===========

Parallel Direct Factorization is running on 64 OpenMP

< Linear system Ax = b >

number of equations: 8578350

number of non-zeros in A: 341060648

number of non-zeros in A (%): 0.000463

number of right-hand sides: 1

< Factors L and U >

number of columns for each panel: 192

number of independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

number of supernodes: 810389

size of largest supernode: 42072

number of non-zeros in L: 25587382264

number of non-zeros in U: 1

number of non-zeros in L+U: 25587382265

iparm14 0

iparm15 43927359

iparm16 36780394

iparm17 440462269

max ! 80707753

ooc_max_core_size got by Env=1000000000

ooc_max_swap_size got by Env=1000000000

ooc_keep_file got by Env=0

The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===

Percentage of computed non-zeros for LL^T factorization

1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 70 % 72 % 73 % 74 % 75 % 77 % 78 % 80 % 81 % 83 % 84 % 86 % 87 % 89 % 90 % 92 % 93 % 95 % 96 % 97 % 99 % 100 %

=== PARDISO: solving a complex symmetric system ===

Single-level factorization algorithm is turned ON

Summary: ( factorization phase )

================

Times:

======

Time spent in copying matrix to internal data structure (A to LU): 0.000000 s

Time spent in factorization step (numfct) : 896.804199 s

Time spent in allocation of internal data structures (malloc) : 0.000631 s

Time spent in additional calculations : 0.000006 s

Total time spent : 896.804836 s

Statistics:

===========

Parallel Direct Factorization is running on 64 OpenMP

< Linear system Ax = b >

number of equations: 8578350

number of non-zeros in A: 341060648

number of non-zeros in A (%): 0.000463

number of right-hand sides: 1

< Factors L and U >

number of columns for each panel: 192

number of independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

number of supernodes: 810389

size of largest supernode: 42072

number of non-zeros in L: 25587382264

number of non-zeros in U: 1

number of non-zeros in L+U: 25587382265

gflop for the numerical factorization: 1668942.981819

gflop/s for the numerical factorization: 1860.989259

=== PARDISO: solving a complex symmetric system ===

Summary: ( solution phase )

================

Times:

======

Time spent in direct solver at solve step (solve) : 53.879760 s

Time spent in additional calculations : 52.638003 s

Total time spent : 106.517763 s

#######################################################################

Unfortunately, a new problem appeared: the computed results are NaN. I tried many workarounds without success. I then reduced the system to 4,500,000 equations (the original has about 8,570,000), and the reduced example produced good results. How can I fix this while keeping my original system size? Does PARDISO have a limit on the number of equations it can solve?

#########################################################################

My current iparm settings are as follows:

iparm=0

iparm( 1)= 1!

iparm( 2)= 2 ! nested dissection (METIS) reordering is applied

iparm(3)=mkl_set_num_threads(64)

iparm( 4)= 0 ! no iterative-direct algorithm

iparm( 5)= 0 ! no user fill-in reducing permutation

iparm( 6)= 0 ! =0 solution on the first n components of x

iparm( 8)= 0 ! number of iterative refinement steps

iparm(10)= 8 ! perturb the pivot elements with 1e-8

iparm(11)= 0 ! 0 for symmetric indefinite matrices (mtype =-2, mtype =-4, or mtype =6)

iparm(13)= 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). try iparm(13) = 1 in case of inappropriate accuracy

iparm(14)= 0 ! output: number of perturbed pivots

iparm(18)= 0 ! output: number of nonzeros in the factor lu

iparm(19)= 0 ! output: mflops for lu factorization

iparm(20)= 0 ! output: numbers of cg iterations

iparm(60)= 1 !

################################################################

Looking forward to your responses!

Hi again,

There are a couple of comments regarding your iparm settings:

Do you initialize your iparm with all zeros before changing some specific ones? I hope you do.

iparm( 1)= 1!

iparm( 2)= 2 ! nested dissection (METIS) reordering is applied

iparm(3)=mkl_set_num_threads(64) ! must be 0, see the docs https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-interface/pardiso-iparm-parameter.html

iparm( 4)= 0 ! no iterative-direct algorithm

iparm( 5)= 0 ! no user fill-in reducing permutation

iparm( 6)= 0 ! =0 solution on the first n components of x

iparm( 8)= 0 ! number of iterative refinement steps

iparm(10)= 8 ! perturb the pivot elements with 1e-8

iparm(11)= 0 ! 0 for symmetric indefinite matrices (mtype =-2, mtype =-4, or mtype =6)

iparm(13)= 0 ! maximum weighted matching algorithm is switched off (default for symmetric); try iparm(13) = 1 in case of inappropriate accuracy, and please also try with 0 as one experiment

iparm(14)= 0 ! output: number of perturbed pivots

iparm(18)= 0 ! output: number of nonzeros in the factor lu

iparm(19)= 0 ! output: mflops for lu factorization

iparm(20)= 0 ! output: numbers of cg iterations

iparm(60)= 1 !

As another experiment, try setting iparm(24) = 1 (iparm[23] in the C documentation) together with iparm(13) = 0.
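Putting these corrections together, a cleaned-up initialization might look like the following sketch. The thread count goes through mkl_set_num_threads, not iparm(3), which must stay 0; the iparm(13)/iparm(24) values are the experiments suggested above, not guaranteed settings:

```fortran
call mkl_set_num_threads(64)  ! threads are set via the runtime, not iparm(3)

iparm = 0                     ! zero the whole array first
iparm(1)  = 1                 ! no solver defaults; take values from iparm
iparm(2)  = 2                 ! nested dissection (METIS) reordering
iparm(3)  = 0                 ! reserved, must be 0
iparm(8)  = 0                 ! no iterative refinement steps
iparm(10) = 8                 ! pivot perturbation threshold 1e-8
iparm(11) = 0                 ! scaling off (experiment suggested above)
iparm(13) = 0                 ! weighted matching off (experiment suggested above)
iparm(24) = 1                 ! two-level factorization algorithm (experiment)
iparm(60) = 1                 ! in-core, falling back to OOC if RAM is short
```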

Also, which version of MKL are you using? If it is an old one, please try the latest.

If bad behavior remains, please share with us your matrix data so that we can reproduce and analyze the issue on our side.

Best,

Kirill

Hi @wangsen,

Any updates on this?

Just a quick reminder to share your MKL version.

Let us know if you face any issues.

Thanks,

Rahul

Hi @wangsen,

Could you please let us know if your issue is resolved?

If not, could you please share your matrix data so that we can try it out at our end?

Thanks,

Rahul

Hi,

I have not heard back from you. So, I will go ahead and close this thread from my end. Feel free to post a new query if you require further assistance from Intel.

Thanks,

Rahul
