Re:mkl_pardiso error_num=8

wangsen · 02-01-2021

I am trying to solve a very large problem with mkl_pardiso. The machine has 512G of RAM and 700G of free disk space. I tried both the in-core and the out-of-core modes, but each fails with the same error:

*** Error in PARDISO ( insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 399802848 bytes failed
total memory wanted here: 431388504 kbyte

I have not been able to find the cause of this error or a way to fix it.

Kirill_V_Intel · 02-01-2021

Hi!

According to the part of the message which says "total memory wanted here: 431388504 kbyte", the amount of data required for the factorization at that point is roughly 411 GiB (taking 1 kbyte = 1024 bytes). While that is less than the 512G which you say you have, remember that memory is also required for storing other information, e.g. from the first phase.
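As a quick check of the unit conversion, here is a minimal Python sketch. It assumes that "kbyte" in the PARDISO message means 1024 bytes and that "512G" of RAM means 512 GiB; neither is confirmed in this thread, and the function name is made up for the example.

```python
# Convert the "total memory wanted here" figure from the error message to GiB.
# Assumption: 1 kbyte = 1024 bytes (not confirmed by the PARDISO output itself).
BYTES_PER_KBYTE = 1024
BYTES_PER_GIB = 1024 ** 3


def kbytes_to_gib(kbytes: int) -> float:
    """Return the size in GiB for a size given in kbytes."""
    return kbytes * BYTES_PER_KBYTE / BYTES_PER_GIB


wanted_kbytes = 431388504   # from the first error message
ram_gib = 512               # assumed: "512G" read as 512 GiB

wanted_gib = kbytes_to_gib(wanted_kbytes)
print(f"factorization data wanted: {wanted_gib:.1f} GiB")
print(f"left for everything else:  {ram_gib - wanted_gib:.1f} GiB")
```

The second figure is what remains for the phase 1 data, the matrix itself, the right-hand sides and the rest of the application.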

My suggestion is:
Split the call to pardiso (in case you call it with phase = 13) into three calls (phase = 11, phase = 22 and phase = 33). After the call with phase = 11, print out iparm[14] through iparm[17] (zero-based C indexing; see https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-interface/pardiso-iparm-parameter.html).
1) for In-Core version: I assume that you will see that max(iparm[14], iparm[15]+iparm[16]) (total peak consumption for In-Core mode) will be > 512 G.
2) for Out-of-Core version: as the description of iparm[59] and the KB article say, check that you have increased MKL_PARDISO_OOC_MAX_CORE_SIZE as much as possible. For OOC mode the peak memory consumption is different: it is approximately the sum of the permanent memory required for phase 1 and the minimal amount of RAM which is needed for OOC (OOC puts the majority of the data on disk, but not everything).
I assume that in your case the permanent memory from phase 1, iparm[15], is already a huge number, so there is not enough RAM left even for the OOC factorization with its reduced memory consumption.
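The idea of splitting one phase = 13 call into separate phases can be sketched as follows. This is only an illustration of the control flow in Python: `call_pardiso` is a hypothetical wrapper around your real pardiso call that takes the phase and returns the error code, and the stub with its return value -2 is a stand-in chosen for the example.

```python
from typing import Callable, Sequence, Tuple

# Phase values named in this thread: analysis, numerical factorization, solve.
PHASES = (11, 22, 33)


def run_phases_separately(
    call_pardiso: Callable[[int], int],
    phases: Sequence[int] = PHASES,
) -> Tuple[int, int]:
    """Call the solver once per phase and stop at the first failure.

    Returns (failed_phase, error), or (0, 0) if every phase succeeded.
    """
    for phase in phases:
        error = call_pardiso(phase)
        if error != 0:
            return phase, error
        # After phase 11 succeeds, this is the place to print the iparm
        # memory estimates before the factorization is attempted.
    return 0, 0


def fake_pardiso(phase: int) -> int:
    """Stub for the example: pretend the factorization phase fails."""
    return -2 if phase == 22 else 0


failed_phase, error = run_phases_separately(fake_pardiso)
print(f"failed in phase {failed_phase} with error {error}")
```

Running the phases one at a time makes it clear which phase fails and lets you inspect the memory estimates before the expensive step starts.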

Also, you can set msglvl = 1 and share with us the output from phase 1, as well as the output values of iparm[14] through iparm[17] after the first phase.

I hope this helps.

Best,
Kirill

wangsen · 02-01-2021

Thanks for the quick reply!

Following your advice, I printed iparm(14) through iparm(17). The results are as follows:

###########################################################

=== PARDISO: solving a complex symmetric system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 4.072709 s
Time spent in reordering of the initial matrix (reorder) : 80.376569 s
Time spent in symbolic factorization (symbfct) : 48.914574 s
Time spent in data preparations for factorization (parlist) : 0.925490 s
Time spent in allocation of internal data structures (malloc) : 2.337880 s
Time spent in additional calculations : 35.297460 s
Total time spent : 171.924682 s

Statistics:
===========
Parallel Direct Factorization is running on 52 OpenMP

< Linear system Ax = b >
number of equations: 8578350
number of non-zeros in A: 341060574
number of non-zeros in A (%): 0.000463

number of right-hand sides: 750

< Factors L and U >
number of columns for each panel: 96
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 833888
size of largest supernode: 46866
number of non-zeros in L: 26367541862
number of non-zeros in U: 1
number of non-zeros in L+U: 26367541863
iparm14 0
iparm15 26143180
iparm16 22409342
iparm17 433185621
max ! 48552522
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
*** Error in PARDISO ( insufficient_memory) error_num= 8
*** Error in PARDISO memory allocation: FACTORIZE_SOLVING_LU_DATA, allocation of 417321914 bytes failed
total memory wanted here: 439731255 kbyte

############################################

According to the above result, I think the memory required is iparm(16)+iparm(17) (1-based Fortran indexing), which is below 512G. Why does this error occur? How can I resolve it other than by adding RAM or reducing the number of equations?

Kirill_V_Intel · 02-01-2021

Hi again,

Thanks for sharing the numbers! It does indeed seem to me that the peak total memory should be no more than approx. (22409342 + 433185621) KB ~ 456 GB, so it should fit into 512 GB of RAM.
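The estimate can be reproduced with a few lines of Python. This sketch applies the peak formula quoted earlier in the thread, max(iparm[14], iparm[15] + iparm[16]) in zero-based indexing, to the values as printed with Fortran 1-based labels (iparm15, iparm16, iparm17). It assumes the values are in units of 1024 bytes; the function name is made up for the example.

```python
# Peak in-core memory estimate from the values printed after phase 11.
# Names follow the 1-based Fortran labels used in the log output.
KB_PER_GIB = 1024 * 1024   # assumption: the reported unit is 1024 bytes


def peak_memory_kb(iparm15: int, iparm16: int, iparm17: int) -> int:
    """max(analysis peak, permanent memory + factorization/solve memory)."""
    return max(iparm15, iparm16 + iparm17)


# Values reported on the 512G machine.
peak_kb = peak_memory_kb(26143180, 22409342, 433185621)
print(f"estimated peak: {peak_kb} KB = {peak_kb / KB_PER_GIB:.1f} GiB")
```

For these inputs the estimate is 455594963 KB, which is about 434.5 GiB under the 1024-byte assumption, or about 456 GB if the unit is read as 1000 bytes.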

I see several possibilities:

1) The memory estimate provided by PARDISO is not accurate and the actual memory consumption is more than reported.
2) There is something else which eats the memory. Can there be any other data in the application that consumes a significant amount of RAM?

To check 1) on our side, we'd need your data so that we can set up our own experiment and see what happens inside the solver. If possible, please share your data.

I have several suggestions:
1) How many right-hand sides (rhs) do you have? If you have several, loop over them, calling phase = 33 with a single rhs each time.
2) Decrease the number of threads (say, use half as many).
The rationale is that there is a certain amount of memory used in PARDISO which is proportional to #rhs, #threads and even #rhs * #threads, if I am not mistaken. Check if doing 1) or 2) reduces the numbers reported and fixes the failure on allocation.
3) I think you're using iparm[23]=0 or you have scaling & matching on. Turn off matching & scaling (iparm[10] = iparm[12] = 0) and try with iparm[23] = 1. The PARDISO output should become smaller, which would implicitly indicate that a different major factorization algorithm is in use; this can potentially fix the memory issue.
4) Have you considered switching to our Cluster Sparse Solver (distributed direct sparse solver)? You could use several compute nodes to fight against the RAM limitation.
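Suggestion 1) amounts to walking over the right-hand sides in small batches. Below is a minimal Python sketch of that loop. `solve_batch` is a hypothetical callback standing in for your own phase = 33 call on columns `start .. start + count - 1` of the rhs and solution arrays; it is not an MKL function.

```python
from typing import Callable, Iterator, Tuple


def rhs_batches(nrhs: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs covering rhs columns 0 .. nrhs-1."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, nrhs, batch_size):
        yield start, min(batch_size, nrhs - start)


def solve_in_batches(
    nrhs: int,
    batch_size: int,
    solve_batch: Callable[[int, int], None],
) -> int:
    """Call solve_batch once per batch; return the number of solve calls."""
    calls = 0
    for start, count in rhs_batches(nrhs, batch_size):
        solve_batch(start, count)   # hypothetical: one phase = 33 call
        calls += 1
    return calls


# 750 right-hand sides, as in the first log, solved one at a time.
n_calls = solve_in_batches(750, 1, lambda start, count: None)
print(f"{n_calls} solve calls")
```

With batch_size = 1 this is exactly the single-rhs loop; a larger batch trades memory for fewer solve calls, so it is worth trying a few sizes.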

Best,
Kirill

wangsen · 02-03-2021

Thanks, again!

I have switched to an HPC platform with 1T of memory. The errors are gone. The output is as follows:

#############################################################################

=== PARDISO: solving a complex symmetric system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 2.184849 s
Time spent in reordering of the initial matrix (reorder) : 50.963541 s
Time spent in symbolic factorization (symbfct) : 36.010459 s
Time spent in data preparations for factorization (parlist) : 0.278686 s
Time spent in allocation of internal data structures (malloc) : 0.782705 s
Time spent in additional calculations : 17.872615 s
Total time spent : 108.092855 s

Statistics:
===========
Parallel Direct Factorization is running on 64 OpenMP

< Linear system Ax = b >
number of equations: 8578350
number of non-zeros in A: 341060648
number of non-zeros in A (%): 0.000463

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 810389
size of largest supernode: 42072
number of non-zeros in L: 25587382264
number of non-zeros in U: 1
number of non-zeros in L+U: 25587382265
iparm14 0
iparm15 43927359
iparm16 36780394
iparm17 440462269
max ! 80707753
ooc_max_core_size got by Env=1000000000
ooc_max_swap_size got by Env=1000000000
ooc_keep_file got by Env=0
The file ./pardiso_ooc.cfg was not opened
=== PARDISO is running in In-Core mode, because iparam(60)=1 and there is enough RAM for In-Core ===

Percentage of computed non-zeros for LL^T factorization
1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 14 % 15 % 16 % 17 % 18 % 19 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 32 % 33 % 34 % 35 % 36 % 37 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 50 % 51 % 52 % 53 % 54 % 55 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 68 % 70 % 72 % 73 % 74 % 75 % 77 % 78 % 80 % 81 % 83 % 84 % 86 % 87 % 89 % 90 % 92 % 93 % 95 % 96 % 97 % 99 % 100 %

=== PARDISO: solving a complex symmetric system ===
Single-level factorization algorithm is turned ON

Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct) : 896.804199 s
Time spent in allocation of internal data structures (malloc) : 0.000631 s
Time spent in additional calculations : 0.000006 s
Total time spent : 896.804836 s

Statistics:
===========
Parallel Direct Factorization is running on 64 OpenMP

< Linear system Ax = b >
number of equations: 8578350
number of non-zeros in A: 341060648
number of non-zeros in A (%): 0.000463

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 192
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 810389
size of largest supernode: 42072
number of non-zeros in L: 25587382264
number of non-zeros in U: 1
number of non-zeros in L+U: 25587382265
gflop for the numerical factorization: 1668942.981819

gflop/s for the numerical factorization: 1860.989259

=== PARDISO: solving a complex symmetric system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve) : 53.879760 s
Time spent in additional calculations : 52.638003 s
Total time spent : 106.517763 s

#######################################################################

Unfortunately, a new problem appeared: the results obtained are NaN. I tried many fixes without success. I then reduced my example to 4500000 equations (originally 8570000), and the reduced example gave good results. How can I solve this problem while keeping my original system? Does PARDISO have a limit on the number of equations it can solve?
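To narrow down where the NaN values start, a scan of the solution vector like the one below can help. This is a minimal Python sketch for a complex solution stored as a plain sequence; the function names are made up for the example, and for a system of this size the same check would normally be written in the host language.

```python
import cmath
from typing import Optional, Sequence


def first_nonfinite(x: Sequence[complex]) -> Optional[int]:
    """Return the index of the first NaN or Inf entry, or None if all are finite."""
    for i, value in enumerate(x):
        if not cmath.isfinite(value):
            return i
    return None


def count_nonfinite(x: Sequence[complex]) -> int:
    """Count entries whose real or imaginary part is NaN or Inf."""
    return sum(1 for value in x if not cmath.isfinite(value))


# Small made-up vector: entry 2 has a NaN real part.
solution = [1 + 2j, 0.5 - 1j, complex(float("nan"), 0.0), 3j]
print(first_nonfinite(solution), count_nonfinite(solution))
```

Whether only some entries or all entries are non-finite is useful information to include when reporting the problem.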

#########################################################################

My current iparm settings are as follows:
iparm=0
iparm( 1)= 1!
iparm( 2)= 2 ! the minimum degree algorithm is applied
iparm(3)=mkl_set_num_threads(64)
iparm( 4)= 0 ! no iterative-direct algorithm
iparm( 5)= 0 ! no user fill-in reducing permutation
iparm( 6)= 0 ! =0 solution on the first n compoments of x
iparm( 8)= 0 ! numbers of iterative refinement steps
iparm(10)= 8 ! perturbe the pivot elements with 1e-8
iparm(11)= 0 ! 0 for symmetric indefinite matrices (mtype =-2, mtype =-4, or mtype =6)
iparm(13)= 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). try iparm(13) = 1 in case of inappropriate accuracy
iparm(14)= 0 ! output: number of perturbed pivots
iparm(18)= 0 ! output: number of nonzeros in the factor lu
iparm(19)= 0 ! output: mflops for lu factorization
iparm(20)= 0 ! output: numbers of cg iterations
iparm(60)= 1 !
################################################################

Looking forward to your responses!

Kirill_V_Intel · 02-05-2021

Hi again,

There are a couple of comments regarding your iparm settings:

Do you initialize your iparm with all zeros before changing some specific ones? I hope you do.

iparm( 1)= 1!
iparm( 2)= 2 ! the minimum degree algorithm is applied
iparm(3)=mkl_set_num_threads(64) ! must be 0, see the docs https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-interface/pardiso-iparm-parameter.html
iparm( 4)= 0 ! no iterative-direct algorithm
iparm( 5)= 0 ! no user fill-in reducing permutation
iparm( 6)= 0 ! =0 solution on the first n compoments of x
iparm( 8)= 0 ! numbers of iterative refinement steps
iparm(10)= 8 ! perturbe the pivot elements with 1e-8
iparm(11)= 0 ! 0 for symmetric indefinite matrices (mtype =-2, mtype =-4, or mtype =6)
iparm(13)= 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). try iparm(13) = 1 in case of inappropriate accuracy, ! please try with 0 as one experiment
iparm(14)= 0 ! output: number of perturbed pivots
iparm(18)= 0 ! output: number of nonzeros in the factor lu
iparm(19)= 0 ! output: mflops for lu factorization
iparm(20)= 0 ! output: numbers of cg iterations
iparm(60)= 1 !
iparm(23) = 1 as another experiment, try setting iparm(24)=1 together with iparm(13)=0
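Note that this thread mixes two index conventions: the earlier replies use zero-based C indices such as iparm[23], while the listing above uses one-based Fortran indices such as iparm(24). A minimal Python sketch of the mapping, using only the parameters mentioned in the thread (the short descriptions are paraphrases, and the helper names are made up for the example):

```python
def c_to_fortran(index: int) -> int:
    """Zero-based C index iparm[i] -> one-based Fortran index iparm(i+1)."""
    if index < 0:
        raise ValueError("C index must be non-negative")
    return index + 1


def fortran_to_c(index: int) -> int:
    """One-based Fortran index iparm(i) -> zero-based C index iparm[i-1]."""
    if index < 1:
        raise ValueError("Fortran index must be at least 1")
    return index - 1


# Parameters referred to in this thread, keyed by zero-based C index.
mentioned = {
    10: "scaling",
    12: "matching",
    23: "factorization algorithm choice",
    59: "in-core / out-of-core mode",
}

for c_index, meaning in mentioned.items():
    print(f"iparm[{c_index}] (C) = iparm({c_to_fortran(c_index)}) (Fortran): {meaning}")
```

So the earlier advice iparm[10] = iparm[12] = 0 with iparm[23] = 1 corresponds to iparm(11) = iparm(13) = 0 with iparm(24) = 1 in the Fortran listing.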

Also, which version of MKL are you using? If it is an old one, please try the latest.

If bad behavior remains, please share with us your matrix data so that we can reproduce and analyze the issue on our side.

Best,
Kirill

RahulV_intel · 02-11-2021

Hi @wangsen,

Any updates on this?

Just a quick reminder to share your MKL version.

Let us know if you face any issues.

Thanks,

Rahul

RahulV_intel · 02-18-2021

Hi @wangsen,

Could you please let us know if your issue is resolved?

If not, could you please share your matrix data so that we can try it out at our end?

Thanks,

Rahul

RahulV_intel · 02-25-2021

Hi,

I have not heard back from you. So, I will go ahead and close this thread from my end. Feel free to post a new query if you require further assistance from Intel.

Thanks,

Rahul