Intel® oneAPI Math Kernel Library

Error in PARDISO ( numerical_factorization) error_num= -987

Andreas_Fabri__Geome
Hello,

I am trying to solve a sparse system with PARDISO, using the evaluation version of the MKL beta
on 64-bit Windows 7.

Because out-of-core mode must be enabled when necessary, I initialize the parameters as follows:

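m_piparm[0] = 1; // iparm(1): do not use the solver defaults
m_piparm[1] = 2; // iparm(2): nested dissection ordering (METIS)
m_piparm[9] = 0; // iparm(10): pivoting perturbation
m_piparm[17] = -1; // iparm(18): report the number of nonzeros in the factors
m_piparm[20] = 1; // iparm(21): 1x1 and 2x2 Bunch-Kaufman pivoting
m_piparm[26] = 1; // iparm(27): matrix checker on
m_piparm[59] = 1; // iparm(60): out of core if necessary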


Here is the trace of the pardiso run. Any help is appreciated, and if necessary
I could dump the sparse symmetric matrix in a file and make it available.

Best regards,

Andreas Fabri


=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===


================ PARDISO: solving a symm. posit. def. system ================


Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 1.618750 s
Time reorder: 48.901887 s
Time symbfct: 6.202610 s
Time malloc : 1.084790 s
Time total : 85.589953 s total - sum: 27.781916 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 2797565
#non-zeros in A: 23286826
non-zeros in A (%): 0.000298

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 1300322
size of largest supernode: 3421
number of nonzeros in L 604905508
number of nonzeros in U 1
number of nonzeros in L+U 604905509
Percentage of computed non-zeros for LL^T factorization
0 %
1 %
.
.
44 %
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
PARDISO Internationalization error! Message -987 is unknown

================ PARDISO: solving a symm. posit. def. system ================


Summary PARDISO: ( factorize to factorize )
================

Times:
======
Time A to LU: 0.000000 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time numfct : 0.000000 s
Time malloc : 0.053992 s
Time total : 105.836084 s total - sum: 105.782091 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 2797565
#non-zeros in A: 23286826
non-zeros in A (%): 0.000298

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 1300322
size of largest supernode: 3421
number of nonzeros in L 604905508
number of nonzeros in U 1
number of nonzeros in L+U 604905509
gflop for the numerical factorization: 886.436031


The error code is : -4
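
For reference, the -4 returned above is one of PARDISO's documented error codes, while -987 does not appear in that list. The following is a minimal sketch of a lookup helper. The function name `pardiso_error_text` is hypothetical, and the strings paraphrase the MKL PARDISO documentation from memory, so they should be checked against the manual for the MKL version in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a PARDISO `error` return value to a short
 * description. The strings paraphrase the MKL documentation and are an
 * assumption of this sketch; verify them for your MKL version. */
static const char *pardiso_error_text(int error)
{
    static const char *const text[] = {
        "no error",                                         /*   0 */
        "input inconsistent",                               /*  -1 */
        "not enough memory",                                /*  -2 */
        "reordering problem",                               /*  -3 */
        "zero pivot, numerical factorization or iterative refinement problem", /* -4 */
        "unclassified (internal) error",                    /*  -5 */
        "reordering failed (matrix types 11 and 13 only)",  /*  -6 */
        "diagonal matrix is singular",                      /*  -7 */
        "32-bit integer overflow problem",                  /*  -8 */
        "not enough memory for OOC",                        /*  -9 */
        "error opening OOC files",                          /* -10 */
        "read/write error with OOC files"                   /* -11 */
    };
    const size_t count = sizeof text / sizeof text[0];

    assert(count == 12);  /* the table covers codes 0 through -11 */
    if (error > 0 || error < -11) {
        return "unknown error code";
    }
    return text[-error];
}
```

Calling `pardiso_error_text(-4)` returns the zero-pivot message, and any value outside the table, such as -987, falls through to "unknown error code".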

mecej4
Honored Contributor III
This is just a guess, as I have no experience with huge matrices:

An out-of-core solver needs to write and read large temporary files, so the 'fseek error' suggests that you look at the possibility that the program ran out of disk space while processing the temporary files.
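
The log reports only `Fseek failed`, without a reason. As a generic illustration (this is not PARDISO's internal code, which is not shown anywhere in this thread), a program that works with large temporary files can check every seek and print the operating system's error text. The helper name `checked_seek` is hypothetical. One platform fact is relevant on 64-bit Windows: `long` is 32 bits there, so plain `fseek` cannot address offsets of 2 GiB or more, and `_fseeki64` is needed for larger files.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* expose fseeko() and off_t */
#define _FILE_OFFSET_BITS 64     /* make off_t 64 bits wide */
#include <sys/types.h>
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: seek to an absolute byte offset using a 64-bit
 * offset type and report the reason if the seek fails.
 * Returns 0 on success, -1 on failure. */
static int checked_seek(FILE *fp, long long offset)
{
    int rc;

    assert(fp != NULL);
#ifdef _WIN32
    rc = _fseeki64(fp, offset, SEEK_SET);     /* 64-bit offset on Windows */
#else
    rc = fseeko(fp, (off_t)offset, SEEK_SET); /* 64-bit offset on POSIX */
#endif
    if (rc != 0) {
        fprintf(stderr, "seek to offset %lld failed: %s\n",
                offset, strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller would replace each bare `fseek(fp, pos, SEEK_SET)` with `checked_seek(fp, pos)` and stop at the first nonzero return, so the log shows why the seek failed rather than only that it failed.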

Alexander_K_Intel2
Hi,

This problem can occur when a zero or negative diagonal element appears during the LL^T decomposition. Try changing mtype = 2 to mtype = -2; that may resolve the problem.
With best regards,
Alexander Kalinkin

Andreas_Fabri__Geome
Switching to mtype=-2 did not help. Here is the output.
As you are from Intel, the error_num -987 should help
you to help me, shouldn't it?


best regards,

andreas


The file .\pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=1 and there is not enough RAM for In-Core ===


================ PARDISO: solving a symmetric indef. system ================


Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 1.662469 s
Time reorder: 49.211687 s
Time symbfct: 6.262312 s
Time malloc : 1.055497 s
Time total : 86.830331 s total - sum: 28.638366 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b>
#equations: 2796570
#non-zeros in A: 23279108
non-zeros in A (%): 0.000298

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 1300079
size of largest supernode: 3576
number of nonzeros in L 588215272
number of nonzeros in U 1
number of nonzeros in L+U 588215273
Percentage of computed non-zeros for LL^T factorization

Alexander_K_Intel2
Hi Andreas

Error -987 is an internal error that should not appear in normal operation. Could you check your matrix by setting iparm(27) = 1 in Fortran (iparm[26] in C), and also check the free space on your hard disk (you need around 8 GB free)? If everything is correct, could you send us a test case (an example with the matrix that crashed) so we can investigate the problem?
With best regards,
Alexander Kalinkin

Gennady_F_Intel
Moderator
Andreas, how much free space is available on your system?
nnz ≈ 588215272 will require about 5 GB of available space.
--Gennady
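
The estimate of about 5 GB follows from storing each nonzero of the factor as an 8-byte double-precision value. Below is a minimal sketch of that arithmetic. The helper names are hypothetical, and index arrays and other bookkeeping are ignored, so the result is a lower bound rather than the exact size of the files PARDISO writes.

```c
#include <assert.h>

/* Bytes needed to store `nnz` factor entries at `bytes_per_value` bytes
 * each (8 for double-precision real values). Index arrays and other
 * overhead are NOT included, so treat the result as a lower bound. */
static unsigned long long factor_bytes(unsigned long long nnz,
                                       unsigned bytes_per_value)
{
    assert(bytes_per_value > 0);
    return nnz * bytes_per_value;
}

/* The same figure in decimal gigabytes (1 GB = 1e9 bytes). */
static double factor_gigabytes(unsigned long long nnz,
                               unsigned bytes_per_value)
{
    return (double)factor_bytes(nnz, bytes_per_value) / 1e9;
}
```

For the factor size reported in the log, `factor_bytes(588215272ULL, 8)` evaluates to 4705722176 bytes, which is about 4.7 GB.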

Andreas_Fabri__Geome
Hello,

I have 83 GB available, so disk space should not be the problem.

I had also already set iparm[26]. For completeness, here are the other parameters I have set.
Could you verify that they are correct? I find it rather error-prone that, when I only want
to change one parameter (such as out-of-core), I must figure out the default for all the others.

m_piparm[0] = 1; // No solver default
m_piparm[1] = 2;
m_piparm[9] = 8; // iparm(10)- pivoting perturbation.
m_piparm[17] = -1;
m_piparm[20] = 1;
m_piparm[26] = 1;
m_piparm[59] = 1; // out of core if necessary
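
One way to make such a parameter list less error-prone is to keep it in a single helper that first clears the whole array and then sets only the entries of interest. The sketch below assumes a 64-entry `iparm` of plain `int` (the LP64 interface) and zero for every entry not listed; both are assumptions of this example, not statements about MKL's defaults. The helper name `set_ooc_iparm` is hypothetical. MKL also documents a `pardisoinit` routine that fills `iparm` with defaults for a given `mtype`; whether it is available in the version discussed here is not stated in this thread.

```c
#include <assert.h>
#include <string.h>

#define PARDISO_IPARM_LEN 64  /* iparm has 64 entries */

/* Hypothetical helper: fill `iparm` with the settings listed in this post.
 * Indices are zero-based C indices, so iparm[k] corresponds to iparm(k+1)
 * in the Fortran-style documentation. Entries not listed are set to 0,
 * which is an assumption of this sketch. */
static void set_ooc_iparm(int iparm[PARDISO_IPARM_LEN])
{
    assert(iparm != NULL);
    memset(iparm, 0, PARDISO_IPARM_LEN * sizeof iparm[0]);

    iparm[0]  = 1;   /* do not use the solver defaults */
    iparm[1]  = 2;   /* nested dissection ordering (METIS) */
    iparm[9]  = 8;   /* pivoting perturbation exponent */
    iparm[17] = -1;  /* report the number of nonzeros in the factors */
    iparm[20] = 1;   /* 1x1 and 2x2 Bunch-Kaufman pivoting */
    iparm[26] = 1;   /* enable the matrix checker */
    iparm[59] = 1;   /* in-core if it fits, otherwise out-of-core */
}
```

A caller declares `int iparm[PARDISO_IPARM_LEN];`, calls `set_ooc_iparm(iparm);`, and then overrides individual entries, for example `iparm[59] = 2;` to force out-of-core mode.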


Do you have any standard file format that I should use for storing the system?

Best regards,

andreas

Gennady_F_Intel
Moderator
Andreas,
Which MKL beta version are you evaluating?
Could you check how it works in pure OOC mode (iparm[59] = 2) instead of the hybrid mode you are using?
--Gennady

Andreas_Fabri__Geome

I downloaded w_mkl_10.3.0.055.exe

Concerning the temporary file, in which directory does it go?
I ask because I am wondering what happens when the virus scanner
(Norton) tries to check it.

andreas

Gennady_F_Intel
Moderator
By default, OOC PARDISO stores its data in the current directory.
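
The location can be changed. A log later in this thread shows the settings `ooc_path`, `ooc_max_core_size` and `ooc_keep_file` being read from the environment, and another log mentions a `pardiso_ooc.cfg` file. The sketch below sets the corresponding environment variables from inside the program before the first PARDISO call. The variable names are the ones MKL's documentation uses for OOC mode as far as this sketch assumes, so verify them for your version; the helper names are hypothetical. Setting the variables in the shell before launching the program is the simpler route, and whether values set in-process are seen by a dynamically linked MKL is not something this sketch guarantees.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* expose setenv() */
#endif

#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: set one environment variable for the current
 * process. Returns 0 on success. */
static int set_env(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
#ifdef _WIN32
    return _putenv_s(name, value);
#else
    return setenv(name, value, 1);  /* 1 = overwrite an existing value */
#endif
}

/* Hypothetical helper: configure OOC PARDISO through the environment.
 * All three values are passed through unchanged as strings; their exact
 * meaning and units are defined by the MKL documentation, not here. */
static int configure_ooc(const char *path_prefix,
                         const char *max_core_size,
                         const char *keep_file)
{
    if (set_env("MKL_PARDISO_OOC_PATH", path_prefix) != 0) return -1;
    if (set_env("MKL_PARDISO_OOC_MAX_CORE_SIZE", max_core_size) != 0) return -1;
    if (set_env("MKL_PARDISO_OOC_KEEP_FILE", keep_file) != 0) return -1;
    return 0;
}
```

For example, `configure_ooc("ooc_file", "3000", "1")` mirrors the values that appear in the log posted further down, with a placeholder path.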

Gennady_F_Intel
Moderator
Thierry,
do you see the same problem with OOC or hybrid mode?
--Gennady

Thierry_LE_SOMMER__E
I tested both modes (iparm[59]=1 and iparm[59]=2) without success.

I am using MKL 10.2.5.035 with Visual Studio 2008 on Windows 7 x64

Thierry

Gennady_F_Intel
Moderator
And did you get the same error, -987?

Thierry_LE_SOMMER__E
Here is the log:

ooc_path got by Env = C:\Dev\OptimTopo\Code\ooc_file
ooc_max_core_size got by Env = 3000
ooc_keep_file got by Env = 1

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
...
40 %
41 %
42 %
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
*** Error in PARDISO: zero pivot

================ PARDISO: solving a real struct. sym. system ================


Summary PARDISO: ( reorder to factorize )
================

Times:
======
Time fulladj: 0.134167 s
Time reorder: 4.507111 s
Time symbfct: 2.230421 s
Time parlist: 2.000479 s
Time A to LU: 0.000000 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time numfct : 0.000000 s
Time malloc : 10.436600 s
Time total : 294.919680 s total - sum: 275.610902 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 4
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 408483
#non-zeros in A: 31756329
non-zeros in A (%): 0.019032

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 39349
size of largest supernode: 9840
number of nonzeros in L 626636223
number of nonzeros in U 605550762
number of nonzeros in L+U 1232186985
gflop for the numerical factorization: 5644.826505


ERROR during symbolic and numerical factorization: -4*** Error in PARDISO (read/write OOC data file) error_num= 0


Thierry_LE_SOMMER__E
I tried MKL 10.3 Beta and I had the same error.

With iparm[27]=0, I got:

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
...
40 %
41 %
42 %
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
PARDISO Internationalization error! Message -987 is unknown



With iparm[27]=1, I got:

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
...
83 %
84 %
85 %
Fseek failed
Fseek failed
Fseek failed
*** Error in PARDISO ( numerical_factorization) error_num= -987
PARDISO Internationalization error! Message -987 is unknown


Perhaps this can help you.

Gennady_F_Intel
Moderator
Hello guys, because this problem is completely unknown to us and our internal tests do not reproduce it,
I can only ask you to send us a test case.
At the very least, that will significantly speed up the investigation of this error.
--Gennady

Thierry_LE_SOMMER__E
You can download my matrix (ia, ja and a arrays) here: http://lesommer.free.fr/matrix_ed_lesommer.zip
I know that my matrix has zero elements.

Gennady_F_Intel
Moderator
Thanks, we will check and let you know of any updates.

Sergey_Solovev__Inte
New Contributor I

Hello,

We downloaded the matrix and successfully factorized it with MKL 10.2.5 (see log below).

Maybe the problem is free space on the hard disk. The number of nonzeros in the LU factors is 1 232 186 985. To store them on disk, MKL OOC PARDISO requires about 12 GB of free space (the factor values alone take 1 232 186 985 × 8 bytes ≈ 9.9 GB).

How much free space is there on the hard disk? Also, please print out iparm[63]. It is an internal parameter that can help us identify the version of MKL PARDISO.

************************************ ooc_max_core_size got by Env = 3000

The file .\pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

Percentage of computed non-zeros for LL^T factorization
0 %
1 %
2 %
3 %
...
98 %
99 %
100 %

================ PARDISO: solving a real struct. sym. system ================

Summary PARDISO: ( reorder to factorize )
================

Times:
======
Time fulladj: 0.115263 s
Time reorder: 3.636191 s
Time symbfct: 3.471022 s
Time parlist: 0.321256 s
Time A to LU: 0.000000 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time numfct : 428.636476 s
Time malloc : 0.586887 s
Time total : 440.670663 s total - sum: 3.903568 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 4
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 408483
#non-zeros in A: 31756329
non-zeros in A (%): 0.019032

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 96
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 39349
size of largest supernode: 9840
number of nonzeros in L 626636223
number of nonzeros in U 605550762
number of nonzeros in L+U 1232186985
gflop for the numerical factorization: 5644.826505
gflop/s for the numerical factorization: 13.169263

Thierry_LE_SOMMER__E
Hello,

Free space on the hard disk is not the problem. I have 100 GB free.

I think I found the problem. It depends on which threading library is linked:
With mkl_intel_thread.lib => OK
With mkl_intel_thread_dll.lib => Error -987

Now it works for me with versions 10.2.5, 10.2.6 and 10.3.0 beta.

Thierry

Gennady_F_Intel
Moderator
Thierry,
Could you please clarify how you linked the application when error_num = -987 was encountered?
Then we will try to reproduce the problem on our side as well.
--Gennady