Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
Announcements
FPGA community forums and blogs have moved to the Altera Community. Existing Intel Community members can sign in with their current credentials.
7236 Discussions

Pardiso Bunch-Kaufman pivoting unstable for a saddle-point problem, CGS alternative is too slow

Oliari
Beginner
8,194 Views

Dear fellows,

 

I'm optimizing the CPU-time of a mixed finite element simulation for the Darcy's equation of a particular finite-element library coded in C++. The method results in a saddle-point problem where the solution x is found by solving a linear system Ax = b.  A is a sparse symmetric indefinite matrix, and b is the source term. The library' default  approach uses PARDISO CGS implementation, which delivers a consistent solution most of the time (in fact after thousands of simulations, I've encountered only 1 inconsistent result).  The issue with the current implementation arises after comparing the CPU-time of the statically condensed system A'x=b with the original Ax=b. In fact, the condensed system is faster for small problems, but for larger problems, the time is roughly the same, even when the number of degrees of freedom of A' is one order of magnitude lower than A. Further investigation using PARDISO messaging options, showed that although the dimensions of A and A' are a magnitude apart, the number of non-zeros in the factor L is close. It is important to remark that CGS A is previously ordered using SLOAN [SLO86] algorithm to minimize matrix' profile and wavefront and the program is supplied with a permutation vector equivalent to the identity matrix, otherwise the method either breaks or does not converge.

 

The PARDISO's suggested approach for saddle point problems works for small/middle-sized problems but is unstable for larger ones and, therefore, is unsuited for the general-purpose library I'm working with. For large problems the factors have so many perturbed pivots that each iteration refinement step increases the numerical error. Nonetheless, in one case, the simulation time dropped from 230 s to 8 s, the difference is big enough that optimizing the solver seems extremely important.  This approach uses Bunch-Kaufman 1x1 and 2x2 pivoting and either the Minimum Degree or Multiscale Nested Dissection algorithm for reducing fill-ins, scaling and matching are enabled, a range of  iteration refinement number of steps are tested, matrix A/A' is previously ordered using SLOAN algorithm, but this time the program uses its own permutation procedure.

 

Iparm values and Pardiso messages

 

The implementation of the library for symmetric indefinite systems calls pardiso_64 twice, for decomposition (phase 12) and for forward substitution (phase 33).  The non-default iParm values for and the user supplied permutation vector for phase 12 in the default implementation are:

 

fParam[34] = 1;
fParam[59] = 0;
fParam[4] = 1;
fParam[3 ] = 10*6+2;

\\permutation vector:
fPermutation.resize(n);
    for (long long i=0; i<n; i++) {
        fPermutation[i] = i;
    }

 

This implementation has little fill reduction control since fParam[4] restrict Pardiso from ordering the matrix, but when fParam[4] = 0 and fParam[10]=fParam[12] = \alpha, \alpha being either 0 or 1, the following error appears: Pardiso:: Zero pivot, numerical fact. or iterative refinement problem. The same error occurs when A is not previously reordered with the sloan algorithm. Phase 33 is pretty standard and I report its configuration isn't required.

To illustrate the problem, the output of phase 12 is reported next, alongside with the value of certain iparm outputs. 

 

// Phase 12 Pardiso messages
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  54 %  55 %  56 %  57 %  58 %  59 %  60 %  61 %  62 %  63 %  64 %  65 %  66 %  67 %  68 %  69 %  70 %  71 %  72 %  73 %  74 %  75 %  76 %  77 %  78 %  79 %  80 %  81 %  82 %  83 %  84 %  85 %  86 %  87 %  88 %  89 %  90 %  91 %  92 %  93 %  94 %  95 %  96 %  97 %  98 %  99 %  100 % 

=== PARDISO: solving a symmetric indefinite system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
User provided fill-in reducing permutation is turned ON
Single-level factorization algorithm is turned ON


Summary: ( starting phase is reordering, ending phase is factorization )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.007723 s
Time spent in reordering of the initial matrix (reorder)         : 0.000130 s
Time spent in symbolic factorization (symbfct)                   : 0.165558 s
Time spent in data preparations for factorization (parlist)      : 0.022134 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 2.676305 s
Time spent in allocation of internal data structures (malloc)    : 0.041895 s
Time spent in additional calculations                            : 0.040864 s
Total time spent                                                 : 2.954609 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b >
             number of equations:           82432
             number of non-zeros in A:      639744
             number of non-zeros in A (%): 0.009415

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with input permutation >
             number of supernodes:                    16519
             size of largest supernode:               11
             number of non-zeros in L:                35771839
             number of non-zeros in U:                1
             number of non-zeros in L+U:              35771840
             gflop   for the numerical factorization: 17.535356

             gflop/s for the numerical factorization: 6.552077

// Printing some iparm outputs: 
Number of perturbed pivots(iParm[13]) = 0
Number of positive eingenvalues(iParm[21]) = 16384
Number of negative eingenvalues(iParm[22]) = 66048

// Phase 33 Pardiso messages
=== PARDISO: solving a symmetric indefinite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in iterative solver at solve step (cgs)               : 0.217816 s cgx iterations 2

Time spent in allocation of internal data structures (malloc)    : 0.000049 s
Time spent in additional calculations                            : 0.000000 s
Total time spent                                                 : 0.217865 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b >
             number of equations:           82432
             number of non-zeros in A:      639744
             number of non-zeros in A (%): 0.009415

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with input permutation >
             number of supernodes:                    16519
             size of largest supernode:               11
             number of non-zeros in L:                35771839
             number of non-zeros in U:                1
             number of non-zeros in L+U:              35771840
             gflop   for the numerical factorization: 17.535356

             gflop/s for the numerical factorization: 6.552077

// User print
nDofs = 82432
Computing Error 
Errors =(2.57042e-05, 0.0325949, 0.0325949, 0.0325949, 

 

Remark the number of perturbed pivots, the number of non-zeros in L.

Next, the parameters using the default configuration for saddle-point problems:

 

 fParam[34] = 1;
 fParam[59] = 0;
 fParam[10] = 1;
 fParam[12] = 1;

 

A combination of parameters iParm[20], iParm[23] and iParam[7] were tested as well. The output for this configuration is:

 

// Phase 12
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

Percentage of computed non-zeros for LL^T factorization
 1 %  2 %  3 %  4 %  5 %  6 %  7 %  8 %  9 %  10 %  11 %  12 %  13 %  14 %  15 %  16 %  17 %  18 %  19 %  20 %  21 %  22 %  23 %  24 %  25 %  26 %  27 %  28 %  29 %  30 %  31 %  32 %  33 %  34 %  35 %  36 %  37 %  38 %  39 %  40 %  41 %  42 %  43 %  44 %  45 %  46 %  47 %  48 %  49 %  50 %  51 %  52 %  53 %  55 %  72 %  85 %  95 %  100 % 

=== PARDISO: solving a symmetric indefinite system ===
0-based array is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Single-level factorization algorithm is turned ON


Summary: ( starting phase is reordering, ending phase is factorization )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.006325 s
Time spent in reordering of the initial matrix (reorder)         : 0.000187 s
Time spent in symbolic factorization (symbfct)                   : 0.046494 s
Time spent in data preparations for factorization (parlist)      : 0.001949 s
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.064802 s
Time spent in allocation of internal data structures (malloc)    : 0.003441 s
Time spent in matching/scaling                                   : 0.397148 s
Time spent in additional calculations                            : 0.038770 s
Total time spent                                                 : 0.559116 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           82432
             number of non-zeros in A:      639744
             number of non-zeros in A (%): 0.009415

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    16931
             size of largest supernode:               336
             number of non-zeros in L:                5181893
             number of non-zeros in U:                1
             number of non-zeros in L+U:              5181894
             gflop   for the numerical factorization: 0.752081

             gflop/s for the numerical factorization: 11.605837

// Iparm outputs
Number of perturbed pivots(iParm[13]) = 283
Number of positive eingenvalues(iParm[21]) = 16404
Number of negative eingenvalues(iParm[22]) = 66028

//Phase 33
=== PARDISO: solving a symmetric indefinite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.006467 s
Time spent in additional calculations                            : 0.014259 s
Total time spent                                                 : 0.020726 s

Statistics:
===========
Parallel Direct Factorization is running on 16 OpenMP

< Linear system Ax = b >
             number of equations:           82432
             number of non-zeros in A:      639744
             number of non-zeros in A (%): 0.009415

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 64
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    16931
             size of largest supernode:               336
             number of non-zeros in L:                5181893
             number of non-zeros in U:                1
             number of non-zeros in L+U:              5181894
             gflop   for the numerical factorization: 0.752081

             gflop/s for the numerical factorization: 11.605837

Pardiso:: perturbed pivots caused iterative refinement. 
// User print
nDofs = 82432
Computing Error 
Errors =(2.00013, 222.035, 222.044, 222.035, 
DOF = 82432

 

The approximation error is not right, and the error gets worse for greater iteration refinement steps.

The problem reported here is the smallest problem in which the Bunch-Kaufman pivoting is unstable. Since this problem is relatively small, the decomposition time is not very expressive, but as the problem scales, the default implementation gets relatively slower.

 

QUESTION

I wonder if

  • Is  there fill-reduction options for CGS method?
  • Is there pivoting alternatives from the Bunch-Kauffman or Diagonal 1x1 methods?
  • Is there other solution in PARDISO kit to accelerate the saddle-point problem or in other MKL toolkit?

With regards,

Victor Oliari

[SLO86] SLOAN S. W.: An algorithm for profile and wavefront reduction of sparse matrices. International Journal for Numerical Methods in Engineering 23, 2 (1986), 239–251.

0 Kudos
22 Replies
VidyalathaB_Intel
Moderator
7,131 Views

Hi Victor,


Thanks for reaching out to us.


Could you please provide us with a sample reproducer and steps to reproduce the issue so that we can work on it from our end?


>>...... the saddle-point problem or in other MKL toolkit?

As per the documentation, when working with a saddle-point problem we need to set iparm[10] = 1 (scaling) and iparm[12] = 1 (matchings) for highly indefinite symmetric matrices, which you already does as I could see it under default configuration for saddle-point problem.

>>Is there pivoting alternatives from the Bunch-Kauffman or Diagonal 1x1 methods?

At present as mentioned here, (https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-iface.html) the solver uses diagonal pivoting, or 1x1 and 2x2 Bunch-Kaufman pivoting for symmetric indefinite matrices.

To control pivot elements that appear during numerical factorization you can use mkl_pardiso_pivot callback routine from Pardiso. 

>>Is there fill-reduction options for CGS method?

We will get back to you regarding this, meanwhile, you can refer to the following link for some details about CGS https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-iface/pardiso-iparm-parameter.html#pardiso-iparm-parameter_IPARM13


So, we request you to share the sample reproducer so that we will try to address the problem that you are facing with MKL Pardiso.


Also, please do let us know the OS environment details on which you are working and the MKL version being used in your case.


Regards,

Vidya.


0 Kudos
Oliari
Beginner
7,096 Views

Hi Vidya,


I appreciate your reply,


I work with a general purpose finite element library called neopz on an external project called FEMcomparison. In order to reproduce this issue, please clone the libray and project repositories through the adresses:

Neopz library: https://github.com/labmec/neopz

    Branch: TimerFEMComparison;

    set-up with CMAKE:

       Using_MKL = 0n;

      Using_OMP = On

       after setting "Using_MKL = on" and configuring once: MKL_Thread_Model = omp (press 't' for advanced settings).

The library must be installed.

FEMcomparison project: https://github.com/labmec/femcomparison

    Branch: TimeMeasuring;

    set-up with CMAKE:

       FEMCOMPARISON_TIMER = ON;

       FEMCOMPARISON_USING_MKL = ON;

       USING LCC_MATRIX = ON;

       MKL_THREAD_MODEL = omp (advanced settings after configuring once with "FEMCOMPARISON_USING_MKL = ON")

       NeoPZ_Dir = Path to neopz library: <path to neopz installation>/lib/cmake/neopz;

Run target HybridH1 in Femcomparison/FEM/Simulations/main_HybridH1.cpp. The target let you the following FEM properties:

pConfig.k = 1;//
pConfig.n = 1;
pConfig.problem = "EArcTan"; ///{"ESinSin","EArcTan",ESteklovNonConst", "ESteepWave"}
pConfig.approx = "Mixed"; //// {"H1","Hybrid", "Mixed"}
pConfig.topology = "Hexahedral"; //// Triangular, Quadrilateral, Tetrahedral, Hexahedral, Prism
pConfig.refLevel =5; //// How many refinements
pConfig.postProcess = false; //// Print geometric and computational mesh
pConfig.shouldColor =false;
pConfig.isTBB = false;
pConfig.tData.nThreads = 16;

You can use use the parameters above to run the tests. The problem arises only for big meshes. The topologies "Hexahedral" and "Quadrilateral" were tested for a range of refinement levels (from 2-9 for Quadrilaterals and 1-5, for Hexahedrals). The program uses an uniform refinement pattern, hence "pConfig.reflevel = 0" stands for a single quadrilateral, "pConfig.reflevel = 1", for a 2x2 mesh of identical quadrilaterals and so on. Next, it follows the computed errors using Pardiso pivoting strategy for a range of refinement levels on a quadrilateral topology:

 

nref,nDof,L2Error,EnergyError,maxnRef,iterNum
2,96,0.756311,13.3734,9,3
2,96,0.756311,13.3734,9,-1
2,96,0.756311,13.3734,9,-2
3,352,0.113176,4.12839,9,3
3,352,0.113176,4.12839,9,-1
3,352,0.113176,4.12839,9,-2
4,1344,0.0276299,1.49112,9,3
4,1344,0.0276299,1.49112,9,-1
4,1344,0.0276299,1.49112,9,-2
5,5248,0.00357966,0.530131,9,3
5,5248,0.00357966,0.530131,9,-1
5,5248,0.00357966,0.530131,9,-2
6,20736,20487.3,340166,9,3
6,20736,7088.08,340166,9,-1
6,20736,6773.66,340166,9,-2
7,82432,232.831,94201.7,9,3
7,82432,256.561,94201.7,9,-1
7,82432,264.51,94201.7,9,-2
8,328704,986678,9.43071e+07,9,3
8,328704,856841,9.43071e+07,9,-1
8,328704,804771,9.43071e+07,9,-2
9,1312768,60037.8,5.79

The first row stands for the refinement level and the third row, stands for the error in the L2 norm. Each refinement level is ran 3 times. The results show that the solver is stable until nref = 5, but is unstable for nref > 5. There is a similar pattern for hexahedrals.

The code enables running mixed or hybrid formulations, but both results in a similar condensed matrix A' in the end, so you can ran all simulations with "pConfig.approx = mix", "pConfig.n=1" (for Hybrid approx. n=2 for 2D and n=3 for 3D).

If you happen to test Pardiso CGS implementation, I encourage you to test the non statically condensed version of the problem for "pConfig.topology = hexahedral" and pConfig.relevel = 5". In this case, the non-condensed matrix is an order of magnitude larger than its condensed version, but takes the same time to be decomposed. This can be done by commenting "CreateCondensedMixedElements" function on the line 114 at FEMComparison/FEM/Solver.cpp for "pConfig.approx = mixed" or "createspace.GroupAndCondenseElements" function on the line 160 at the same file.

If you want to disable the SLOAN ordering, please change the call "TPZanalysis(multicmesh)" to "TPZanalysis(multicmesh, 0)" on the line 335 at FEMComparison/FEM/Solver.cpp.

The pardiso configuration takes place at Neopz/Matrix/TPZPardisoControl.cpp. Refer to function

void TPZPardisoControl<TVar>::Decompose()

for phase 12, to function

void TPZPardisoControl<TVar>::Solve(TPZFMatrix<TVar> &rhs, TPZFMatrix<TVar> &sol) const

for phase 33. The neopz is currently solving symetric indefinite systems with Pardiso CGS implementation. To test other pivoting techniques please alter fParam values in this function.

>> Also, please do let us know the OS environment details on which you are working and the MKL version being used in your case.

I'm working on a Ubuntu 18.04.3 LTS Linux distribution. For the mkl version, the function

mkl_get_version(&Version)

yields:

Major version: 2021
Minor version: 0
Update version: 3
Product status: Product
Build: 20210617
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors
================================================================

Please let me know if you have any problem reproducing the issue,


With regards,

Victor Oliari

 

0 Kudos
VidyalathaB_Intel
Moderator
7,090 Views

Hi Victor,


Thanks for sharing the project details.

Could you please share with us the steps to reproduce the issue?

Also please do let us know if there are any dependencies that are required to build the shared project.


Regards,

Vidya


0 Kudos
VidyalathaB_Intel
Moderator
7,048 Views

Hi Victor,


Thanks for sharing the details here. I'll try it out and get back to you if I face any issues in reproducing the case.


We regret the inconvenience caused to you. I have seen your latest post (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-reply-edit-or-comment-own-posts/m-p/1396760#M33340) where you have reported your troubles in editing and commenting on the replies, seems to be an issue with the forum, anyways we will take care of it, and you can rest assured as I could see all your earlier replies now on the forum and now we can continue our interaction here in this thread hopefully.


With your permission can I close the latest post (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-reply-edit-or-comment-own-posts/m-p/1396760#M33340) on our end and continue our conversation right here in this thread?


Please do let us know. (In case there is still a problem on the forum side, we will get in touch with you privately to resolve the issue for time being.)


Regards,

Vidya.


0 Kudos
Oliari
Beginner
7,038 Views

Hi Vidya,

 

Thank you for solving the forum issue.

 

>> With your permission can I close the latest post (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-reply-edit-or-comment-own-post...) on our end and continue our conversation right here in this thread?

 

Sure, that post can be deleted. Also, this post is full of duplicated replies now, and I can't delete them. If it is possible, could you delete the duplicate replies here? I don't think I have access to delete functionalities.

 

Regards,

Oliari

0 Kudos
VidyalathaB_Intel
Moderator
6,993 Views

Hi Victor,

 

I've tried installing the neopz library with the suggested CMake set up and the instructions in the Readme file, during the make install step I got some errors, attaching the screenshot here.

Could you please help us in reproducing the issue by providing us with the exact steps that you have followed for installing the neopz library?

Meanwhile, could you please confirm if you see the same issue with MKL 2022.1 version as we see that you are using mkl 2021?

 

Regards,

Vidya.

0 Kudos
Oliari
Beginner
6,974 Views

Hi Vidya,

 

The cmake set-up might not be right on your end. I've attached a picture with the set-up used to configure Neopz libary and another with FEMComparison CMake configuration. If the issue persists, we could schedule a small meeting to get it right, just let me know an appropriate time and the specific time zone. I can send a google meet link here after settling the right schedule. 

 

With Regards,

Victor Oliari

 

 

 

 

0 Kudos
VidyalathaB_Intel
Moderator
6,940 Views

Hi Victor,

 

I apologize for the delay.

I managed to build the NeoPZ library successfully this time. Now I'm facing issues with the FEMcomparison library (attaching the screenshot of the errors) though I'm setting the option for using MKL it is throwing a warning as manually specified variables was not used by the project during cmake step.

It would be a great help if you provide us with the commands that you followed to build the library.

 

Also could you please try with the latest MKL version 2022.1 and see if the issue still persists with pardiso?

 

Regards,

Vidya.

 

0 Kudos
Oliari
Beginner
6,916 Views

Hi Vidya,

 

I've tested the project with MKL version 2022.2, and the issue persists. From your picture, it seems there is a problem in finding MKL dynamic libraries in the project. Could you share the CMakeCache.txt of both projects here so I can check if the set-up is right? Also, could you try switching the MKL_THRED_MODEL on advanced options in cmake from tbb to omp or vice-versa? There are some flags in neopz-project to activate omp (USING_OMP) or tbb (USING_TBB) that could be useful.

 

I had an issue on mac during the installation where libimop5.dylib could't be found, hence I've copied these libraries into .../mkl/lib. I don't think you should have this problem on linux though.

 

Regards,
Oliari

 

 

0 Kudos
VidyalathaB_Intel
Moderator
6,902 Views

Hi Victor,


>>Could you share the CMakeCache.txt of both projects

At present, I'm facing some issues in accessing those machines which I tried previously. So I can't attach the requested files here. 

If possible, could you please try to isolate the issue that you are facing with MKL pardiso and provide in the form of a sample reproducer code to reproduce the unstable results that you are mentioning so that it would be easier to address the issue quickly?

Please do let me know.


Regards,

Vidya.


0 Kudos
VidyalathaB_Intel
Moderator
6,868 Views

Hi Victor,


Could you please provide us with an update regarding the issue and the possibility of isolating the issue with MKL pardiso? Please do let us know.


Regards,

Vidya.


0 Kudos
Oliari
Beginner
6,858 Views

Hi Vidya,

I've been working on a sample code of the problem, but it is taking a little longer than I anticipated, I apologize for not letting you know about it. I'll post the details once it is ready.

 

With Regards,
Victor

0 Kudos
Oliari
Beginner
6,833 Views

Hi Vidya,

 

I apologize for taking this long to answer, but there is an unexpected behavior in the sample code that I required a more careful attention.

 

The matrix generated by the finite element interface is now saved in a *.txt file  that I read and feed to Pardiso interface. The source code is found here:

https://github.com/OliariVictor/PardisoTest/tree/master 

Due to the size of the matrices and source vector, they are stored in google.drive through the address:

https://drive.google.com/drive/folders/1XKgjevz9khw-c-eESNquyw1MpA7dtTJi?usp=sharing 

There are two matrices and source vectors in the drive, entitled *_ref8.txt and *_ref9.txt. The latter corresponds to a bigger matrix than the former. In order to run the code, please specify the address on your local machine to the matrix and source vector in the beginning of the main.cpp file. In order to verify the numerical error from the pardiso solver, the code checks the 1-norm of Ax-b. Changing the boolean "UsingCGS" from true to false switches the iparm configuration from no pivoting CGS solver to Bunch-Kaufmen pivoting.

Unexpectedly, the numerical error for *_ref8.txt is within acceptable bounds, but for *_ref9.txt Pardiso reports segmentation fault and the execution crashes. I've checked the consistency of the matrix and vector, and both are fine. There seems to be a scalability issue within Pardiso or the source code that I can't figure out why. I've tested both on mac and linux, where I had plenty of RAM-memory space available.

Another unexpected behavior is that the numerical error for *ref8.txt and UsingCGS=false is within acceptable bounds. When I ran the same matrix, with the same iParm values, but using the finite element interface, the error is not acceptable, only if using UsingCGS=true it would be.  In this case, I have the same matrix and source vector, only the data structure might differ, but the results are clearly different. I could not specify the cause of this issue and in order to check if the results are random, I wish to run a bigger matrix (*ref_9.txt), but I faced the segmentation fault issue.

Have  you faced some of these issues?
With regards,

Victor Oliari

 

0 Kudos
VidyalathaB_Intel
Moderator
6,795 Views

Hi Victor,

 

Thanks for your efforts in isolating the issue and providing us with the test code.

I tried running the provided sample and could see that when using *ref9.txt input files (matrix and vector) it is giving a segmentation fault error.

We are working on this issue and will get back to you.

 

Meanwhile, could you please confirm if I made it right about the UsingCGS settings that you are mentioning in the last paragraph?

 

>>numerical error for *ref8.txt and UsingCGS=false is within acceptable bounds.. the error is not acceptable, only if using UsingCGS=true it would be..

When setting UsingCGS to false the error is within acceptable bounds or is it setting UsingCGS to true the error within acceptable bounds?

 

Please share the output that you are getting when working with *.ref8.txt input files.

I'm also attaching the output that I got when using input files from *ref8.txt.

Please do let me know if there is any mismatch or incorrect.

 

Regards,

Vidya.

 

0 Kudos
Oliari
Beginner
6,750 Views

Hi Vidya,

 

Thank you for replying, about the points you've mentioned:

>>I tried running the provided sample and could see that when using *ref9.txt input files (matrix and vector) it is giving a segmentation fault error.

     This issue is a bit odd, because we are able to run this problem within the library, but we're unable to on the sample code. The CMakelist.txt of the sample code is based on the CMakelist of the library and the dependencies (MKL and other third party tools) are the same. Unfortunately, I coudln't solve the issue on the sample code.

>> Meanwhile, could you please confirm if I made it right about the UsingCGS settings that you are mentioning in the last paragraph?

On the sample code, the simulations ran with UsingCGS = true and UsingCGS = false yields reduced and consistent (same within multiple runs) numerical errors, but when I run it on the library, for the same iParm settings, the same dependencies, but maybe different data structures, a consistent and reduced error is obtained only for UsingCGS = true. For UsingCGS = false on the library the error explodes and the numerical error is different for every run.

 

>> Please share the output that you are getting when working with *.ref8.txt input files.

The numerical errors obtained on my end are the same as yours, the solver is consistent. It is notable that even for a small problem like *ref8.txt, the simulation time is significantly greater for UsingCGS = true.

Regards,

Victor

0 Kudos
VidyalathaB_Intel
Moderator
6,736 Views

Hi Victor,


>> I could not specify the cause of this issue and in order to check if the results are random, I wish to run a bigger matrix (*ref_9.txt), but I faced the segmentation fault issue.

Regarding the segmentation fault error that we are getting with *.ref9.txt,

could you please try increasing the stack size by using the below command before running the code?


ulimit -s 65536


Regards,

Vidya.



0 Kudos
Oliari
Beginner
6,718 Views

Hi Vidya,

 

The sample code run smoothly after increasing the user limit stack size conforming your suggestion, thanks. Nonetheless, this sample code is not reproducing the issue we're facing in the library environment, since the numerical error is small when using UsingCGS = false. I'm making some tests running this sample code in the pz environment, I'll inform you after further investigation.

 

Regards,

Victor

0 Kudos
VidyalathaB_Intel
Moderator
6,711 Views

Hi Victor,


>> I'll inform you after further investigation.

Yeah Sure! Thanks for letting me know and you can get back to us with the details once you are done with the investigation.


Regards,

Vidya.


0 Kudos
Oliari
Beginner
6,538 Views

Hi Vidya,

Now we're able to reproduce the issue I'm facing with the library. The matrix and the right-hand side that we used didn't have enough precision to reproduce the issue on the library. This makes sense, since  Bunch-Kauffman algorithm replaces the zero-values in the diagonal blocks with small deltas. The former data structure had 6 digits of precision, and for a delta = 10^-8 (iParm[9]), the algorithm is robust enough. When I ran the algorithm with enhanced precision, the numerical error (measured by norm1 of Ax-b) exploded.

The matrix and the right hand side with enhanced precision for *.ref_8 can be found in the same link as before, at

https://drive.google.com/drive/folders/1XKgjevz9khw-c-eESNquyw1MpA7dtTJi?usp=sharing  

I didn't update the structure for *ref.9 since the error is already evident for *ref.8.

Now the example is representative of the error I'm facing. Note that the numerical error for UsingCGS = true is very small, and it explodes otherwise.

With regards,
Oliari

0 Kudos
VidyalathaB_Intel
Moderator
6,521 Views

Hi Victor,


Could you please show us the results that you are getting after enhancing the precision of matrix and the right hand side values?


Regards,

Vidya.


0 Kudos
Reply