Re:PARDISO: Is it possible to create a deep copy of the _MKL_DSS_HANDLE_t pt for a class?

BrenoA · ‎08-10-2022

I am creating a class that wraps the Intel PARDISO solver, to facilitate performing analysis and numerical factorization once and then performing the solve and iterative refinement repeatedly for any right-hand side later.

The goal is to create something equivalent to Matlab's decomposition class (which can be seen here).

In my class, phase 12 of Intel PARDISO is performed during the instantiation of an object of the class, for a given input sparse CSR matrix, passed to the constructor. Then a "solve" method from the object can be called performing phase 33 (or by directly calling the object, using a functor). Phase -1 is performed in the destructor.

I only use pardiso_64, so static arrays of long long int pt[64] and long long int iparm[64] are member variables of each object, along with all of the other simple parameters. The matrix values a, ia and ja, are stored as std::vector's.

Now, I would like to implement a copy method. I have tried implementing this method by initializing an object of the same type, then deep copying (i.e. copying by value) the attributes from *this to the new object and returning it.

But I am having an issue. Suppose I create a first object called factor1. Then I create a copy, factor2 = factor1.copy(). The problem is that if factor2 is deleted, then factor1 can no longer be used, because phase -1 was called in factor2's destructor. Although the member variable pt of factor2 is different than factor1, their values are the same, and trying to call phase 33 in factor1 causes a segfault.

Is there a way around this?

I have a feeling the only way to implement this would be to create a reference counter. Then only call phase -1 when all copies and the original object are destroyed. Or maybe just not allow copies anyway...

ShanmukhS_Intel · ‎08-13-2022

Hi,

Thank you for posting on Intel Communities.

Could you please share a sample reproducer regarding the implementation of the wrapper class, so that we could check it and guide you on the feasibility accordingly?

Best Regards,

Shanmukh.SS

BrenoA · ‎08-13-2022

Okay! I mostly program in Cython/Python, but I've tried creating a sample project in C++, shown below.

I'm getting an access violation in the destructor in this example.

In the example below, a factor is created from a CSR matrix. A solution is obtained from a right-hand side with two columns. Then another factor is created by copying the first one. And a second solution is obtained from another right-hand side with two columns.

I believe the access violation errors occurs when the second factor is attempted to be destroyed, since the pt paramter shares information between them.

The results are printed out using std::format, thus requiring passing /std:c++20 to MSVC. But by commenting lines 6, 28 to 34 and 43 to 50, of "main.cpp", you can disable this requirement.

I use Visual Studio 2019 and manually use MKL 2021.3 in x64 mode. Thus, /DMKL_ILP64 is passed to the compiler. And libiomp5md.dll is copied to the debug/release folders (how do I avoid copying them, add the intelOneAPI's version to PATH?).

Firstly, the header file, "intelParSolver.h":

#ifndef INTEL_PARDISO_CLASS
#define INTEL_PARDISO_CLASS
#include <vector>
using LLINT = long long int;

class ParSolver {
  public:
    LLINT m_idum;
    LLINT m_pt[64], m_iparm[64];
    LLINT m_maxfct, m_mnum, m_phase, m_error, m_msglvl, m_mtype;
    LLINT m_nrows, m_ncols;
    std::vector<LLINT> m_indices, m_indptr;
    std::vector<double> m_data;
    double m_ddum;

    ParSolver(std::vector<double> data_v, std::vector<LLINT> indices_v,
              std::vector<LLINT> indptr_v, LLINT nrows_v, LLINT ncols_v,
              bool is_symmetric_v, bool enable_maching_scaling_v);

    ~ParSolver();

    ParSolver copy();

    std::vector<double> solve_A(std::vector<double> &rhs,
      const LLINT n_rhs);

    std::vector<double>operator() (std::vector<double> &rhs,
      const LLINT n_rhs);

  private:
    ParSolver(ParSolver *Parent);  // for copying
    void factorize();
};

#endif

Then the implementation, "intelParSolver.cpp":

#include "intelParSolver.h"
#include "mkl.h"
#include <stdexcept>
#include <string>
#include <cmath>

ParSolver::ParSolver(std::vector<double> data_v, std::vector<LLINT> indices_v,
    std::vector<LLINT> indptr_v, LLINT nrows_v, LLINT ncols_v,
    bool is_symmetric = false, bool enable_matching_scaling = true) :
    m_data(data_v), m_indices(indices_v), m_indptr(indptr_v), m_nrows(nrows_v),
    m_ncols(ncols_v), m_maxfct(1), m_mnum(1), m_phase(12), m_error(0),
    m_msglvl(0), m_mtype(0), m_idum(0), m_ddum(0.) {

  if (m_nrows != m_ncols) {
    throw std::length_error("The given matrix is not square");
  }

  int i;
  for (i = 0; i < 64; ++i) {
    m_pt[i] = 0;
    m_iparm[i] = 0;
  }

  if (is_symmetric) {
    m_mtype = 1;
  }
  else {
    m_mtype = 11;
  }

  m_iparm[0] = 1; m_iparm[1] = 3; m_iparm[7] = 2;
  m_iparm[9] = (m_mtype == 11) ? 13 : 8;
  if ((m_mtype == 11) || ((m_mtype == 1) && enable_matching_scaling)) {
    m_iparm[10] = 1;
    m_iparm[12] = 1;
  }
  m_iparm[23] = (m_mtype == 11) ? 10 : 1;
  m_iparm[34] = 1;

  ParSolver::factorize();
}

// for copying
ParSolver::ParSolver(ParSolver *Parent) :
  m_data(Parent->m_data), m_indices(Parent->m_indices), m_indptr(Parent->m_indptr),
  m_nrows(Parent->m_nrows), m_ncols(Parent->m_ncols), m_maxfct(Parent->m_maxfct),
  m_mnum(Parent->m_mnum), m_phase(Parent->m_phase), m_error(Parent->m_error),
  m_msglvl(Parent->m_msglvl), m_mtype(Parent->m_mtype), m_idum(Parent->m_idum),
  m_ddum(Parent->m_ddum) {
  for (auto i = 0; i < 64; ++i) {
    m_pt[i] = Parent->m_pt[i];
    m_iparm[i] = Parent->m_iparm[i];
  }
}

ParSolver::~ParSolver() {
  m_phase = -1;
  pardiso_64(m_pt, &m_maxfct, &m_mnum, &m_mtype, &m_phase, &m_nrows,
    m_data.data(), m_indptr.data(), m_indices.data(), &m_idum, &m_idum,
    m_iparm, &m_msglvl, &m_ddum, &m_ddum, &m_error);
  if (m_error != 0) {
    printf("Error number %lld when terminating the object.", m_error);
  }
}

void ParSolver::factorize() {
  m_phase = 12;
  pardiso_64(m_pt, &m_maxfct, &m_mnum, &m_mtype, &m_phase, &m_nrows,
    m_data.data(), m_indptr.data(), m_indices.data(), &m_idum, &m_idum,
    m_iparm, &m_msglvl, &m_ddum, &m_ddum, &m_error);
  if (m_error != 0) {
    throw std::runtime_error("Error number " + std::to_string(m_error) +
      " in analysis and numerical factorization phase.");
  }
}

std::vector<double> ParSolver::solve_A(std::vector<double> &rhs,
    const LLINT n_rhs=1) {
  double n_rhs_rows = (double) rhs.size()/(double) n_rhs;
  if (std::abs(n_rhs_rows - (double) m_nrows) > 1E-15) {
    throw std::length_error("The number of rows of the rhs (" + 
      std::to_string((int)n_rhs_rows) + ") is not " "equal to the size of the "
      "factored matrix (" + std::to_string(m_nrows) + ").");
  }

  std::vector<double> x = std::vector<double>(rhs.size(), 0.);

  m_phase = 33;
  m_iparm[11] = 0;

  pardiso_64(m_pt, &m_maxfct, &m_mnum, &m_mtype, &m_phase, &m_nrows,
    m_data.data(), m_indptr.data(), m_indices.data(), &m_idum, &n_rhs,
    m_iparm, &m_msglvl, rhs.data(), x.data(), &m_error);
  if (m_error != 0) {
    throw std::runtime_error("Error number " + std::to_string(m_error) +
      " in solution phase.");
  }

  return x;
}

std::vector<double> ParSolver::operator()(std::vector<double>& rhs,
  const LLINT n_rhs=1) {
  return this->solve_A(rhs, n_rhs);
}

ParSolver ParSolver::copy() {
  return ParSolver(this);
}

And now the main function with an example, "main.cpp":

#include "intelParSolver.h"
#include <vector>
#include "mkl_service.h"
#include "omp.h"
#include <iostream>
#include <format>  // C++20

int main() {
	MKL_Set_Dynamic(0);
	MKL_Set_Num_Threads(4);
	omp_set_dynamic(0);
	omp_set_num_threads(1);

	auto data = std::vector<double>{10., -2., 8., -3., -2., 9., -3., 7.};
	auto indices = std::vector<long long int>{0, 2, 1, 3, 0, 2, 1, 3};
	auto indptr = std::vector<long long int>{ 0, 2, 4, 6, 8 };
	long long int nrows = 4, ncols = 4;
	bool is_symmetric = true, enable_maching_scaling = false;

	auto factor = ParSolver(data, indices, indptr, nrows, ncols,
                          is_symmetric, enable_maching_scaling);

	auto b = std::vector<double>{ 0., 1., 2., 3., 4., 5., 6., 7. };

	long long int nrhs = 2;
	auto x = factor(b, nrhs);

	std::cout << "Solution matrix:\n";
	for (auto i = 0; i < nrows; ++i) {
		for (auto j = 0; j < nrhs; ++j) {
			std::cout << std::format(" {:4.3f},", x[i + nrows*j]);
		}
		std::cout << '\n';
	}

	auto factor2 = factor.copy();

	auto b2 = std::vector<double>(b);
	for (auto& b2i : b2) {
		b2i += 10.;
	}

	auto x2 = factor2(b2, nrhs);
	std::cout << "\nSolution matrix 2:\n";
	for (auto i = 0; i < nrows; ++i) {
		for (auto j = 0; j < nrhs; ++j) {
			std::cout << std::format(" {:4.3f},", x2[i + nrows*j]);
		}
		std::cout << '\n';
	}
	return 0;
}

The created matrix is:

10., 0.,-2., 0.
 0., 8., 0.,-3.
-2., 0., 9., 0.
 0.,-3., 0., 7.

The results are:

Solution matrix:
 0.047, 0.558,
 0.340, 1.191,
 0.233, 0.791,
 0.574, 1.511,

Solution matrix 2:
 1.326, 1.837,
 2.468, 3.319,
 1.628, 2.186,
 2.915, 3.851,

ShanmukhS_Intel · ‎08-22-2022

Hi BrenoA,

Thanks for sharing the reproducer and necessary steps.

We have tried compiling and running the source code. However, unfortunately, we faced an exception while running it and we are working on the same. We will get back to you soon with an update on this.

Best Regards,

Shanmukh.SS

BrenoA · ‎08-24-2022

Hello ShanmukhS,

I can try helping you to debug. Could you share the exception/error message, after compiling in debug mode with /Od?

I should warn you that I haven't tested whether the exceptions I throw in the above code work.

Best regards,

Breno

ShanmukhS_Intel · ‎09-01-2022

Hi Breno,

Please find the screenshot for exception raised below.

Regrets for the delay in response due to a technical issue.

Best Regards,

Shanmukh.SS

BrenoA · ‎09-01-2022

Hello Shanmukh,

I cannot load the image! When trying to load it by right-clicking and selecting "Open image in a new tab", I get a "Single Sign-On Error" by salesforce, with the error message:

"We can't log you in because of an issue with single sign-on. Contact your Salesforce admin for help."

Kind regards,

Breno

ShanmukhS_Intel · ‎09-05-2022

Hi Breno,

Regrets for the inconvenience caused.

I'm attaching the exception captured screenshot with this thread. Could you please let me know if you are able to open and view it?

Best Regards,

Shanmukh.SS

BrenoA · ‎09-05-2022

Hello Shanmukh,

Yes, I have been able to open and view it.

Indeed, this is exactly the problem that I am having. If you set a breakpoint in "return 0;" (line 51 of main.cpp) and run in debug mode, no exceptions will be thrown and you will see the desired output.

The exception occurs at the ending of main.cpp, since this is when the objects become out-of-scope and their destructors are called. Specifically, the exception occurs after the destructor of the second object (the "copy", or factor2) is called. I believe this happens because a deep copy is not performed, only a shallow copy.

I think that after the first object's destructor is called, some of the arrays (maybe the pt array?) actually shares data with its copy. So calling the second object's destructor is like trying to call pardiso_64 with phase=-1 twice.

If you comment out lines 36 to 50 in main.cpp, no exceptions are thrown.

So the question is then, how to perform an actual deep copy of the proposed object? In such a manner that clearing the pardiso handler pt of one object does not influence its copies.

Kind regards,

Breno

ShanmukhS_Intel · ‎09-13-2022

Hi Breno,

Could you please try compiling and executing the code on Intel oneAPI command prompt and let us know if this works fine? Kindly let us know if you need any details regarding the same.

Best Regards.

Shanmukh.SS

BrenoA · ‎09-13-2022

Hello ShanmukhS,

I have performed what you asked. As expected, it doesn't do anything, the problem still persists.

To compile and execute the code on Intel oneAPI x64 command prompt, you have to:

rename "intelParSolver.h" to "intelParSolver.hpp"
change #include "intelParSolver.h" to #include "intelParSolver.hpp" in both "intelParSolver.cpp" and "main.cpp"
Put the files in a new folder Open the intelOneApi x64 command prompt and cd into the new folder
Compile with "icx /Qstd=c++20 /Qxhost /Qiopenmp /DMKL_ILP64 -I"%MKLROOT%\include" intelParSolver.hpp intelParSolver.cpp main.cpp /link mkl_intel_ilp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib"
Run with "intelParSolver"

The only difference is that the command prompt doesn't show the error.

If you have trouble visualizing the problem, use the modified attached files.

In the modified file, I change the destructor to the following:

ParSolver::~ParSolver() {
  m_phase = -1;
  pardiso_64(m_pt, &m_maxfct, &m_mnum, &m_mtype, &m_phase, &m_nrows,
    m_data.data(), m_indptr.data(), m_indices.data(), &m_idum, &m_idum,
    m_iparm, &m_msglvl, &m_ddum, &m_ddum, &m_error);
  if (m_error != 0) {
    printf("Error number %lld when terminating the object.\n", m_error);
  }
  printf("Called a desctructor\n");
}

Now the output after running the executable file intelParSolver should be:

Solution matrix:
 0.047, 0.558,
 0.340, 1.191,
 0.233, 0.791,
 0.574, 1.511,

 Solution matrix 2:
 1.326, 1.837,
 2.468, 3.319,
 1.628, 2.186,
 2.915, 3.851, 
Called a desctructor

Theoretically, "Called a destructor" should have appeared twice, not only once. And the string "Finished successfully" should have been shown.

It can be seen that, even with a try-catch block in main.cpp, the program terminates prematurely. This probably means some form of segfault is occuring. As explained previously, this means that the new factor is not deep copying the original factor. After the original factor is destroyed, all information is lost in both the original and the copied factor, and pardiso_64 is called with phase=-1 for the same pt array.

This can all be much more easily seen in Visual Studio, as originally posted.

The question, then, is how to "copy" a pt array, without sharing information of the original factor.

Regards,

Breno

BrenoA · ‎09-13-2022

Hello ShanmukhS,

I have performed what you asked. As expected, it doesn't do anything, the problem still persists.

To compile and execute the code on Intel oneAPI x64 command prompt, you have to:

rename "intelParSolver.h" to "intelParSolver.hpp"
change #include "intelParSolver.h" to #include "intelParSolver.hpp" in both "intelParSolver.cpp" and "main.cpp"
Put the files in a new folder
Open the intelOneApi x64 command prompt and cd into the new folder
Compile with "icx /Qstd=c++20 /Qxhost /Qiopenmp /DMKL_ILP64 -I"%MKLROOT%\include" intelParSolver.hpp intelParSolver.cpp main.cpp /link mkl_intel_ilp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib"
Run with "intelParSolver"

The only difference is that the command prompt doesn't show the error.

If you have trouble visualizing the problem, use the modified attached files.

In the modified file, I change the destructor to the following:

ParSolver::~ParSolver() {
  m_phase = -1;
  pardiso_64(m_pt, &m_maxfct, &m_mnum, &m_mtype, &m_phase, &m_nrows,
    m_data.data(), m_indptr.data(), m_indices.data(), &m_idum, &m_idum,
    m_iparm, &m_msglvl, &m_ddum, &m_ddum, &m_error);
  if (m_error != 0) {
    printf("Error number %lld when terminating the object.\n", m_error);
  }
  printf("Called a desctructor\n");  // NEW LINE ADDED
}

Now the output after running the executable file intelParSolver should be:

Solution matrix:
 0.047, 0.558,
 0.340, 1.191,
 0.233, 0.791,
 0.574, 1.511,

Solution matrix 2:
 1.326, 1.837,
 2.468, 3.319,
 1.628, 2.186,
 2.915, 3.851,
Called a desctructor

Theoretically, "Called a destructor" should have appeared twice, not only once. And the string "Finished successfully" should have been shown.

It can be seen that, even with a try-catch block in main.cpp, the program terminates prematurely. This probably means some form of segfault is occuring. As explained previously, this means that the new factor is not deep copying the original factor. After the original factor is destroyed, all information is lost in both the original and the copied factor, and pardiso_64 is called with phase=-1 for the same pt array.

This can all be much more easily seen in Visual Studio, as originally posted.

The question, then, is how to "copy" a pt array, without sharing information of the original factor.

Regards,

Breno

Kirill_V_Intel · ‎09-23-2022

Hi!

Unfortunately for me, this issue caught my eye and to clear my conscience, as a former Intel MKL developer I feel the urge to answer the question. It appears to be rather a simple question in fact.

Short answer: no, deep copy is not possible except by a terrible way described below.

Longer answer: deep copy can be done through dumping the handle onto the disk pardiso_handle_store() + copying files + pardiso_handle_restore() via https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/sparse-solver-routines/onemkl-pardiso-parallel-direct-sparse-solver-iface/pardiso-handle-store.html

I don't think this is a good solution so I'd think twice before using it.

General comment: I don't think that deep copying the handle is a reasonable idea, this object is not really meant to be deep-copied except if you have some really complicated workflows.

Best,
Kirill

BrenoA · ‎09-23-2022

Hello Kirill,

Thank you for your answer! Indeed, I have already tried this out and was trying to avoid writing to disk.

In this case, I believe a reference counting scheme could work. Do you know if a smart pointer could be used for this? For example, a std::shared_pointer. Though, I do not know how to call the destructor (i.e. phase=-1) only when the reference counter reaches 0. And I do not know if there is a way for the copy() method to return the std::shared_pointer automatically.

I'll have to study a bit.

Thank you again for your insight.

Best,

Breno

Kirill_V_Intel · ‎09-23-2022

Hi!

Not sure about the very last question, but as for doing a custom operation when the reference counter reaches 0, std::shared_ptr has a "Deleter" per https://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr which should do what you want.

Best,
Kirill

ShanmukhS_Intel · ‎09-21-2022

Hi,

Thanks for the details. We are discussing the issue internally. We will get back to you soon with an update.

Best Regards,

shanmukh.SS

ShanmukhS_Intel · ‎10-03-2022

Hi,

A gentle reminder:

As we haven't heard back from you for a while, could you please let us know if there is any update regarding the query or else, kindly let is know if we could close this thread at our end.

Best Regards,

Shanmukh.SS

BrenoA · ‎10-03-2022

Hello,

I am still trying to implement the solution discussed with Kirill.

And as a gentle reminder, please post what you have discussed internally.

The issue still hasn't been solved.

The kindest regards,

Breno

ShanmukhS_Intel · ‎10-06-2022

Hi,

A gentle reminder:

I am still trying to implement the solution discussed with Kirill. And as a gentle reminder, please post what you have discussed internally.

>>Sure, Thanks for the update. Regarding the discussion, We have forwarded your query details and our findings to the corresponding team and the developer has reached out to you suggesting workarounds. Hence, we would like to know if there is any update regarding the issue.

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · ‎10-26-2022

Hi,

We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Best Regards,

Shnmukh.SS