Intel® oneAPI Math Kernel Library

Some questions about using cluster PARDISO in OOC mode (iparm[59]=2)

Liwufan
Beginner

Hi, all

I recently tried to use the Parallel Direct Sparse Solver for Clusters interface to solve a linear system Ax=b. Because of my limited RAM, the size of the matrix I can handle in-core is restricted, so I want to use OOC mode to store the L and U factors on disk and reduce memory usage. I set iparm[59]=2 and, following the documentation, set iparm[10] and iparm[12] to 0. However, when I run with multiple MPI processes and one OpenMP thread each, the program exits directly during the LU factorization phase.
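For clarity, this is roughly how I set the OOC-related entries (zero-based C indexing of iparm, just a sketch of the settings described above):

MKL_INT iparm[64] = { 0 };
iparm[0]  = 1;  // do not use the solver defaults
iparm[59] = 2;  // OOC mode: store the L and U factors on disk
iparm[10] = 0;  // scaling off (as far as I understand, OOC mode does not support it)
iparm[12] = 0;  // weighted matching off (same reason)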

 

I would like to know what causes this and how to fix it. I look forward to your reply.

 

My environment variables are set to:

MKL_PARDISO_OOC_PATH=D:\msvc_project\code_zp\GeoAdaptiveRefine\demo1\ooctemp
MKL_PARDISO_OOC_MAX_CORE_SIZE=10240
MKL_PARDISO_OOC_MAX_SWAP_SIZE=10240
MKL_PARDISO_OOC_KEEP_FILE=0
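I believe the same variables could also be set from inside the program before the first solver call; this is only a sketch using the MSVC CRT, and I have not verified that MKL picks them up this way:

#include <cstdlib> // _putenv_s (MSVC)

// Sketch: set the OOC-related variables in-process before the first cluster_sparse_solver call.
_putenv_s("MKL_PARDISO_OOC_PATH", "D:\\msvc_project\\code_zp\\GeoAdaptiveRefine\\demo1\\ooctemp");
_putenv_s("MKL_PARDISO_OOC_MAX_CORE_SIZE", "10240");
_putenv_s("MKL_PARDISO_OOC_MAX_SWAP_SIZE", "10240");
_putenv_s("MKL_PARDISO_OOC_KEEP_FILE", "0");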

 

Here is my error message:

Memory allocated on phase 22 on Rank # 0 10242.0785 MB
Memory allocated on phase 22 on Rank # 1 10242.0785 MB
Memory allocated on phase 22 on Rank # 2 10242.0785 MB
Memory allocated on phase 22 on Rank # 3 10242.0785 MB
Memory allocated on phase 22 on Rank # 4 10242.0785 MB
Memory allocated on phase 22 on Rank # 5 10242.0785 MB
Memory allocated on phase 22 on Rank # 6 10242.0785 MB
Memory allocated on phase 22 on Rank # 7 10242.0785 MB

Percentage of computed non-zeros for LL^T factorization
1 %
2 %
4 %
11 %
24 %
25 %
27 %
28 %
29 %
32 %
46 %
66 %

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 20524 RUNNING AT WIN-BB34P4BOBS1
= EXIT STATUS: -1 (ffffffff)
===================================================================================

 

Thank you again

Shiquan_Su
Moderator

The MKL PARDISO solver is a shared-memory multiprocessing parallel direct sparse solver; it should work within any one of the MPI ranks. Could you please provide your test code (source code plus the build command/script/instructions)? Please also provide your hardware and software environment details so that we can reproduce the issue. Does your code work in another setup, such as a single MPI rank, or with OpenMP turned off?
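For example, something along these lines (assuming Intel MPI on Windows; the executable name is just a placeholder):

set OMP_NUM_THREADS=1
set MKL_NUM_THREADS=1
mpiexec -n 1 your_test.exe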

Liwufan
Beginner

First of all, thank you very much for your reply, and I'm sorry for the late response.

I used a simple test program that solves a 4x4 linear system in OOC mode. The code is as follows:

I built it with Visual Studio 2022 and ran it with Intel mpiexec:

#include <iostream>
#include <mpi.h>
#include "mkl_cluster_sparse_solver.h"
#include "mkl_types.h"
#include <vector>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int myrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    // Problem size
    MKL_INT n = 4; // Small example; for a true OOC test, use a much larger system.
    // SPD matrix in CSR format (1-based indexing)
    MKL_INT ia[5]  = { 1, 3, 6, 9, 11 };
    MKL_INT ja[10] = { 1, 2, 1, 2, 3, 2, 3, 4, 3, 4 };
    double  a[10]  = { 4, -1, -1, 4, -1, -1, 4, -1, -1, 3 };
    double  b[4]   = { 1.0, 2.0, 3.0, 4.0 };
    double  x[4]   = { 0.0 };

    // PARDISO internal data
    void*   pt[64]    = { 0 };
    MKL_INT iparm[64] = { 0 };
    MKL_INT maxfct = 1, mnum = 1, phase, error = 0, msglvl = 1;
    MKL_INT mtype = 2; // Real symmetric positive definite
    MKL_INT nrhs  = 1;

    // Set iparm values
    for (int i = 0; i < 64; i++) iparm[i] = 0;
    iparm[0]  = 1; // No solver default
    iparm[1]  = 2; // Fill-in reordering from METIS
    iparm[7]  = 0; // Max number of iterative refinement steps
    iparm[59] = 2; // Enable OOC mode
    iparm[10] = 0;
    iparm[12] = 0;

    std::cout << "2" << std::endl;
    MPI_Comm comm = MPI_COMM_WORLD;

    // Phase 11: Reordering and symbolic factorization
    phase = 11;
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase,
                          &n, a, ia, ja, NULL, &nrhs,
                          iparm, &msglvl, b, x, &comm, &error);

    std::cout << "3" << std::endl;
    // Phase 22: Numerical factorization
    phase = 22;
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase,
                          &n, a, ia, ja, NULL, &nrhs,
                          iparm, &msglvl, b, x, &comm, &error);

    std::cout << "4" << std::endl;
    // Phase 33: Back substitution
    phase = 33;
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase,
                          &n, a, ia, ja, NULL, &nrhs,
                          iparm, &msglvl, b, x, &comm, &error);

    if (myrank == 0) {
        std::cout << "Solution x:\n";
        for (int i = 0; i < n; i++) std::cout << x[i] << " ";
        std::cout << "\n";
    }

    std::cout << "5" << std::endl;
    // Phase -1: Release internal memory
    phase = -1;
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase,
                          &n, a, ia, ja, NULL, &nrhs,
                          iparm, &msglvl, b, x, &comm, &error);

    std::cout << "6" << std::endl;
    MPI_Finalize();
    return 0;
}

 

I tried running with a single MPI process and with OpenMP threading turned off, but it still fails. The error is the same as before (see below), and the _lnz_0_0.bin file generated in the OOC folder is 0 KB.

2
Memory allocated on phase 11 0.0014 MB
3
Memory allocated on phase 22 3072.0024 MB

Percentage of computed non-zeros for LL^T factorization
25 %
50 %
100 %


===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 28436 RUNNING AT DESKTOP-4UVO226
= EXIT STATUS: -1073740791 (c0000409)
===================================================================================

 

Looking forward to your next reply!

c_sim
Employee

Hi,

 

Thank you for submitting your query. The main issue with the code is that your matrix type is SPD (mtype=2), but you have specified the full matrix as input. For symmetric matrices, both the Cluster Sparse Solver and PARDISO expect only the upper triangular part of the matrix to be specified.

You can find more details in the description of ja in https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-1/cluster-sparse-solver.html

 

Therefore, your CSR matrix input should be:

MKL_INT ia[5] = { 1, 3, 5, 7, 8 };
MKL_INT ja[7] = { 1, 2, 2, 3, 3, 4, 4 };
double a[7] = { 4, -1, 4, -1, 4, -1, 3 };
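Independent of that fix, it may also help to check the error argument after each call so that any failure is reported before the next phase runs. A minimal sketch, using the same call and variables as in your code:

phase = 22;
cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase,
                      &n, a, ia, ja, NULL, &nrhs,
                      iparm, &msglvl, b, x, &comm, &error);
if (error != 0) {
    // Report the solver error code and stop all ranks.
    std::cout << "cluster_sparse_solver failed in phase " << phase
              << ", error = " << error << std::endl;
    MPI_Abort(MPI_COMM_WORLD, 1);
}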

 

Hope it helps.

 

Kind Regards,

Chris
