Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
7117 Discussions

Matlab mexfile dynamically linked with Intel em64t MKL Pardiso segmentation faults in numeric factorization phase.

seblmn_pub_ro
Beginner
1,193 Views
Dear readers;

I compiled and linked a Matlab mexfile interface to Intel MKL Pardiso solver (ver 10.1.2.024) to solve some large sparse complex matrices (n ~ 5000 - 100000+). When testing the Compute_One_Frequency.mexa64 file, Intel MKL pardiso reordering works OK, but encountered segmentation fault in the numerical factorization phase (I tested with both real and doublecomplex solvers. Both experience segmentation faults even for small matricesn = 5772, nnz=1266844). Has anyone encountereda similar scenarioas well and could suggest some remedies?

For your reference, compiled using Matlab script:
mex -v -largeArrayDims Compute_One_Frequency.c

which results in gcc compile lines:
-> gcc -c -I/opt/matlab/extern/include -DMATLAB_MEX_FILE -ansi -D_GNU_SOURCE -DMKL_ILP64 -I/opt/intel/mkl/10.1.2.024/include -fexceptions -fPIC -fno-omit-frame-pointer -pthread -O -DNDEBUG "Compute_One_Frequency.c"

linking using:
-> gcc -O -pthread -shared -Wl,--version-script,/opt/matlab/extern/lib/glnxa64/mexFunction.map -Wl,--no-undefined -o "Compute_One_Frequency.mexa64" Compute_One_Frequency.o mexversion.o -Wl,-rpath-link,/opt/matlab/bin/glnxa64 -L/opt/intel/mkl/10.1.2.024/lib/em64t -I/opt/intel/mkl/10.1.2.024/include -lmkl_gf_ilp64 -lmkl_sequential -lmkl_lapack -lmkl_core -L/opt/matlab/bin/glnxa64 -lmx -lmex -lmat -lm -lstdc++

Am i missing a library? or got the link order wrong?

Miscellaneous information:
function prototype used was:
(doublecomplex used instead of double for complex version)
/* PARDISO prototype. */
#if defined(_WIN32) || defined(_WIN64)
#define pardiso_ PARDISO
#else
#define PARDISO pardiso_
#endif
#if defined(MKL_ILP64)
#define MKL_INT long long
#else
#define MKL_INT int
#endif
extern MKL_INT PARDISO
(void *, MKL_INT *, MKL_INT *, MKL_INT *, MKL_INT *, MKL_INT *,
double *, MKL_INT *, MKL_INT *, MKL_INT *, MKL_INT *, MKL_INT *,
MKL_INT *, double *, double *, MKL_INT *);

mkl_set_num_threads(1); /* Threads set to 1 for sequential version */
iparam[59] = 2; /* Using OOC strong. Tried with IN CORE and OOC variable as well. Segmentation fault remains. */

Thank you in advance.

seb
0 Kudos
9 Replies
seblmn_pub_ro
Beginner
1,193 Views

Some additional information from further experimentation results. This is when things get weird.

If i link without -lmkl_lapack (only lmkl_gf_ilp64, lmkl_sequential -lmkl_core), the segmenttion fault occurs during reordering phase. But when linking with -lmkl_lapack and -lmkl_core, thesegmentation fault occurs at numerical factorization phase. Reordering isOK.

Below are my variable declarations and function calls:

MKL_INT mklpardiso_nNonZeros; /* Number of Non Zeros */
MKL_INT idum; /* Dummy integer */
double ddum; /* Dummy complex */

void *mklpardiso_pt[64]; /* Pardiso internal data address pointer. INITIALIZE only ONCE at the FIRST call. */
MKL_INT mklpardiso_maxfct; /* Maximal number of factors to keep in memory with identical non zero sparsity structure. */
MKL_INT mklpardiso_mnum; /* Actual Matrix number for solution phase. */
MKL_INT mklpardiso_mtype; /* Matrix type of A. */
MKL_INT mklpardiso_phase; /* Phase of Pardiso solver operation */
MKL_INT mklpardiso_n; /* Number of equations in linear system Ax=b. A should be a nxn square matrix. */
MKL_INT *pmklpardiso_ia; /* Array of rowIndex [n+1] ALL arrays use ONE BASED ADDRESSING */
MKL_INT *pmklpardiso_ja; /* Array of Column Indices [nNonZeros] */
double *pmklpardiso_a; /* Array of Complex values in matrix A [nNonZeros]. */
MKL_INT *pmklpardiso_perm; /* Array of Permutation vectors */
MKL_INT mklpardiso_nrhs; /* Number of right hand sides in linear system Ax=b. b should be a nxnrhs matrix. */
MKL_INT mklpardiso_iparm[64]; /* Pardiso parameters*/
MKL_INT mklpardiso_msglvl; /* Pardiso message level. 0: Silent, 1: Statistical information printed. */
MKL_INT mklpardiso_error; /* Pardiso error number */

Arrays are allocatedusing the standard calloc (and populated correctly using 1 based addressing):
pmklpardiso_ia = calloc(mklpardiso_n+1, sizeof(MKL_INT)); /* REMEMBER to FREE */
pmklpardiso_ja = calloc(mklpardiso_nNonZeros, sizeof(MKL_INT)); /* REMEMBER to FREE */
pmklpardiso_a = calloc(mklpardiso_nNonZeros, sizeof(double or doublecomplex)); /* REMEMBER to FREE*/

Function calls are:
/* Configuring Pardiso Variables */
mklpardiso_maxfct = 1; /* Factorizing only one matrix */
mklpardiso_mnum = 1; /* Factorizing only one matrix */
mklpardiso_mtype = 11 or 13; /* A is unsymmetric and complex */
mklpardiso_msglvl = 0; /* 0: Silent, 1: Print statistical information to screen. */
mklpardiso_error = 0;
/* Setup pt - ONLY for FIRST call to pardiso */
for (ctr = 0; ctr < 64; ctr++) {
mklpardiso_pt[ctr] = 0;
}
/* Setup iparam */
for (ctr = 0; ctr < 64; ctr++) {
mklpardiso_iparm[ctr] = 0;
}
mklpardiso_iparm[0] = 1; /* 0: use solver default. 1: NOT using solver defaults */
mklpardiso_iparm[1] = 2; /* Fill-in reordering from METIS */
/* Numbers of processors, value of OMP_NUM_THREADS */
mklpardiso_iparm[2] = 1; /* UNUSED input, MUST EQUAL environmental variable OMP_NUM_THREADS */
mklpardiso_iparm[3] = 0; /* No iterative-direct algorithm. Instead computing LU. */
mklpardiso_iparm[4] = 0; /* 0: perm vector not used, 1: use user supplied perm vector, 2: permutation vector is returned into array perm */
mklpardiso_iparm[5] = 0; /* Write solution into x. b is not overwritten. */
mklpardiso_iparm[6] = 0; /* UNUSED input. Used to return ACTUAL number of iterations executed in solve phase for iparam[7] */
mklpardiso_iparm[7] = 2; /* Max numbers of iterative refinement steps */
mklpardiso_iparm[8] = 0; /* UNUSED input. Reserved for future use. */
mklpardiso_iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
mklpardiso_iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
mklpardiso_iparm[11] = 0; /* UNUSED input. Reserved for future use. MUST be set to 0 */
mklpardiso_iparm[12] = 1; /* Maximum weighted matching algorithm is switched-on (default for non-symmetric) */
mklpardiso_iparm[13] = 0; /* Output: Number of perturbed pivots */
mklpardiso_iparm[14] = 0; /* Output: Peak memory (KB) used in OOC symbolic factorization (phase 1) */
mklpardiso_iparm[15] = 0; /* Output: Permanent memory (KB) used in OOC symbolic factorization (phase 1) */
mklpardiso_iparm[16] = 0; /* Output: Total double precision memory (KB) used in OOC numerical factorization (phase 2)*/
mklpardiso_iparm[17] = -1; /* Input&Output. if < 0 input value number of nonzeros in the factor LU will be returned here. */
mklpardiso_iparm[18] = -1; /* INput&Output. if < 0 input value Mflops rate for LU factorization will be returned here. */
mklpardiso_iparm[19] = 0; /* Output: Numbers of CG Iterations */
mklpardiso_iparm[26] = 1; /* 1: Checks if column indices (ja) are sorted in increasing order within each row (ia), 0: no check */
mklpardiso_iparm[59] = 2; /* 0: IN CORE, 1: OOC variable version, 2: OOC strong */
mklpardiso_iparm[60] = 0; /* Output: Peak memory (KB) used in IN CORE symbolic factorization (phase 1). Similar to mklpardiso_iparm[14] */
mklpardiso_iparm[61] = 0; /* Output: Permanent memory (KB) used in IN CORE symbolic factorization (phase 1). Similar to mklpardiso_iparm[15] */
mklpardiso_iparm[62] = 0; /* Output: Total double precision memory (KB) used in IN CORE numerical factorization (phase 2). Similar to mklpardiso_iparm[16] */

/* Reordering and Symbolic Factorization */
mklpardiso_phase = 11; /* Reordering only */

PARDISO (mklpardiso_pt, &mklpardiso_maxfct, &mklpardiso_mnum, &mklpardiso_mtype, &mklpardiso_phase, &mklpardiso_n, pmklpardiso_a, pmklpardiso_ia, pmklpardiso_ja, &idum, &mklpardiso_nrhs, mklpardiso_iparm,
&mklpardiso_msglvl, &ddum, &ddum, &mklpardiso_error);
if (mklpardiso_error != 0) {
switch (mklpardiso_error) {
case -1: mexErrMsgTxt("MKL Pardiso reordering error. Input inconsistent."); break;
case -2: mexErrMsgTxt("MKL Pardiso reordering error. Not enough memory."); break;
case -3: mexErrMsgTxt("MKL Pardiso reordering error. Reordering problem."); break;
case -4: mexErrMsgTxt("MKL Pardiso reordering error. Zero pivot, numerical factorization or iterative refinement problem."); break;
case -5: mexErrMsgTxt("MKL Pardiso reordering error. Unclassified (internal) error."); break;
case -6: mexErrMsgTxt("MKL Pardiso reordering error. Preordering error (mtype 11, 13 only)."); break;
case -7: mexErrMsgTxt("MKL Pardiso reordering error. Diagonal matrix problem."); break;
case -8: mexErrMsgTxt("MKL Pardiso reordering error. 32 bit integer overflow problem."); break;
case -9: mexErrMsgTxt("MKL Pardiso reordering error. Not enough memory for OOC."); break;
case -10: mexErrMsgTxt("MKL Pardiso reordering error. Error opening OOC files."); break;
case -11: mexErrMsgTxt("MKL Pardiso reordering error. Read/Write problems with OOC data files."); break;
default: mexErrMsgTxt("MKL Pardiso reordering error.");
}
}

For numerical factorization, only the following is changed before a similar call as above:
mklpardiso_phase = 22; /* Numerical Factorization only */

0 Kudos
Gennady_F_Intel
Moderator
1,193 Views
Seb,
1.ease try to slightly change the linking line regarding mkls libraries by the following:
-Wl,--start-group $MKLPATH/libmkl_gf_ilp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread
whre $MKLPATH, as an example == /opt/intel/mkl/10.1.2.024/lib/em64t

note:
if you are using only Pardiso you dont need to link libmkl_lapack lib

2. Please look at the KB article called
Uing Intel MKL with MATLAB
http://software.intel.com/en-us/articles/using-intel-mkl-with-matlab/
may it will helps
--Gennady
0 Kudos
seblmn_pub_ro
Beginner
1,193 Views
Gennady,

Thanks, the static linking works, i am able to compile and run the static linkedcode without segmentation errors.

However when running the code, the output of the solver x seems to be incompatible with matlab results.Using the same coefficient matrix A (i double checked the matrix fed to Pardiso and Matlab. they are the same), i ran the MexPardiso version and got a set of numbers for x. In matlab, i did x = Ab and got another set of numbers. I am using the doublecomplex solver with nrhs = 2, iparms are unchanged as previous post.

The output from Pardiso is (formatted as real and imaginary column 1 followed by column 2):
Beginning: Computation for frequency 5.000000e+07 Hz.
Reordering ... DONE
Symbolic Factorization Peak Memory: 52408 KB
Permanent Memory used to store factors: 18192 KB
Number of Non Zeros in factors: 29476438
Number of floating point operations required: 122523 Mflop
Numerical Factorization ... DONE
Numerical Factorization Elapsed time is 39.613878 seconds.
Total memory: 33161 KB
Solving ... DONE
Solving Elapsed time is 4.931573 seconds.
The solution of the system is:
< snip for brevity >
x [47710] = 36.053793293050134 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47711] = -265.929975886536681 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47712] = -162.881611509971663 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47713] = -1127.290963816334624 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47714] = -449.080130870961682 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47715] = -28.536136665379306 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47716] = -2627.376110107798922 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47717] = -3823.224846024713770 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47718] = -5037.675608976801414 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47719] = -1024.134378325073840 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47720] = 10.220104283991992 0.000000000000000j 0.000000000000000 0.000000000000000j
x [47721] = 33.441847152485643 0.000000000000000j 1.000000000000000 0.000000000000000j
Number of iterative refinement steps performed: 4 [Max CGS Iterations: 20 ]
Number of perturbed pivots: 1142
CG/CGS Diagnostic: 0
Complete: Computation for frequency 5.000000e+07 Hz.

Matlab output:
x=
< snip for brevity >
x [47710] =-0.000005285358658 + 0.000000061066601i -0.000002389934058 - 0.000000039014830i
x [47711] =-0.000010621537059 + 0.000000179357816i -0.000008273371138 - 0.000000107321997i
x [47712] =-0.000000195500458 + 0.000000029687253i -0.000001768610367 - 0.000000015518635i
x [47713] =0.000008985740121 + 0.000000071698765i-0.000006687420318 - 0.000000023892635i
x [47714] =0.000004900928237 + 0.000000005152095i -0.000001567320502 + 0.000000004464986i
x [47715] =0.000000320543354 + 0.000000000115894i -0.000000088965442 + 0.000000000406412i
x [47716] =0.000031975266188 - 0.000000047723402i -0.000005242455774 + 0.000000071222683i
x [47717] =0.000037723116389 - 0.000000106316376i -0.000003118730192 + 0.000000111466970i
x [47718] =0.000029871504619 - 0.000000099491930i -0.000001529691858 + 0.000000098148170i
x [47719] =0.000011305058224 - 0.000000039864505i -0.000000442880818 + 0.000000039418336i
x [47720] =0.000000683300311 + 0.000138152994315i -0.000000610540021 - 0.000131494857307i
x [47721] =-0.000000610540021 - 0.000131494857307i 0.001378181053266 + 0.000129012810116i


Observations:
1) The complex part does not seem to get factored. b and x were allocated with the same calloc(n x nrhs, sizeof(doublecomplex)).This was then populated (Colum major population i.ecolumn 1 elements first followed bycolumn 2 elements)and passed to Pardiso with nrhs = 2. Please advise if i have made a wrong call etc.

2) The real parts which got factored seem incompatible with Matlab output and is sensitive to iparm[9]. if i change iparm[9] between 6-13, i get a set of different results. In all cases, the CGS diagnostic is 0 and iterative steps performed is 4. Only number of perturbed pivots change when iparm[9] is varied.


I will try the following:
a) Use Pardiso doublecomplex solver with nrhs = 1.
b) Solve the real andimaginary parts seperately as two reals by calling real Pardiso twice.

Thank you,

seb

0 Kudos
Sergey_K_Intel1
Employee
1,193 Views

Pardiso uses interleaved complex, MATLAB direct solvers accept split complex natively. I might be mistaken but it looks like you have been using MATLAB type of split complex for PARDISO and this causes different outputs.

All the best
Sergey
0 Kudos
seblmn_pub_ro
Beginner
1,193 Views
Sergey,

Yes, I'm aware of that. Matlab stores the real and imaginary parts in two seperate contiguous arrays.What i did was todefine a doublecomplex structure with real &imaginary elements of type double. Then, allocated new memory using Callocfor doublecomplex structure, then copy the elements from the Matlab seperate contigous arrays to the interleaved format of Pardiso. This calloc doublecomplex structure was then passed to Pardiso, not the original Matlab contigous arrays.

I did some further experimentation for items a and b of my previous post. The results are:

a) Setting nrhs=1 instead of 2 has no effect on doublecomplex solver, the complex parts remain unfactored and unsolved.Weird values for real parts.

b) Solving the real and imaginary parts seperately as two reals resulted in non convergence even for max iterative steps (iparm[7])set to a large number 200 - 500.A Not a Number (NaN)value was returned for all x. mtype was set to 11 for this. The output is:

Beginning: Computation for frequency 5.000000e+07 Hz.
Reordering ... DONE
Symbolic Factorization Peak Memory: 59270 KB
Permanent Memory used to store factors: 24343 KB
Number of Non Zeros in factors: 43414236
Number of floating point operations required: 68963 Mflop
Numerical Factorization ... DONE
Numerical Factorization Elapsed time is 29.028952 seconds.
Total memory: 25125 KB
Solving ... DONE
Solving Elapsed time is 143.555830 seconds.
The solution of the system is:

x [47710] = nan
x [47711] = nan
x [47712] = nan
x [47713] = nan
x [47714] = nan
x [47715] = nan
x [47716] = nan
x [47717] = nan
x [47718] = nan
x [47719] = nan
x [47720] = nan
x [47721] = nan
Number of iterative refinement steps performed: 200 [Max CGS Iterations: 200 ]
Number of perturbed pivots: 4565
CG/CGS Diagnostic: 0
Complete: Computation for frequency 5.000000e+07 Hz.

Misc weirdness encountered:

1) When statically linked with libmkl_gnu_thread.a instead of libmkl_sequential.a, the segmentation fault previously encountered with dynamic linking returns. The relevent linkflags used is:

-Wl,--start-group $MKLPATH/libmkl_gf_ilp64.a $MKLPATH/libmkl_gnu_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5

If i use -lpthread instead of -liomp5, there is a link error stating some omp files not found. There is no linkerrorusing -liomp5, but when it runs, there is segmentation fault.

seb


0 Kudos
Gennady_F_Intel
Moderator
1,193 Views
Seb,
We met such problem some times ago and have fixed it already, so Can you please get us the reproducible test case?
--Gennady
0 Kudos
seblmn_pub_ro
Beginner
1,193 Views

Gennady,

Yes, I too am puzzled by this. I have attached the neccesary files. They are named appropriately. But, just to be sure, the .doublecomplex_nrhs file solves a complex A, the .real file solves a real A (complex part is ignored). Link scipt used is mexopts.sh. There are some miscellaneous .m files to read and write matrices. The matrix file (.mat) which was used is attached also (.mat). The first part of the C file just reads thematrices from matfile and assembles A. Then a Calloc is done for a Pardiso structure which is then populated. The pardiso structure is passed toPardiso.Basically all one needs to do is call the function Compute_One_Frequency with f=50000000 and 'matfilename'. Compiler and linker used was gcc 4.3.1. If you need further information, just let me know.

seb
0 Kudos
Sergey_K_Intel1
Employee
1,193 Views
Quoting - seblmn.pub.ro
Sergey,

Yes, I'm aware of that. Matlab stores the real and imaginary parts in two seperate contiguous arrays.What i did was todefine a doublecomplex structure with real &imaginary elements of type double. Then, allocated new memory using Callocfor doublecomplex structure, then copy the elements from the Matlab seperate contigous arrays to the interleaved format of Pardiso. This calloc doublecomplex structure was then passed to Pardiso, not the original Matlab contigous arrays.

I did some further experimentation for items a and b of my previous post. The results are:

a) Setting nrhs=1 instead of 2 has no effect on doublecomplex solver, the complex parts remain unfactored and unsolved.Weird values for real parts.

b) Solving the real and imaginary parts seperately as two reals resulted in non convergence even for max iterative steps (iparm[7])set to a large number 200 - 500.A Not a Number (NaN)value was returned for all x. mtype was set to 11 for this. The output is:

Beginning: Computation for frequency 5.000000e+07 Hz.
Reordering ... DONE
Symbolic Factorization Peak Memory: 59270 KB
Permanent Memory used to store factors: 24343 KB
Number of Non Zeros in factors: 43414236
Number of floating point operations required: 68963 Mflop
Numerical Factorization ... DONE
Numerical Factorization Elapsed time is 29.028952 seconds.
Total memory: 25125 KB
Solving ... DONE
Solving Elapsed time is 143.555830 seconds.
The solution of the system is:

x [47710] = nan
x [47711] = nan
x [47712] = nan
x [47713] = nan
x [47714] = nan
x [47715] = nan
x [47716] = nan
x [47717] = nan
x [47718] = nan
x [47719] = nan
x [47720] = nan
x [47721] = nan
Number of iterative refinement steps performed: 200 [Max CGS Iterations: 200 ]
Number of perturbed pivots: 4565
CG/CGS Diagnostic: 0
Complete: Computation for frequency 5.000000e+07 Hz.

Misc weirdness encountered:

1) When statically linked with libmkl_gnu_thread.a instead of libmkl_sequential.a, the segmentation fault previously encountered with dynamic linking returns. The relevent linkflags used is:

-Wl,--start-group $MKLPATH/libmkl_gf_ilp64.a $MKLPATH/libmkl_gnu_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5

If i use -lpthread instead of -liomp5, there is a link error stating some omp files not found. There is no linkerrorusing -liomp5, but when it runs, there is segmentation fault.

seb


Dear Seb

It looks like there exists an inconsistency between MKL_INT used at compile time and the libraries used for linking. For example behaviour described above isusually observed when MKL_INT is decalred as int and ILP64 libaries are used for linking.

Please insert

#define MKL_INT int

in your program and please use the following sequence of MKL libs for linking

$MKLPATH/libmkl_intel_lp64.a -Wl,--start-group $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5

Does your code work under these conditions?

All the best
Sergey

0 Kudos
seblmn_pub_ro
Beginner
1,193 Views

Dear Sergey;

Thank you for the suggestions. It has been implemented. However, i am still getting NaN results for real solver and weird results for doublecomplex solver.

The compile line used is: (-DMKL_ILP64 removed)
gcc -c -I/opt/matlab/extern/include -DMATLAB_MEX_FILE -ansi -D_GNU_SOURCE -I/opt/intel/mkl/10.1.2.024/include -fexceptions -fPIC -fno-omit-frame-pointer -pthread -O -DNDEBUG "Compute_One_Frequency.c"

The link line used is:
gcc -O -pthread -shared -Wl,--version-script,/opt/matlab/extern/lib/glnxa64/mexFunction.map -Wl,--no-undefined -o "Compute_One_Frequency.mexa64" Compute_One_Frequency.o mexversion.o -Wl,-rpath-link,/opt/matlab/bin/glnxa64 -L/opt/intel/mkl/10.1.2.024/lib/em64t -I/opt/intel/mkl/10.1.2.024/include /opt/intel/mkl/10.1.2.024/lib/em64t/libmkl_intel_lp64.a -Wl,--start-group /opt/intel/mkl/10.1.2.024/lib/em64t/libmkl_sequential.a /opt/intel/mkl/10.1.2.024/lib/em64t/libmkl_core.a -Wl,--end-group -liomp5 -L/opt/matlab/bin/glnxa64 -lmx -lmex -lmat -lm -lstdc++

The above link line is the only static linking which runs without segmentation fault (but weird results). If libmkl_sequential is replaced with libmkl_gnu_thread.a or libmkl_intel_thread.a, it links ok, but when run results in segmentaton fault. If libmkl_intel_lp64.a is replaced with libmkl_gf_lp64, there is no change in outcome already described.

seb
0 Kudos
Reply