segment error on mkl_sparse_?_ev when the matrix has degenerate eigenvalues

hu__yi · ‎10-01-2018

Hi everyone,

I am trying to calculate eigenvalues of some real symmetric sparse matrices. What I notice is that the program crashes occasionally. I found a minimum working example:

#include <stdio.h>
#include <mkl.h>
#include <mkl_spblas.h>
#include <mkl_solvers_ee.h>

int main()
{
	int i;
    MKL_INT pm[128];
    mkl_sparse_ee_init(pm);
    // feastinit(pm);  // Using FEAST is fine
    char ls_which = 'L';
    sparse_matrix_t mat_handle;

    double vals[] = {3, 1, 1, 5, 1, 5, 5};
    MKL_INT cols[] = {0, 1, 3, 1, 2, 2, 3};
    MKL_INT rows[] =  {0, 3, 5, 6, 7};

    double vals2[] = {3, 1, 1, 1, 5, 5, 5};
    MKL_INT cols2[] = {0, 1, 2, 3, 1, 2, 3};
    MKL_INT rows2[] =  {0, 4, 5, 6, 7};

    MKL_INT ret_k;
    double ret_E[4], ret_X[4*4], ret_res[4];
    struct matrix_descr descs = {.type = SPARSE_MATRIX_TYPE_SYMMETRIC,
                               .mode = SPARSE_FILL_MODE_UPPER,
                               .diag = SPARSE_DIAG_NON_UNIT
    };
    int csize = 4;
    sparse_status_t spflag = mkl_sparse_d_create_csr(&mat_handle, SPARSE_INDEX_BASE_ZERO, csize, csize,
                           rows, rows+1, cols, vals);
    printf("create_csr return: %d\n", (int)spflag );
    spflag = mkl_sparse_d_ev(&ls_which, pm, mat_handle, descs, csize, &ret_k, ret_E, ret_X, ret_res);
    printf("sparse_ev return: %d\n", (int)spflag );
    for(i=0; i<4; ++i)
    {
    	printf("%f ", ret_E);
    }
    printf("\n");
    spflag = mkl_sparse_d_create_csr(&mat_handle, SPARSE_INDEX_BASE_ZERO, csize, csize,
                           rows2, rows2+1, cols2, vals2);
printf("create_csr return: %d\n", (int)spflag );
    spflag = mkl_sparse_d_ev(&ls_which, pm, mat_handle, descs, csize, &ret_k, ret_E, ret_X, ret_res); // *Crash* here
    printf("sparse_ev return: %d\n", (int)spflag );
    for(i=0; i<4; ++i)
    {
    	printf("%f ", ret_E);
    }

    printf("\n");
}

The first matrix is

[3, 1, 0, 1;
1, 5, 1, 0;
0, 1, 5, 0;
1, 0, 0, 5

The second matrix is

[3, 1, 1, 1;
1, 5, 0, 0;
1, 0, 5, 0;
1, 0, 0, 5

which has two degenerated eigenvalue 5. I am wondering what happens.

Also attach the simple Makefile:

INTEL_ROOT = /opt/intel/compilers_and_libraries/mac
MKLROOT = $(INTEL_ROOT)/mkl
FLAGS =   -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
mkl_test: test.c
	icc -o mkl_test test.c -I$(MKLROOT)/include $(FLAGS)

Choi__Sunghwan · ‎06-03-2020

Did you solve this problem? if not, why don't you try to change the type to General matrix?

I am now suffering from symmetric diagonalization using mkl_sparse_d_ev function. In my case, if I use the general format (even though the matrix is still symmetric), the function works perfectly.

Vishnu · ‎06-25-2020

I have a positive semi-definite matrix with two zero-eigenvalues. I'm trying to find its smallest, non-zero eigenvalue. I seem to be having the same trouble when I use SPARSE_MATRIX_TYPE_SYMMETRIC. This is the backtrace:

#0  0x00007ffff591c0e7 in mkl_sparse_d_transpose_matrix_i4 () from /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
#1  0x00007ffff597eda2 in mkl_sparse_d_transposeMatrix_i4 () from /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
#2  0x00007ffff593eaba in mkl_sparse_d_add_i4 () from /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
#3  0x00007ffff597b06b in create_gen_from_sym () from /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
#4  0x00007ffff597c730 in mkl_sparse_d_optimize_csr_mv_i4 () from /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
#5  0x00007fffef3e575b in mkl_sparse_optimize_i4_avx2 () from /opt/intel/mkl/lib/intel64/libmkl_avx2.so
#6  0x00007ffff68e7fe1 in mkl_sparse_d_ev_i4 () from /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so

EDIT: Never-mind; I have made a trivial mistake. It works for me with either setting, and the SYMMETRIC option is slightly faster.

fanzhenhao · ‎05-26-2021

I encounter the same problem with you. I am solving eigen problem for a symmetric sparse matrix with dim=10353252 and nnz=2837797476. I need k0=100 smallest eigenvalues and corresponding eigenvectors. Since it is a huge matrix , time cost for feast algorithm is unaffordable , so the only choice is Krylov-Schur algorithm. But I find that mkl_sparse_?_ev end with a segment error when the matrix has degenerate eigenvalues.I provid a simple example here:

#include <iomanip>
#include <iostream>
#include <string>

#include "mkl.h"
#include "mkl_solvers_ee.h"

void printmat(const std::string &name, int m, int n, int lda, const double *mat, int precision);

int main() {
    int N = 8;
    int ia[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    int ja[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    double a[8] = {1, 2, 3, 4, 4, 3, 2, 1};

    sparse_matrix_t A;
    struct matrix_descr descr = {SPARSE_MATRIX_TYPE_SYMMETRIC, SPARSE_FILL_MODE_UPPER, SPARSE_DIAG_NON_UNIT};
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, N, N, ia, ia + 1, ja, a);

    double denA[64] = {1, 0, 0, 0, 0, 0, 0, 0,
                       0, 2, 0, 0, 0, 0, 0, 0,
                       0, 0, 3, 0, 0, 0, 0, 0,
                       0, 0, 0, 4, 0, 0, 0, 0,
                       0, 0, 0, 0, 4, 0, 0, 0,
                       0, 0, 0, 0, 0, 3, 0, 0,
                       0, 0, 0, 0, 0, 0, 2, 0,
                       0, 0, 0, 0, 0, 0, 0, 1};

    double egvalues[8];

    LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', N, denA, N, egvalues);
    printmat("exact eigen values:", 1, N, N, egvalues, 15);
    printmat("exact eigen vectors:", N, N, N, denA, 15);

    double spE[8];
    double spEv[64];
    double Res[8];
    int k0 = 5;
    int k;
    int pm[128];
    mkl_sparse_ee_init(pm);
    pm[2] = 1;
    pm[6] = 1;
    pm[7] = 1;

    int info = mkl_sparse_d_ev("S", pm, A, descr, k0, &k, spE, spEv, Res);
    if (info != 0) {
        printf("Routine mkl_sparse_d_ev returns code of ERROR: %i", (int)info);
        return 1;
    }

    printf("#mode found/subspace %d %d \n", k, k0);
    printmat("estimated eigen values:", 1, k, k, spE, 15);
    printmat("estimated eigen vectors:", k, N, N, spEv, 15);
}

void printmat(const std::string &name, int m, int n, int lda, const double *mat, int prec) {
    std::ios::fmtflags OldFlags = std::cout.flags();
    std::cout << name << " :" << std::endl;
    std::cout.unsetf(std::ios::floatfield);
    std::cout.setf(std::ios::fixed);
    std::cout.precision(prec);

    int width = prec + 3;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            std::cout << std::setw(width) << mat[i * lda + j];
            if (j != n - 1) std::cout << "   ";
        }
        std::cout << std::endl;
    }

    std::cout.flags(OldFlags);
    std::cout.precision(6);
}

and Makefile:

CC=icpc -std=c++11 -O3
CFLAGS=-I"${MKLROOT}/include"
LDFLAGS=-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

minimal: minimal.cpp
	$(CC) $(CFLAGS) $^ ${LDFLAGS} -o $@

result is

exact eigen values: :
 1.000000000000000    1.000000000000000    2.000000000000000    2.000000000000000    3.000000000000000    3.000000000000000    4.000000000000000    4.000000000000000
exact eigen vectors: :
 1.000000000000000    0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    1.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    1.000000000000000   -0.000000000000000   -0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    1.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000    1.000000000000000    0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000    1.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000
 0.000000000000000    0.000000000000000    1.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000
 0.000000000000000    1.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000
Segmentation fault (core dumped)

can any one help me?

Kirill_V_Intel · ‎05-26-2021

Hi @fanzhenhao !

Which version of MKL/oneMKL are you using? I suspect, not the latest oneMKL 2021.2.

Can you try with oneMKL 2021.2?

I've checked your reproducer and I see the crash for oneMKL 2021.1 but not for oneMKL 2021.2. Between 2021.1 and 2021.2 we have fixed an issue for mkl_sparse_?_ev which could cause the crash. The issue was not directly related to the fact of multiple eigenvalues being equal.

Best,
Kirill

Gennady_F_Intel · ‎05-26-2021

I also checked this example on lin/win os with the latest mkl 2021.2. Everything works. Please find out the output results I obtained ( i only added the mkl version into your 8x8 example)

$ ./a.out
Major version: 2021
Minor version: 0
Update version: 2
Product status: Product
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
exact eigen values: :
1.000000000000000 1.000000000000000 2.000000000000000 2.000000000000000 3.000000000000000 3.000000000000000 4.000000000000000 4.000000000000000
exact eigen vectors: :
1.000000000000000 0.000000000000000 -0.000000000000000 -0.000000000000000 -0.000000000000000 -0.000000000000000 -0.000000000000000 -0.000000000000000
0.000000000000000 0.000000000000000 -0.000000000000000 1.000000000000000 -0.000000000000000 -0.000000000000000 -0.000000000000000 -0.000000000000000
0.000000000000000 0.000000000000000 -0.000000000000000 0.000000000000000 -0.000000000000000 1.000000000000000 -0.000000000000000 -0.000000000000000
0.000000000000000 0.000000000000000 -0.000000000000000 0.000000000000000 -0.000000000000000 0.000000000000000 -0.000000000000000 1.000000000000000
0.000000000000000 0.000000000000000 -0.000000000000000 0.000000000000000 -0.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 -0.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 1.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 1.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000 0.000000000000000
#mode found/subspace 5 5
estimated eigen values: :
-38.417097479072368 -32.130736776317789 1.917149680662010 2.069736657079272 2.078685516112698
estimated eigen vectors: :
0.288169623788171 0.084613141773030 0.002296639905500 0.102474464750271 -0.041051918962990 -0.028604202552957 -0.097697834031295 -0.791422317063795
0.274591207424757 -0.066020473633912 -0.039981838675984 0.009812577438120 -0.005807601128427 -0.027234252718160 0.387776058744686 -0.028155758328511
0.171773849005140 -0.195145536917235 -0.071734618996850 -0.011425401968798 -0.038679952834560 -0.102127481931804 -0.646123212257801 0.252186135848476
-0.696702821790961 0.124452896982733 0.043654897995420 0.025202102997553 0.005029418411288 0.041915089373964 0.072836574705455 -0.140420651126935
-0.184858373956651 -0.235909544774302 0.224689111751713 0.131370188779722 -0.006464048518765 0.226302939614696 -0.075046316413411 -0.081881213817877

fanzhenhao · ‎05-26-2021

Thank you for checking the example. But the result seems totally wrong. The exact eigenvalues are [1,1,2,2,3,3,4,4] but the mkl_sparse_d_ev returns the smallest 5 eigenvalues are [-38.417097479072368, -32.130736776317789, 1.917149680662010, 2.069736657079272, 2.078685516112698].

I have updated my mkl to 2021.2.0.296, and supplement a pm[9] code check.

#include <iomanip>
#include <iostream>
#include <string>

#include "mkl.h"
#include "mkl_solvers_ee.h"

void printmat(const std::string &name, int m, int n, int lda, const double *mat, int precision);

int main() {
    int N = 8;
    int ia[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    int ja[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    double a[8] = {1, 2, 3, 4, 4, 3, 2, 1};

    sparse_matrix_t A;
    struct matrix_descr descr = {SPARSE_MATRIX_TYPE_SYMMETRIC, SPARSE_FILL_MODE_UPPER, SPARSE_DIAG_NON_UNIT};
    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, N, N, ia, ia + 1, ja, a);

    double denA[64] = {1, 0, 0, 0, 0, 0, 0, 0,
                       0, 2, 0, 0, 0, 0, 0, 0,
                       0, 0, 3, 0, 0, 0, 0, 0,
                       0, 0, 0, 4, 0, 0, 0, 0,
                       0, 0, 0, 0, 4, 0, 0, 0,
                       0, 0, 0, 0, 0, 3, 0, 0,
                       0, 0, 0, 0, 0, 0, 2, 0,
                       0, 0, 0, 0, 0, 0, 0, 1};

    double egvalues[8];

    LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', N, denA, N, egvalues);
    printmat("exact eigen values:", 1, N, N, egvalues, 15);
    printmat("exact eigen vectors:", N, N, N, denA, 15);

    double spE[8];
    double spEv[64];
    double Res[8];
    int k0 = 5;
    int k;
    int pm[128];
    mkl_sparse_ee_init(pm);
    pm[2] = 1;
    pm[6] = 1;
    pm[7] = 1;

    int info = mkl_sparse_d_ev("S", pm, A, descr, k0, &k, spE, spEv, Res);
    if (info != 0) {
        printf("Routine mkl_sparse_d_ev returns code of ERROR: %i", (int)info);
        return 1;
    }

    switch (pm[9]) {
        case 0:
            std::cout << "The iterations stopped since convergence has been detected." << std::endl;
            break;
        case -1:
            std::cout << "Maximum number of iterations has been reached and even the \
residual norm estimates have not converged." << std::endl;
            break;
        case -2:
            std::cout << "maximum number of iterations has been reached despite the \
residual norm estimates have converged \
(but the true residuals for eigenpairs have not)." << std::endl;
            break;
        case -3:
            std::cout << "the iterations stagnated and even the residual norm estimates \
have not converged." << std::endl;
            break;
        case -4:
            std::cout << "The iterations stagnated while the eigenvalues have converged \
(but the true residuals for eigenpairs do not)." << std::endl;
            break;
        default:
            std::cout << "Unknow errors occurs." << std::endl;
    };

    printf("#mode found/subspace %d %d \n", k, k0);
    printmat("estimated eigen values:", 1, k, k, spE, 15);
    printmat("estimated eigen vectors:", k, N, N, spEv, 15);
}

void printmat(const std::string &name, int m, int n, int lda, const double *mat, int prec) {
    std::ios::fmtflags OldFlags = std::cout.flags();
    std::cout << name << " :" << std::endl;
    std::cout.unsetf(std::ios::floatfield);
    std::cout.setf(std::ios::fixed);
    std::cout.precision(prec);

    int width = prec + 3;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            std::cout << std::setw(width) << mat[i * lda + j];
            if (j != n - 1) std::cout << "   ";
        }
        std::cout << std::endl;
    }

    std::cout.flags(OldFlags);
    std::cout.precision(6);
}

it shows that

exact eigen values: :
 1.000000000000000    1.000000000000000    2.000000000000000    2.000000000000000    3.000000000000000    3.000000000000000    4.000000000000000    4.000000000000000
exact eigen vectors: :
 1.000000000000000    0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    1.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000   -0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    1.000000000000000   -0.000000000000000   -0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    1.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000    1.000000000000000    0.000000000000000
 0.000000000000000    0.000000000000000   -0.000000000000000    0.000000000000000    1.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000
 0.000000000000000    0.000000000000000    1.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000
 0.000000000000000    1.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000    0.000000000000000
maximum number of iterations has been reached despite the residual norm estimates have converged (but the true residuals for eigenpairs have not).
#mode found/subspace 5 5 
estimated eigen values: :
-38.417097479072368   -32.130736776317789    1.917149680662010    2.069736657079272    2.078685516112698
estimated eigen vectors: :
 0.288169623788171    0.084613141773030    0.002296639905500    0.102474464750271   -0.041051918962990   -0.028604202552957   -0.097697834031295   -0.791422317063795
 0.274591207424757   -0.066020473633912   -0.039981838675984    0.009812577438120   -0.005807601128427   -0.027234252718160    0.387776058744686   -0.028155758328511
 0.171773849005140   -0.195145536917235   -0.071734618996850   -0.011425401968798   -0.038679952834560   -0.102127481931804   -0.646123212257801    0.252186135848476
-0.696702821790961    0.124452896982733    0.043654897995420    0.025202102997553    0.005029418411288    0.041915089373964    0.072836574705455   -0.140420651126935
-0.184858373956651   -0.235909544774302    0.224689111751713    0.131370188779722   -0.006464048518765    0.226302939614696   -0.075046316413411   -0.081881213817877

it means that maximum number of iterations has been reached despite the residual norm estimates have converged (but the true residuals for eigenpairs have not). Is it a programming bug or the Krylov-Schur Algorithm can not handle matrices with degenerate eigenvalues?

Gennady_F_Intel · ‎05-27-2021

yes, I see the same outputs and it could be the problem with this EV solver. We will check and keep this thread updated.

fanzhenhao · ‎05-27-2021

Thank you very much!