Solved: Re: Correct way to use the two-stage algorithm mkl_sparse_sypr

haoyang_liu · ‎04-07-2022

Hi all,

I have an application to compute a series of sparse matrix products A' * A repeatedly. Each matrix A has the same non-zero pattern but different numeric values. After searching the document, the two-stage algorithm mkl_sparse_sypr seems to be helpful in this scenario. However, I failed to figure out the correct way to use the routine:

method 1:

1) Call mkl_sparse_sypr with request = SPARSE_STAGE_NNZ_COUNT and request = SPARSE_STAGE_FINALIZE_MULT_NO_VAL for the first A.

2) Call mkl_sparse_sypr with request = SPARSE_STAGE_FINALIZE_MULT for the first and the rest matrices.

The call to mkl_sparse_sypr with request = SPARSE_STAGE_FINALIZE_MULT_NO_VAL fails with error code "SPARSE_STATUS_NOT_SUPPORTED"

method 2:

1) Call mkl_sparse_sypr with request = SPARSE_STAGE_NNZ_COUNT for the first A.

2) Call mkl_sparse_sypr with request = SPARSE_STAGE_FINALIZE_MULT for the first and the rest matrices.

The results are correct for each A and the computation cost is reduced from the second matrix. However, it seems that there are memory leakages as more matrices are processed.

I am using oneAPI 2022.0. Any suggestions?

My testing codes:

#include <stdio.h>
#include <stdlib.h>
#include "mkl_spblas.h"

#define M 20
#define N 10

int main(){
    MKL_INT ir[] = {0, 1, 2, 3, 5, 6, 7, 9, 9, 9, 11, 12, 13, 14, 16, 17, 19, 19, 19, 20, 20};
    MKL_INT jc[] = {6, 0, 3, 3, 9, 5, 7, 4, 5, 0, 2, 5, 1, 2, 3, 7, 5, 2, 8, 9};
    double pr[M];

    sparse_matrix_t A, B, C;
    sparse_status_t stat;
    struct matrix_descr descrB;
    descrB.type = SPARSE_MATRIX_TYPE_DIAGONAL;
    descrB.diag = SPARSE_DIAG_UNIT;

    int init = 0;

    while (1){
        // create random pr
        for (int i = 0; i < M; ++i){
            pr[i] = 1.0 * rand() / RAND_MAX;
        }

        // create A
        stat = mkl_sparse_d_create_csr(
                &A,
                SPARSE_INDEX_BASE_ZERO,
                M,
                N,
                ir,
                ir + 1,
                jc,
                pr);

        if (stat != SPARSE_STATUS_SUCCESS){
            printf("create csr error.\n");
            return 1;
        }

        if (!init){
            stat = mkl_sparse_sypr(SPARSE_OPERATION_TRANSPOSE, A, B, descrB, &C, SPARSE_STAGE_NNZ_COUNT);
            if (stat != SPARSE_STATUS_SUCCESS){
                printf("sypr stage nnz_count error.\n");
                return 1;
            }

            //stat = mkl_sparse_sypr(SPARSE_OPERATION_TRANSPOSE, A, B, descrB, &C, SPARSE_STAGE_FINALIZE_MULT_NO_VAL);
            //if (stat != SPARSE_STATUS_SUCCESS){
            //    printf("sypr stage finalize_mult_no_val error.\n");
            //    return 1;
            //}

            init = 1;
        }

        stat = mkl_sparse_sypr(SPARSE_OPERATION_TRANSPOSE, A, B, descrB, &C, SPARSE_STAGE_FINALIZE_MULT);
        if (stat != SPARSE_STATUS_SUCCESS){
            printf("sypr stage finalize_mult error.\n");
            return 1;
        }

        // destroy
        mkl_sparse_destroy(A);
        A = NULL;
    }

    return 0;
}

Ruqiu_C_Intel · ‎04-21-2022

Hi Haoyang,

Thank you again for posting your concern in the community. Please check the release notes for this fixed when future release is available.

This issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Have a nice day!

Regards,

Ruqiu

View solution in original post

VidyalathaB_Intel · ‎04-08-2022

Hi,

Thanks for reaching out to us.

Could you please let us know the environment details (OS) on which you are running your application so that we can check this issue from our end as well?

Please let us know how did you observe the memory leaks? Is it like, after running for some time is it getting killed?

From the provided code, we could see that the mkl calls are placed in while(1) (infinite loop), Could you please confirm whether your application needed to be run continuously and there is no certain count on how many times the process should be done?

Regards,

Vidya.

haoyang_liu · ‎04-08-2022

Hi Vidya,

My OS is ubuntu 20.04 (gcc 9.3). My application is a iterative numerical method that requires the computation of A' * A in each iteration. Thus the process is performed finite times and there is no need to run continuously. However, the matrix A could be large in many cases. If there are memory leaks the application is very likely to be killed before producing the final output.

The provided codes are just for illustration purpose. Just compile the codes and run on a computer you can observe the memory is rapidly consumed. Several remarks on the codes:

1) I hope to reuse the storage of the matrix C in each loop. Therefore I do not call mkl_sparse_destroy(C).

2) Upon calling mkl_sparse_sypr with request = SPARSE_STAGE_FINALIZE_MULT and a precomputed C, I expect that MKL only updates the numeric value of C. In other words, in this case MKL thinks the symbolic multiplication is done and won't touch the non-zero pattern.

I have no idea how MKL manages its internal memory. Thus the program may not work as I expect in remark 2).

Thanks,

Haoyang.

Spencer_P_Intel · ‎04-08-2022

I am a little confused why you are using the symmetric product (C = A' * B * A where B is symmetric) mkl_sparse_sypr() to do the C = A'*A operation. It would be better to use the sparse matric multiplication (mkl_sparse_sp2m() C = op(A) * op(B) ) which also has the two stage support as in symmetric product . Technically we also have mkl_sparse_syrk() which directly does your operation C = op(A) * A but it doesn't have the same two stage support.

haoyang_liu · ‎04-09-2022

Hi Spencer,

I've tried the routine mkl_sparse_sp2m() and it works flawlessly in my application - no memory leaks are observed.

My only concern is that mkl_sparse_sp2m() is designed for general sparse marix multiplication thus the output C is usually a general matrix. However A' * A results in a symmetric matrix and there is no need to compute half of the pattern/values. I am not sure whether mkl_sparse_sp2m() would detect the symmetry and automatically reduce the computational cost. On the other hand, mkl_sparse_sypr() calculates symmetric product and I expect to save some time.

The reason not to use mkl_sparse_syrk() is that it does not support the two-stage algorithm.

Thanks,

Haoyang

VidyalathaB_Intel · ‎04-10-2022

Hi Haoyang,

Thanks for sharing the details.

We are able to reproduce the issue with respect to mkl_sparse_sypr() function call and could see that the program is getting killed after some time. (will let you know regarding this)

>>I've tried the routine mkl_sparse_sp2m() and it works flawlessly in my application - no memory leaks are observed

Meanwhile, could you please provide us with the sample reproducer with respect to mkl_sparse_sp2m() function if possible? (Just want to see what actually differs and causes the memory leaks)

Regards,

Vidya.

haoyang_liu · ‎04-10-2022

Hi Vidya,

Here is my sample code:

#include <stdio.h>
#include <stdlib.h>
#include "mkl_spblas.h"

#define M 20
#define N 10

int main(){
    MKL_INT ir[] = {0, 1, 2, 3, 5, 6, 7, 9, 9, 9, 11, 12, 13, 14, 16, 17, 19, 19, 19, 20, 20};
    MKL_INT jc[] = {6, 0, 3, 3, 9, 5, 7, 4, 5, 0, 2, 5, 1, 2, 3, 7, 5, 2, 8, 9};
    double pr[M];

    sparse_matrix_t A, C;
    sparse_status_t stat;
    struct matrix_descr descrA;
    descrA.type = SPARSE_MATRIX_TYPE_GENERAL;

    int init = 0;

    while (1){
        // create random pr
        for (int i = 0; i < M; ++i){
            pr[i] = 1.0 * rand() / RAND_MAX;
        }

        // create A
        stat = mkl_sparse_d_create_csr(
                &A,
                SPARSE_INDEX_BASE_ZERO,
                M,
                N,
                ir,
                ir + 1,
                jc,
                pr);

        if (stat != SPARSE_STATUS_SUCCESS){
            printf("create csr error.\n");
            return 1;
        }

        if (!init){
            stat = mkl_sparse_sp2m(
                    SPARSE_OPERATION_TRANSPOSE, descrA, A,
                    SPARSE_OPERATION_NON_TRANSPOSE, descrA, A,
                    SPARSE_STAGE_NNZ_COUNT, &C);
            if (stat != SPARSE_STATUS_SUCCESS){
                printf("sypr stage nnz_count error.\n");
                return 1;
            }

            // note: if switch to sypr, the following operation
            // fails with stat = SPARSE_STATUS_NOT_SUPPORTED
            stat = mkl_sparse_sp2m(
                    SPARSE_OPERATION_TRANSPOSE, descrA, A,
                    SPARSE_OPERATION_NON_TRANSPOSE, descrA, A,
                    SPARSE_STAGE_FINALIZE_MULT_NO_VAL, &C);
            if (stat != SPARSE_STATUS_SUCCESS){
                printf("sypr stage finalize_mult_no_val error.\n");
                return 1;
            }

            init = 1;
        }

        stat = mkl_sparse_sp2m(
                SPARSE_OPERATION_TRANSPOSE, descrA, A,
                SPARSE_OPERATION_NON_TRANSPOSE, descrA, A,
                SPARSE_STAGE_FINALIZE_MULT, &C);
        if (stat != SPARSE_STATUS_SUCCESS){
            printf("sypr stage finalize_mult error.\n");
            return 1;
        }

        // destroy
        mkl_sparse_destroy(A);
        A = NULL;
    }

    return 0;
}

Thanks,

Haoyang

VidyalathaB_Intel · ‎04-12-2022

Hi Haoyang,

Thanks for the information.

We are working on your issue, we will get back to you soon with an update.

Regards,

Vidya.

Ruqiu_C_Intel · ‎04-19-2022

Hello Haoyang,

Thank you for contacting us.

The root cause for memory leakage has been found, we are fixing it now. The fixed will be available in our future release.

Currently, sp2m would be the best choice if 2-stage approach is required, and indeed it costs more computing time, since the full matrix is produced, not only one triangular (as it should be in sypr/syrk).

Best Regards,

Ruqiu

haoyang_liu · ‎04-19-2022

Hi Ruqiu

Thanks for the update. Look forward to the future release!

Haoyang

Ruqiu_C_Intel · ‎04-21-2022

Hi Haoyang,

Thank you again for posting your concern in the community. Please check the release notes for this fixed when future release is available.

This issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Have a nice day!

Regards,

Ruqiu