In-place scaling of sparse matrix

imcinerney · ‎11-15-2022

I am trying to write a method that can do in-place scaling of a sparse matrix. I give the matrix values to MKL as a CSC matrix when creating the sparse matrix, and then I want to scale each entry by the same amount (essentially multiplying by a scaled identity matrix).

Currently, I have this implemented by storing a copy of the value vector I passed in when creating the sparse matrix and then iterating through that vector, and applying the scaling to the elements. That seems to work (as in the sparse matrix reflects the changed values), but I am not doing any optimizations on the matrix right now. I eventually want to do optimizations to improve the matrix-vector multiplication performance, so I would like to transform the scaling operation into an actual MKL operation (since my understanding of the optimize routines is that they can create new internal copies of the value vector in other formats, and this direct access won't update the internal copies).

Since it is a simple scaling, the structure of the matrix is preserved. I am calling the mkl_sparse_sp2m function, and am using only the SPARSE_STAGE_FINALIZE_MULT stage to multiply by a scaled identity matrix. The values inside the CSC vectors of the original matrix are in fact updated and scaled appropriately after that call (I can see that when I print out the matrix contents), but I am unfortunately getting a segfault when I free the matrix. Is there a proper way to handle the memory for this operation to prevent the segfult, or is there another way that I can do this scaling operation using the MKL API (and have the result in A again)?

An example program that shows this is

#include <stdio.h>
#include "mkl_types.h"
#include "mkl_spblas.h"
#include <mkl.h>

#define blas_malloc(alloc_size)  mkl_malloc(alloc_size, 64)
#define blas_free                mkl_free

void print_csc_matrix(      int      num_cols,
                      const MKL_INT* row_ind,
                      const MKL_INT* col_ptr,
                      const double*  val,
                      const char*    name)
{
  MKL_INT j, i, row_start, row_stop;
  MKL_INT k = 0;

  // Print name
  printf("%s :\n", name);

  for (j = 0; j < num_cols; j++) {
    row_start = col_ptr[j];
    row_stop  = col_ptr[j + 1];

    if (row_start == row_stop) continue;
    else {
      for (i = row_start; i < row_stop; i++) {
        printf("\t[%3u,%3u] = %.3g\n", (int)row_ind[i], (int)j, val[k++]);
      }
    }
  }
}

int main () {
    //*******************************************************************************
    //     Definition arrays for sparse representation of the matrix A in
    //     the coordinate format:
    //*******************************************************************************
#define M 5    /* nrows = ncols = M */
#define NNZ 13
    MKL_INT m = M;

    MKL_INT colIndex[M+1] = {   0,               3,              6,              9,        11,       13};
    MKL_INT rows[NNZ]     = {   0,   1,    3,    0,   1,   4,    0,   2,   3,    2,   3,    2,   4};
    double values[NNZ]    = { 1.0 -2.0, -4.0, -1.0, 5.0, 8.0, -3.0, 4.0, 2.0,  6.0, 7.0, 4.0, -5.0};

    //*******************************************************************************
    //    Declaration of local variables :
    //*******************************************************************************

    MKL_INT     i, j;
    sparse_matrix_t cscA;

    sparse_status_t status;
    int exit_status = 0;

    printf( "\n Test program to do in-place CSC matrix scaling\n" );
    printf( "-------------------------------------------------------\n" );

    //*******************************************************************************
    //   Create CSC sparse matrix handle for matrix to scale
    //*******************************************************************************
    status = mkl_sparse_d_create_csc(&cscA,
                                     SPARSE_INDEX_BASE_ZERO,
                                     m,    // number of rows
                                     m,    // number of cols
                                     colIndex,
                                     colIndex+1,
                                     rows,
                                     values);

    if (status != SPARSE_STATUS_SUCCESS) {
        printf(" Error in mkl_sparse_d_create_csc: %d \n", status);
        exit_status = 1;
        goto exit;
    }

    print_csc_matrix(m, rows, colIndex, values, "A_orig");

    //*******************************************************************************
    //   Create an identity to do a left-multiply to scale */
    //*******************************************************************************
    sparse_matrix_t I;

    MKL_INT* Ip = blas_malloc((m+1) * sizeof(MKL_INT));
    MKL_INT* Ii = blas_malloc(m * sizeof(MKL_INT));
    double*  Ix = blas_malloc(m * sizeof(double));

    if(!Ip || !Ii || !Ix){
        printf("Failed to create CSC vectors to perform diagonal scaling.");
        goto exit;
    }

    for(i=0; i < m; i++){
        Ip[i] = i;
        Ii[i] = i;
        Ix[i] = 3.0;
    }
    Ip[m] = m;

    status = mkl_sparse_d_create_csc(&I,
                                     SPARSE_INDEX_BASE_ZERO,
                                     m,      /* Number of rows */
                                     m,      /* Number of columns */
                                     Ip,     /* Array of column start indices (this will only look at the first n entries in p, skipping the last one) */
                                     Ip+1,   /* Array of column end indices (this will skip the first entry to only look at the last n) */
                                     Ii,     /* Array of row indices */
                                     Ix);    /* The actual data */

    if (status != SPARSE_STATUS_SUCCESS) {
        printf("Failed to create identity matrix to perform diagonal scaling.");
        goto exit1;
    }

    print_csc_matrix(m, Ii, Ip, Ix, "S");

    /* Only possible to do general matrix-matrix multiplication, diagonal is not implemented */
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;

    status = mkl_sparse_sp2m(SPARSE_OPERATION_NON_TRANSPOSE,
                             descr,
                             cscA,
                             SPARSE_OPERATION_NON_TRANSPOSE,
                             descr,
                             I,
                             SPARSE_STAGE_FINALIZE_MULT,
                             &cscA);

    if (status != SPARSE_STATUS_SUCCESS) {
        printf("Failed to perform matrix scaling.");
        exit_status = 1;
        goto exit1;
    }

    print_csc_matrix(m, rows, colIndex, values, "A_scaled");

exit1:
    // Free identity matrix
    mkl_sparse_destroy(I);

    // Free arrays for identity matrix
    blas_free(Ip);
    blas_free(Ii);
    blas_free(Ix);

exit:
    // Release matrix handle and deallocate matrix
    mkl_sparse_destroy(cscA);

    return exit_status;
}

The output of this program when running under GDB is:

 Test program to do in-place CSC matrix scaling
-------------------------------------------------------
A_orig :
	[  0,  0] = -1
	[  1,  0] = -4
	[  3,  0] = -1
	[  0,  1] = 5
	[  1,  1] = 8
	[  4,  1] = -3
	[  0,  2] = 4
	[  2,  2] = 2
	[  3,  2] = 6
	[  2,  3] = 7
	[  3,  3] = 4
	[  2,  4] = -5
	[  4,  4] = 0
S :
	[  0,  0] = 3
	[  1,  1] = 3
	[  2,  2] = 3
	[  3,  3] = 3
	[  4,  4] = 3
[New Thread 0x7fffe88fb6c0 (LWP 1287870)]
[New Thread 0x7fffe3ffe740 (LWP 1287871)]
[New Thread 0x7fffe37fc7c0 (LWP 1287872)]
A_scaled :
	[  0,  0] = -3
	[  1,  0] = -12
	[  3,  0] = -3
	[  0,  1] = 15
	[  1,  1] = 24
	[  4,  1] = -9
	[  0,  2] = 12
	[  2,  2] = 6
	[  3,  2] = 18
	[  2,  3] = 21
	[  3,  3] = 12
	[  2,  4] = -15
	[  4,  4] = 0

Thread 1 "sp2m_test" received signal SIGSEGV, Segmentation fault.
0x00007ffff2821c6e in mkl_serv_free () from /opt/intel/oneapi/mkl/2022.2.0/lib/intel64/libmkl_core.so.2
Missing separate debuginfos, use: dnf debuginfo-install intel-oneapi-openmp-2022.2.0-2022.2.0-8734.x86_64
(gdb) bt
#0  0x00007ffff2821c6e in mkl_serv_free () from /opt/intel/oneapi/mkl/2022.2.0/lib/intel64/libmkl_core.so.2
#1  0x00007fffec449680 in mkl_sparse_d_do_destroy_i4_avx512 ()
   from /opt/intel/oneapi/mkl/2022.2.0/lib/intel64/libmkl_avx512.so.2
#2  0x00000000004016d2 in main () at sp2m_test.c:252

(note line 252 is actually the call to mkl_sparse_destroy(cscA), so it is crashing trying to free the CSC matrix).

Spencer_P_Intel · ‎11-15-2022

Hi imcinerney,

Can I ask some clarifying questions before we dive into a technical discussion of how to do this?

1. Do you want to change the matrix values (using scaling factor) in the middle of your runs? That is, do you need to scale the matrix after you have used the original matrix for some sparse blas operations? Also, is the scale factor known at the start or is it something that is discovered through your run?

2. Which sparse blas operations do you need the scaled matrix (and the unscaled matrix) for? Many of the apis accept an "alpha" scaling factor on the matrix operation which can take care of this.

3. How often are you changing the data (but keeping the same sparsity pattern )? Have you thought about making two matrix handles: one with original data and the other with same sparsity pattern (might even get away with using the same Ip/Ii arrays in both handles since it is read only unless you are using a ) but different Ix data ? Is it a problem to allocate two copies of the values array?

Answering these can help us better understand your needs. Unfortunately, I don't see this sp2m hack ever working and hopefully there are simpler ways to meet your needs

Best,

Spencer

ShanmukhS_Intel · ‎11-22-2022

Hi,

Thanks for posting on Intel Communities.

A gentle reminder:

Has the information provided helped? Could you please get back to us with the queries mentioned in the earlier post so that we could look into your issue further?

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · ‎11-28-2022

Hi Ian,

We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Best Regards,

Shanmukh.SS