Solved: mkl_cspblas_dcsrgemv: heap corruption when transposing

Bartosz_W_ · ‎09-09-2014

When I'm using the mkl_cspblas_dcsrgemv function, if I request the input matrix to be transposed, the program heap gets corrupted.

This happens in threaded and sequential MKL, this example program uses sequential to get a cleaner valgrind report, and it illustrates the problem:

//icc dcsrgemv_bug.c -mkl=sequential -g

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>


void dump_vec(char *s, double *t, MKL_INT n)
{
    printf("SUM %s [ %.1f ... %7.1f ] = %9.1f\n", s, t[0], t[n-1], cblas_dasum(n, t, 1));
}

int main(int argc, char *argv[])
{
    MKL_INT  r = 750;
    MKL_INT  c = 250;

    MKL_INT  v = c*2;

    double  *V = calloc(v,   sizeof(double));
    MKL_INT *C = calloc(v,   sizeof(double));
    MKL_INT *R = calloc(r+1, sizeof(double));

    double  *X = calloc(c,   sizeof(double));
    double  *Y = calloc(r,   sizeof(double));
    double  *Z = calloc(c,   sizeof(double));

    MKL_INT  i;

/*  Matrix for (r, c)

    1  _  _  _  ....  _
    _  2  _  _  ....  _
    _  _  3  _  ....  _
    _  _  _  4  ....  _
    :  :  :  :  .     :
    :  :  :  :     .  :
    _  _  _  _  ....  c

    1  _  _  _  ....  _
    _  1  _  _  ....  _
    _  _  1  _  ....  _
    _  _  _  1  ....  _
    :  :  :  :  .     :
    :  :  :  :     .  :
    _  _  _  _  ....  1

    _  _  _  _  ....  _
    :  :  :  :  .     :
    :  :  :  :     .  :
    _  _  _  _  ....  _
*/

    for (i = 0; i < c; ++i)
    {
        V = i + 1;
        V[i + c] = 1;
        C = i;
        C[i + c] = i;
    }

    for (i = 0; i < v; ++i)
    {
        R = i;
    }

    for (i = v; i <= r; ++i)
    {
        R = v;
    }

    for (i = 0; i < c; ++i)
    {
        X = 1.0;
    }

    dump_vec("X", X, c);
    mkl_cspblas_dcsrgemv("N", &r, V, R, C, X, Y);
    dump_vec("Y", Y, r);
    mkl_cspblas_dcsrgemv("T", &r, V, R, C, Y, Z);
    dump_vec("Z", Z, c);

    return 0;
}

This program crashes on exit due to heap corruption.

When I run this through valgrind, I get the following report:

bawr@core:~/SVDLIBT$ valgrind ./a.out
==1591== Memcheck, a memory error detector
==1591== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1591== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==1591== Command: ./a.out
==1591==
SUM X [ 1.0 ...     1.0 ] =     250.0
SUM Y [ 1.0 ...     0.0 ] =   31625.0
==1591== Invalid write of size 4
==1591==    at 0x9C75CE6: mkl_spblas_lp64_def_dcsr0tg__c__mvout_par (in /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_def.so)
==1591==    by 0x593180A: mkl_spblas_lp64_dcsr0tg__c__mvout_omp (in /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.so)
==1591==    by 0x58AEA99: mkl_spblas_lp64_mkl_cspblas_dcsrgemv (in /opt/intel/composer_xe_2015.0.090/mkl/lib/intel64/libmkl_sequential.so)
==1591==    by 0x4012B5: main (dcsrgemv_bug.c:78)
==1591==  Address 0x8771f50 is 0 bytes after a block of size 2,000 alloc'd
==1591==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1591==    by 0x401038: main (dcsrgemv_bug.c:24)
==1591==
SUM Z [ 2.0 ... 62501.0 ] = 5239875.0
==1591==
==1591== Process terminating with default action of signal 11 (SIGSEGV)
==1591==  Access not within mapped region at address 0x28
==1591==    at 0x4009F7E: do_lookup_x (dl-lookup.c:98)
==1591==    by 0x400A990: _dl_lookup_symbol_x (dl-lookup.c:737)
==1591==    by 0x400F556: _dl_fixup (dl-runtime.c:111)
==1591==    by 0x4016514: _dl_runtime_resolve (dl-trampoline.S:45)
==1591==    by 0x81DE75F: __call_tls_dtors (cxa_thread_atexit_impl.c:83)
==1591==    by 0x81DE086: __run_exit_handlers (exit.c:40)
==1591==    by 0x81DE194: exit (exit.c:104)
==1591==    by 0x81C3ECB: (below main) (libc-start.c:321)
==1591==  If you believe this happened as a result of a stack
==1591==  overflow in your program's main thread (unlikely but
==1591==  possible), you can try to increase the size of the
==1591==  main thread stack using the --main-stacksize= flag.
==1591==  The main thread stack size used in this run was 8388608.
==1591==
==1591== Process terminating with default action of signal 11 (SIGSEGV)
==1591==  Access not within mapped region at address 0x28
==1591==    at 0x4009F7E: do_lookup_x (dl-lookup.c:98)
==1591==    by 0x400A990: _dl_lookup_symbol_x (dl-lookup.c:737)
==1591==    by 0x400F556: _dl_fixup (dl-runtime.c:111)
==1591==    by 0x4016514: _dl_runtime_resolve (dl-trampoline.S:45)
==1591==    by 0x4A256BC: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==1591==  If you believe this happened as a result of a stack
==1591==  overflow in your program's main thread (unlikely but
==1591==  possible), you can try to increase the size of the
==1591==  main thread stack using the --main-stacksize= flag.
==1591==  The main thread stack size used in this run was 8388608.
==1591==
==1591== HEAP SUMMARY:
==1591==     in use at exit: 25,640 bytes in 12 blocks
==1591==   total heap usage: 12 allocs, 0 frees, 25,640 bytes allocated
==1591==
==1591== LEAK SUMMARY:
==1591==    definitely lost: 24,208 bytes in 9 blocks
==1591==    indirectly lost: 0 bytes in 0 blocks
==1591==      possibly lost: 0 bytes in 0 blocks
==1591==    still reachable: 1,432 bytes in 3 blocks
==1591==         suppressed: 0 bytes in 0 blocks
==1591== Rerun with --leak-check=full to see details of leaked memory
==1591==
==1591== For counts of detected and suppressed errors, rerun with: -v
==1591== ERROR SUMMARY: 592 errors from 1 contexts (suppressed: 2 from 1)
Segmentation fault (core dumped)

Am I doing something wrong, or is this an MKL bug?

Ying_H_Intel · ‎09-10-2014

Hello,

According to the documentation, the function is supposed the A is a spare square matrix, thus x , y, z should be at least m length.

The mkl_cspblas_?csrgemvroutine performs a matrix-vector operation defined as
y := A*x
or
y := A'*x,
where:
xand yare here:
xand yare vectors,
Ais an m-by-msparse square matrix in the CSR format (3-array variation) with zero-based indexing, A'is
the transpose of A.

the problem looks be here :

In your example, the effective A is 500x250 and other element are zero. It is true that the effective result in Y is first 500 element and in Z , it is the first 250 element, but the function will suppose there are 750x750, and allowed access space of x, y , z should be at least double *Z = calloc(750, sizeof(double));

How about if you change them to 750 ?

and one more problem

	
				MKL_INT *C = calloc(r,   sizeof(MKL_INT));

		
				22
				MKL_INT *R = calloc(r+1, sizeof(MKL_INT));
			
					double  *V = calloc(750,   sizeof(double));
				
					21
					    MKL_INT *C = calloc(750,   sizeof(MKL_INT));
				
					22
					    MKL_INT *R = calloc(r+1, sizeof(MKL_INT));
				
					23
					 
					24
					    double  *X = calloc(750,   sizeof(double));
				
					25
					    double  *Y = calloc(750,   sizeof(double));
				
					26
					    double  *Z = calloc(750,   sizeof(double));
				
Best Regards,
Ying

View solution in original post

Bartosz_W_ · ‎09-09-2014

(If I run this with threaded MKL, I get the same invalid write happening just out of bounds of the Z array.)

Ying_H_Intel · ‎09-10-2014

Hello,

According to the documentation, the function is supposed the A is a spare square matrix, thus x , y, z should be at least m length.

The mkl_cspblas_?csrgemvroutine performs a matrix-vector operation defined as
y := A*x
or
y := A'*x,
where:
xand yare here:
xand yare vectors,
Ais an m-by-msparse square matrix in the CSR format (3-array variation) with zero-based indexing, A'is
the transpose of A.

the problem looks be here :

In your example, the effective A is 500x250 and other element are zero. It is true that the effective result in Y is first 500 element and in Z , it is the first 250 element, but the function will suppose there are 750x750, and allowed access space of x, y , z should be at least double *Z = calloc(750, sizeof(double));

How about if you change them to 750 ?

and one more problem

	
				MKL_INT *C = calloc(r,   sizeof(MKL_INT));

		
				22
				MKL_INT *R = calloc(r+1, sizeof(MKL_INT));
			
					double  *V = calloc(750,   sizeof(double));
				
					21
					    MKL_INT *C = calloc(750,   sizeof(MKL_INT));
				
					22
					    MKL_INT *R = calloc(r+1, sizeof(MKL_INT));
				
					23
					 
					24
					    double  *X = calloc(750,   sizeof(double));
				
					25
					    double  *Y = calloc(750,   sizeof(double));
				
					26
					    double  *Z = calloc(750,   sizeof(double));
				
Best Regards,
Ying

Bartosz_W_ · ‎09-10-2014

Thanks!

I got confused about the fact that the simplified interface operates on a square matrix only, whereas the general interface allows a rectangular matrix (and got further off-track because the non-transposing version did appear to work correctly). The invalid sizeof didn't happen in my actual code, and in testing it got hidden dur to me using the ILP64 interface.

Instead of resizing the vectors I used the general interface with mkl_dcsrmv (and matrix property string of "G__C__"), and that fixed my problem.

Ying_H_Intel · ‎09-11-2014

Hi Bartosz,

Thanks for sharing. Right, the general interface with mkl_dcsrmv can be a good replacement.

Best Regards,

Ying