Solved: mkl_cspblas_dcsrgemv gives wrong results

Hanqing_W_ · ‎03-21-2016

I was trying to compute the sparse matrix-vector multiplication using mkl_cspblas_dcsrgemv() in the MKL.

The c compiler comes from composer_xe_2013.0.079 in the parallelstudio_2013 version with all the environment variables set up. (by sourcing the iccvars.sh)

The strange thing is that if I compile the code with the switch:

icc test1.c -g -DMKL_ILP64 -openmp -mkl=parallel -lpthread -lm

mkl_cspblas_dcsrgemv() gives the wrong answer.

However, if I compile the code with:

icc test1.c -g -DMKL_ILP64 -openmp -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lpthread -lm

the answer is correct.

The source file of test1.c is:

// test1.c
#include <mkl.h>
#include <stdio.h>

int main(int argc, char **argv)
{
MKL_INT n = 5 ;
MKL_INT Ap [ ] = {0, 2, 5, 9, 10, 12} ;
MKL_INT Ai [ ] = { 0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4} ;
double Ax [ ] = {2., 3., 3., -1., 4., 4., -3., 1., 2., 2., 6., 1.} ;
double b [ ] = {8., 45., -3., 3., 19.} ;
double x [5] ;

printf("sizeof long: %lu\n",sizeof(long));
printf("sizeof MKL_INT: %lu\n",sizeof(MKL_INT));

mkl_cspblas_dcsrgemv("T",&n,Ax,Ap,Ai,b,x);

printf("%f  %f  %f  %f  %f\n",x[0],x[1],x[2],x[3],x[4]);

return 0;
}

More strange thing is that if I compile the whole program with the latter set of switches, the umfpack does not work. To test it, the code test2.c gives an example from the manual of umfpack.

// test2.c
#include <stdio.h>
#include "umfpack.h"

int n = 5 ;
int Ap [ ] = {0, 2, 5, 9, 10, 12} ;
int Ai [ ] = { 0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4} ;
double Ax [ ] = {2., 3., 3., -1., 4., 4., -3., 1., 2., 2., 6., 1.} ;
double b [ ] = {8., 45., -3., 3., 19.} ;
double x [5] ;

int main (void)
{
double *null = (double *) NULL ;
int i ;
void *Symbolic, *Numeric ;
(void) umfpack_di_symbolic (n, n, Ap, Ai, Ax, &Symbolic, null, null) ;
(void) umfpack_di_numeric (Ap, Ai, Ax, Symbolic, &Numeric, null, null) ;
umfpack_di_free_symbolic (&Symbolic) ;
(void) umfpack_di_solve (UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, null, null) ;
umfpack_di_free_numeric (&Numeric) ;
for (i = 0 ; i < n ; i++) printf ("x [%d] = %g\n", i, x ) ;
return (0) ;
}

If I compile test2.c using (same as the latter set of switches above that can make mkl_cspblas_dcsrgemv work):

icc test14.c -I/mnt/home/software/SuiteSparse/include -I${MKLROOT}/include -Wall -g -openmp -DMKL_ILP64 -L./lib -L/mnt/home/software/SuiteSparse/lib -L${MKLROOT}/lib/intel64 -Wl,-Bstatic -lumfpack -lamd -lsuitesparseconfig -lcholmod -lcolamd -Wl,-Bdynamic -lm -lrt -ldl -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread

The umfpack will get SIGSEGV when it calls mkl_blas_xdgemv().

However, with the set of MKL switch that makes mkl_cspblas_dcsrgemv() give wrong answer:

icc test14.c -I/mnt/home/software/SuiteSparse/include -L/mnt/home/software/SuiteSparse/lib -Wl,-Bstatic -lumfpack -lamd -lsuitesparseconfig -lcholmod -lcolamd -Wl,-Bdynamic -lm -lrt -lpthread -ldl

The umfpack works well.

Same problem also happens on Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811.

Since my program needs to have both umfpack and sparse matrix-vector multiplication work together, I am stuck at this point. Could you help locate where the problem is and how to solve it? Is it a bug of umfpack or MKL?

Thank you very much for your help!

Ying_H_Intel · ‎03-21-2016

Hi,

The problem seems be unmatched between the ILP64 interface (MKL_INT is 64bit integer, Long and pointer are 64bit) and LP64 library. Could you please tell us which model you perfer?

Regarding the test1.c

icc test1.c -g -DMKL_ILP64 -openmp -mkl=parallel -lpthread -lm. you have "-DMKL_ILP64", so you hope that MKL_INT is 64bit integer, right?

But -mkl=parallel means -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core

According to the MKL link advisor https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/, the correct one are

cc test1.c -g -DMKL_ILP64 -openmp -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lpthread -lm

So the answer is correct.

Regarding the // test2.c.

You have int type in the code , which may pass 32bit integer to 64bit ILP model. mkl_blas_xdgemv()

As i guess, one of solution may be just use lp64 model. for example, mkl_intel_lp64

icc test14.c -I/mnt/home/software/SuiteSparse/include -I${MKLROOT}/include -Wall -g -openmp -L./lib -L/mnt/home/software/SuiteSparse/lib -L${MKLROOT}/lib/intel64 -Wl,-Bstatic -lumfpack -lamd -lsuitesparseconfig -lcholmod -lcolamd -Wl,-Bdynamic -lm -lrt -ldl -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread.

Would you like to try this and let us know the result?

BTW, the latest MKL release is MKL 11.3 update 2. get from http://software.intel.com/en-us/articles/intel-mkl/

Best Regards,

Ying H.

View solution in original post

Ying_H_Intel · ‎03-21-2016

Hi,

The problem seems be unmatched between the ILP64 interface (MKL_INT is 64bit integer, Long and pointer are 64bit) and LP64 library. Could you please tell us which model you perfer?

Regarding the test1.c

icc test1.c -g -DMKL_ILP64 -openmp -mkl=parallel -lpthread -lm. you have "-DMKL_ILP64", so you hope that MKL_INT is 64bit integer, right?

But -mkl=parallel means -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core

According to the MKL link advisor https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/, the correct one are

cc test1.c -g -DMKL_ILP64 -openmp -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lpthread -lm

So the answer is correct.

Regarding the // test2.c.

You have int type in the code , which may pass 32bit integer to 64bit ILP model. mkl_blas_xdgemv()

As i guess, one of solution may be just use lp64 model. for example, mkl_intel_lp64

icc test14.c -I/mnt/home/software/SuiteSparse/include -I${MKLROOT}/include -Wall -g -openmp -L./lib -L/mnt/home/software/SuiteSparse/lib -L${MKLROOT}/lib/intel64 -Wl,-Bstatic -lumfpack -lamd -lsuitesparseconfig -lcholmod -lcolamd -Wl,-Bdynamic -lm -lrt -ldl -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread.

Would you like to try this and let us know the result?

BTW, the latest MKL release is MKL 11.3 update 2. get from http://software.intel.com/en-us/articles/intel-mkl/

Best Regards,

Ying H.

Hanqing_W_ · ‎03-22-2016

Hi, thank you for your informative advice!

You are right. The problem is indeed caused by the ILP64 and LP64. I was hoping to use the ILP64 interface to compile the code, but it seems that the UMFpack cannot work with ILP64. For the switches "-DMKL_ILP64 -openmp -mkl=parallel -lpthread -lm", I got these from the linking advisor as what you just mentioned. But at that time I chose the Intel product as "Intel Composer XE 2013" instead of "Intel MKL 11.0".

So another question is what should I choose for "Intel product" in the linking advisor if I want to link to MKL using icc and gcc in Linux? In other word, when should I use the full set of switches "-I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread" instead of simply "-mkl=parallel"?

The full set of switches will make mkl_cspblas_dcsrgemv() gives correct results while the shorter switches gives wrong answer. (e.g. in the simple testing of test1.c). What is the reason behind the choice of those linking switches?

I will try to compile the code with LP64 and see if there is any problem.

Thank you.

TimP · ‎03-22-2016

gcc doesn't apply -mkl options, so you must specify -L and -l settings, even if you want a basic dynamic library option. Also, for gcc, you should either choose mkl_gnu_thread in place of mkl_intel_thread (using gcc's own -lgomp), or add -liomp5, as the Link Advisor says. libiomp5 supports OpenMP function calls from linux gcc as well as icc and libmkl_intel_thread.

Link Advisor gives the command line for either icc or linux gcc, and, in the latter case, for either libgomp or libiomp5.

You have also the possibility of using the TBB threading layer in case your application uses either Cilk(tm) Plus or TBB threads but not OpenMP. Documentation about that option looks scanty.

You probably noticed that the preferred OpenMP switch for current icc is -qopenmp, while for gcc it's -fopenmp. When such a switch is passed at link time, it's equivalent to -liomp5 or -lgomp respectively, followed by -lpthread.

Likewise, -mkl=parallel is equivalent to the MKL options you quoted (with lp64), together with -qopenmp.

You've probably figured out ilp64 vs. lp64 by now; all your code would need to use long int rather than int data types in the MKL function calls in order to use ilp64 and support arrays of size more than 2 billion.

Hanqing_W_ · ‎03-22-2016

Thank you for your reply. I tried to compile the code with LP64 interface and it works fine. From what I found in the Suitesparse 4.5.1, it can only work with LP64. So currently it is impossible to use ILP64 interface with UMFpack, which limits the MKL_INT to 32bit integer.

Anyway I think the problem is solved at this moment. Thank all of you for your help.