Bug in mkl_sparse_d_mm

Mishra__Ashirbad · ‎07-18-2019

Hi All,

I think that the result of "mkl_sparse_d_mm" is incorrect as evident for the following code. The code basically tries to multiply the Identity matrix stored in CSR format with a dense matrix and checks whether the result and the input matrix are the same. This also the case in ROW MAJOR layout as well.

int main() {

   MKL_INT n = 5;
   MKL_INT rows_start[5] = {0,1,2,3,4};
   MKL_INT rows_end[5]   = {1,2,3,4,5};
   MKL_INT col_indx[5]   = {0,1,2,3,4};
   double values[5]     = {1,1,1,1,1};
   sparse_matrix_t       csrA = NULL;
   sparse_index_base_t    indexing;
   struct matrix_descr    descr_type_gen;
   descr_type_gen.type = SPARSE_MATRIX_TYPE_GENERAL;
   mkl_sparse_d_create_csr ( &csrA, SPARSE_INDEX_BASE_ZERO, n, n, rows_start, rows_end, col_indx, values);
   MKL_INT row, col;
   sparse_index_base_t indextype;
   MKL_INT * bi, *ei, *indx;
   double *rv;
   mkl_sparse_d_export_csr(csrA, &indextype, &row, &col, &bi, &ei, &indx, &rv);
   printf("Input csr matrix\n");
   for(long i=0; i < n; i++) {
        printf("%ld ----> ",i);
        for(long l = 0; l < n; l++) {
                bool flag = false;
                for(long j=bi; j < ei; j++) {
                        if(indx == l) {
                                flag = true;
                                printf("%.0lf ",rv);
                                break;
                        }
                }
                if(!flag)
                        printf("%d ",0);

        }       printf("\n");
    }
    double *matrix = (double *) mkl_malloc ( n*2*sizeof(double), ALIGN);
    double *result = (double *) mkl_calloc ( n*2, sizeof(double), ALIGN);
    for(long i=0; i < n; i++) {
        matrix = i+1;
        matrix[i + n] = n-i;
    }
    printf("Input dense matrix\n");
    for(long i=0; i < n; i++) {
        for(long j=0 ; j < 2; j++) {
                printf("%.2lf ", matrix[i + j*n]);
        }
        printf("\n");
    }
    mkl_sparse_d_mm( SPARSE_OPERATION_NON_TRANSPOSE, 1, csrA, descr_type_gen, SPARSE_LAYOUT_COLUMN_MAJOR, matrix, 2, n, 0, result, n);
  printf("Outpur result matrix\n");
  for(long i=0; i < n; i++) {
        for(long j=0 ; j < 2; j++) {
                printf("%.2lf ", result[i + j*n]);
        }
        printf("\n");
  }
  mkl_free(matrix);
  mkl_free(result);

  return 0;
}

The output of the above code is :-

Input csr matrix
0 ----> 1 0 0 0 0
1 ----> 0 1 0 0 0
2 ----> 0 0 1 0 0
3 ----> 0 0 0 1 0
4 ----> 0 0 0 0 1
Input dense matrix
1.00 5.00
2.00 4.00
3.00 3.00
4.00 2.00
5.00 1.00
Output result matrix
1.00 5.00
0.00 0.00
1.00 5.00
0.00 0.00
2.00 4.00

I run the above with the following options:

icc -I. -I/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/include -DMKL_ILP64 -Wall -O3 -qopenmp -qopenmp-simd -mkl=parallel -std=c++11 -Wno-attributes mkl_sparse_d_mm.cpp -o mkl_sparse_mm -L/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64_lin -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

and I also tried with a different intel mkl version as follows:

icc -I. -I/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/include -DMKL_ILP64 -Wall -O3 -qopenmp -qopenmp-simd -mkl=parallel -std=c++11 -Wno-attributes mkl_sparse_d_mm.cpp -o mkl_sparse_mm -L/opt/intel/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64_lin -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl

Mishra__Ashirbad · ‎07-18-2019

This is also the case in ROW MAJOR layout.

Ruqiu_C_Intel · ‎07-24-2019

Hi,

I have double checked with your sample code to verify, please use -DMKL_LP64 to replace -DMKL_ILP64, then the results will be your expectation. Or link library with libmkl_intel_ilp64.a for ILP.

There is an article about Using the ILP64 Interface vs. LP64 Interface at the link below.

https://software.intel.com/en-us/mkl-macos-developer-guide-using-the-ilp64-interface-vs-lp64-interface

Hopefully it's useful for you.

Best Regards,

Ruqiu

Kirill_V_Intel · ‎07-24-2019

Hello,

Let me slightly correct the explanation from the previous answer (there is no use of -DMKL_LP64 flag) and clarify what happened.

When you are linking MKL, you have several options:

1) Use MKL Link Line Advisor (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor). There, you can choose LP64 or ILP64 interface (32-bit or 64-bit integers as an interface layer).
In this case, you don't use flags -mkl=parallel but rather what the Advisot tells you.
For example, it could look like
-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
for the link line (this one for static linking and LP64 interface but you can choose whatever you want there).

2) Or, you use a compiler option -mkl=parallel, following https://software.intel.com/en-us/mkl-linux-developer-guide-using-the-mkl-compiler-option. This option always links against the LP64 interface under the hood. In this case, you don't link explicitly against other libs.

So, what happened in your case is that option -mkl=parallel was linking against LP64 while you also manually added ILP64 libs to the link line and these options simply contradicted each other.

I hope this helps.

Best,
Kirill

Mishra__Ashirbad · ‎07-24-2019

Thanks, Ruqiu.

Thanks, Kirill.

I need the ILP64 interface for long storage type for the indices.

The solution with the static library works and the results seem correct.

Thanks a lot.

Mishra__Ashirbad · ‎07-24-2019

Voronin, Kirill (Intel) wrote:
Hello,
Let me slightly correct the explanation from the previous answer (there is no use of -DMKL_LP64 flag) and clarify what happened.
When you are linking MKL, you have several options:
1) Use MKL Link Line Advisor (https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor). There, you can choose LP64 or ILP64 interface (32-bit or 64-bit integers as an interface layer).
In this case, you don't use flags -mkl=parallel but rather what the Advisot tells you.
For example, it could look like
-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
for the link line (this one for static linking and LP64 interface but you can choose whatever you want there).
2) Or, you use a compiler option -mkl=parallel, following https://software.intel.com/en-us/mkl-linux-developer-guide-using-the-mkl.... This option always links against the LP64 interface under the hood. In this case, you don't link explicitly against other libs.
So, what happened in your case is that option -mkl=parallel was linking against LP64 while you also manually added ILP64 libs to the link line and these options simply contradicted each other.
I hope this helps.
Best,
Kirill

Hi Kirill,

Should I use the -mkl flag at all? The sample code that I had mentioned was only for testing. Nevertheless, I use the mkl routines of sparse blas, cblas and eigensolvers for a large codebase. I use openmp with simd options and looking at performance testing.

The Intel MKL advisor doesn't mention using -qopenmp and/or -qopenmp-simd or optimization flags such as -O2 while compiling with icc.

Is there a way to know which options might conflict with the ILP64 interface which need for the larger storage indices.

Thanks,

Ashirbad

Kirill_V_Intel · ‎07-24-2019

Hi Ashirbad,

No, you shouldn't use -mkl flag, since it works with LP64 interface only.

Basically, you need to link against the ILP64 interface so that MKL internally can find correct functionality working with 64-bit integers and you need -DMKL_ILP64 flag to have MKL_INT as 64-bit integer type. So, compiler options you have mentioned work with ILP64 interface without any trouble. For example, -O2 option will affect only how your code is compiled and not the performance of Intel MKL, since you have Intel MKL as a library and just link against it.
MKL Link Line Advisor thus only shows what you should add to your compile and link line in order to use MKL.

Also, note that there are a lot of environment settings which can affect performance. For example, for optimal performance with OpenMP threading, you need also to think about setting correct KMP_AFFINITY and numa control options.

Best,
Kirill

Mishra__Ashirbad · ‎07-25-2019

Voronin, Kirill (Intel) wrote:
Hi Ashirbad,
No, you shouldn't use -mkl flag, since it works with LP64 interface only.
Basically, you need to link against the ILP64 interface so that MKL internally can find correct functionality working with 64-bit integers and you need -DMKL_ILP64 flag to have MKL_INT as 64-bit integer type. So, compiler options you have mentioned work with ILP64 interface without any trouble. For example, -O2 option will affect only how your code is compiled and not the performance of Intel MKL, since you have Intel MKL as a library and just link against it.
MKL Link Line Advisor thus only shows what you should add to your compile and link line in order to use MKL.
Also, note that there are a lot of environment settings which can affect performance. For example, for optimal performance with OpenMP threading, you need also to think about setting correct KMP_AFFINITY and numa control options.

Best,
Kirill

Thanks a lot, Kirill, this is helpful.