- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi all,

the program below implements the two stage sparse matrix multiplication in mkl_sparse_sp2m. After stage 1 the handle of C is destroyed and C is rebuild as a csr matrix with all necessary memory allocated. Then stage 2 is called. For some unknown reason the column pointer of C is all zero after the second stage although I understand the manual such that stage 2 actually generates the column indices. Further, the values in the output array do not seem to be sorted within row ........ which hampers any subsequent use in mkl sparse solvers.

Any comment elucidating the cause of the problem is much appreciated.

Operation system was linux, compiler and mkl version were 2020, update 2, compiler line was

ifort -fpp -i8 -mkl test.f90

Thanks a lot.

```
#include "mkl_spblas.f90"
program test
use mkl_spblas
use, intrinsic :: iso_fortran_env, only: int64, real64,int32
use iso_c_binding, only: c_ptr, c_int, c_long_long, c_double,&
& c_f_pointer
implicit none
!!the matrix
!!0.5,0.0,0.1
!!0.0,0.4,0.3
!!0.4,0.1,0.2
!!colum positions (/1,3,2,3,1,2,3/)
!!row positions (/1,3,5,8/)
!!values (/0.5,0.1,0.4,0.3,0.4,0.1,0.2/)
type(sparse_matrix_t), allocatable :: a,b,c
type(matrix_descr), allocatable :: a_d,b_d
integer(int64), allocatable :: rp1(:),rp2(:),rp3(:), cp1(:),cp2(:),cp3(:)
real(real64), allocatable :: val1(:), val2(:),val3(:)
integer(int32) :: stat,index,a1,a2
integer :: nrows,ncols
integer(int64), pointer, contiguous :: rpp1(:)=>null(),rpp2(:)=>null()
type(c_ptr) :: rows_start,rows_end,col_indx,values
real(real64), allocatable :: rt(:,:)
!!@@@@@@@@@@@@@@@@@
!!build a dense matrix for comparison
allocate(rt(3,3))
rt(1,:)=(/0.5,0.0,0.1/)
rt(2,:)=(/0.0,0.4,0.3/)
rt(3,:)=(/0.4,0.1,0.2/)
rt=matmul(rt,rt)
!!@@@@@@@@@@@@@@@@@@@@@@
!!initialize a an b and descriptors a_d and b_d
allocate(a,b,c,a_d,b_d)
a_d%type=sparse_matrix_type_general
a_d%mode=SPARSE_FILL_MODE_FULL
a_d%diag=sparse_diag_non_unit
b_d%type=sparse_matrix_type_general
b_d%mode=SPARSE_FILL_MODE_FULL
b_d%diag=sparse_diag_non_unit
!!@@@@@@@@@@@@@@@@@@@@@@@@@y
!!create matrices a and b
rp1=(/1,3,5,8/);rp2=(/1,3,5,8/)
cp1=(/1,3,2,3,1,2,3/);cp2=(/1,3,2,3,1,2,3/)
val1=(/0.5,0.1,0.4,0.3,0.4,0.1,0.2/)
val2=(/0.5,0.1,0.4,0.3,0.4,0.1,0.2/)
stat=MKL_SPARSE_D_CREATE_CSR(&
&a,&
&sparse_index_base_one,&
&3,&
&3,&
&rp1(1:(size(rp1)-1)),&
&rp1(2:size(rp1)),&
&cp1,&
&val1)
if(stat/=0) call se("err_1")
stat=MKL_SPARSE_D_CREATE_CSR(&
&b,&
&sparse_index_base_one,&
&3,&
&3,&
&rp2(1:(size(rp2)-1)),&
&rp2(2:size(rp2)),&
&cp2,&
&val2)
if(stat/=0) call se("err_2")
!!@@@@@@@@@@@@@@@@@@@@@@@@@
!!get nnz
stat=mkl_sparse_sp2m(&
&opA=sparse_operation_non_transpose,&
&descrA=a_d,&
&A=a,&
&opB=sparse_operation_non_transpose,&
&descrB=b_d,&
&B=b,&
&req=SPARSE_STAGE_NNZ_COUNT,&
&C=c)
if(stat/=0) call se("err_2")
index=sparse_index_base_one
stat=MKL_SPARSE_D_EXPORT_CSR(&
&c,&
&index,&
&nrows,&
&ncols,&
&rows_start,&
&rows_end,&
&col_indx,&
&values)
if(stat/=0) call se("err_3")
call c_f_pointer(rows_start,rpp1,[nrows])
call c_f_pointer(rows_end,rpp2,[nrows])
write(*,"(*(g0:"",""))") rpp1
write(*,"(*(g0:"",""))") rpp2
!!@@@@@@@@@@@@@@@@@@@@
!!rebuild a new C with all necessary memory
stat=mkl_sparse_destroy(c)
if(stat/=0) call se("err_4")
deallocate(c);allocate(c)
allocate(rp3(size(rpp1)+1));rp3(1:size(rpp1))=rpp1
rp3(size(rp3))=rpp2(size(rpp2))
allocate(cp3(rpp2(size(rpp2))-1),source=0_int64)
allocate(val3(rpp2(size(rpp2))-1),source=0._real64)
stat=MKL_SPARSE_D_CREATE_CSR(&
&c,&
&sparse_index_base_one,&
&3,&
&3,&
&rp3(1:(size(rp3)-1)),&
&rp3(2:size(rp3)),&
&cp3,&
&val3)
if(stat/=0) call se("err_5")
!!@@@@@@@@@@@@@@@@@@@@@@@
!!perform multiplication using preallocated memory
stat=mkl_sparse_sp2m(&
&opA=sparse_operation_non_transpose,&
&descrA=a_d,&
&A=a,&
&opB=sparse_operation_non_transpose,&
&descrB=b_d,&
&B=b,&
&req=SPARSE_STAGE_FINALIZE_MULT,&
&C=c)
if(stat/=0) call se("err_6")
!!column pointer contains zeros only
write(*,"(*(g0:"",""))") cp3
!!result vector seems to be unordered within row
write(*,"(*(g0:"",""))") val3
!!correct result from dense matrix operation
write(*,"(*(g0:"",""))") ((rt(a1,a2),a2=1,size(rt,2)),a1=1,size(rt,1))
!!@@@@@@@@@@@@@@@@@@@@@
contains
subroutine se(c0e)
character(*), intent(in) :: c0e
write(*,*) c0e
call exit(1)
end subroutine se
end program test
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello @may_ka,

Current API of sp2m does not allow your usage model. Specifically, it is not allowed to allocate memory for the output matrix on the side of the user.

This is why you see that column indices are not computed in your example since sp2m detects an already existing pointer for the column indices (which is the one you allocated). Also, if you allow MKL to allocate the memory for the output (remove your export, destroy and create_csr calls inbetween the phases), then the results will be the same for parallel vs sequential code and the same which you called "correct".

The API of sp2m however allows you to recompute the values if the structure remain unchanged (via SPARSE_STAGE_FINALIZE_MULT).

Notes:

1) I looked at the docs and would agree that it is not clear, we will try to improve it.

2) We already have a request for allowing the user to allocate all memory for the output matrix. You can submit a feature request if you want to increase priority of this task.

3) If you were a C user, I would suggest looking at mkl_graph_mxm https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/graph-routines/graph-functionality/graph-operations/mkl-graph-mxm.html which covers most of the functionality of mkl_sparse_sp2m and allows the user to allocate the output memory, via a multistage executon. It is supported starting with oneMKL beta-08 (and the coming MKL 2020u3).

I hope this helps. Let me know if you have further questions/comments.

Best,

Kirill

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Further to my above post, the above program yields different results when linked against threaded and non-threaded version of the mkl. with makefile

```
INTELMKL=/opt/intel/compilers_and_libraries_2020.2.254/linux/
MKLROOT=$(INTELMKL)/mkl
MKLPATH=$(MKLROOT)/lib/intel64
MKLPARA=$(MKLPATH)/libmkl_blas95_ilp64.a \
$(MKLPATH)/libmkl_lapack95_ilp64.a -Wl,--start-group \
$(MKLPATH)/libmkl_intel_ilp64.a \
$(MKLPATH)/libmkl_core.a \
$(MKLPATH)/libmkl_intel_thread.a -Wl,--end-group -lpthread -lm -ldl
MKLSEQ=$(MKLPATH)/libmkl_blas95_ilp64.a \
$(MKLPATH)/libmkl_lapack95_ilp64.a -Wl,--start-group \
$(MKLPATH)/libmkl_intel_ilp64.a \
$(MKLPATH)/libmkl_core.a \
$(MKLPATH)/libmkl_sequential.a -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -lm -ldl
source:=test.f90
all: para seq
para:
ifort -i8 -mkl=parallel -fpp -o para test.f90 $(MKLPARA)
seq:
ifort -i8 -mkl=sequential -fpp -o seq test.f90 $(MKLSEQ)
clean:
rm para seq
```

version "seq" yields a result vector (the value vector of C) of:

```
.2900000011920929
.7000000134110451E-01
.1000000029802323E01
.3100000129640104
.1800000098347665
.000000000000000
.3400000071525574
.1100000040233136
.000000000000000
```

whereas version "para" yields:

```
.2900000011920929
.7000000134110451E-01
.1000000029802323E01
.1900000064074994
.1800000098347665
.1200000065565110
.2800000053644180
.1100000040233136
.6000000178813936E-01
```

where the latter lines up with the results from dense multiplication.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello @may_ka,

Current API of sp2m does not allow your usage model. Specifically, it is not allowed to allocate memory for the output matrix on the side of the user.

This is why you see that column indices are not computed in your example since sp2m detects an already existing pointer for the column indices (which is the one you allocated). Also, if you allow MKL to allocate the memory for the output (remove your export, destroy and create_csr calls inbetween the phases), then the results will be the same for parallel vs sequential code and the same which you called "correct".

The API of sp2m however allows you to recompute the values if the structure remain unchanged (via SPARSE_STAGE_FINALIZE_MULT).

Notes:

1) I looked at the docs and would agree that it is not clear, we will try to improve it.

2) We already have a request for allowing the user to allocate all memory for the output matrix. You can submit a feature request if you want to increase priority of this task.

3) If you were a C user, I would suggest looking at mkl_graph_mxm https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/graph-routines/graph-functionality/graph-operations/mkl-graph-mxm.html which covers most of the functionality of mkl_sparse_sp2m and allows the user to allocate the output memory, via a multistage executon. It is supported starting with oneMKL beta-08 (and the coming MKL 2020u3).

I hope this helps. Let me know if you have further questions/comments.

Best,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page