mkl csrcoo question

Mukundhan_S_ · ‎10-07-2016

Greetings,

I would like to convert a csr format to coo storage format.

https://software.intel.com/en-us/node/520849 ;

I am familiar with this and have used this. My question is, how can i convert csr to coo generic format (both upper and lower triangular matrix are stored, generic format or full matrix ). However, I am able to convert from csr to coo symmetric(upper) matrix.

Also, please provide detailed explanation for the below parameter while converting from csr to coo.

job[4]

job[4]=nzmax - maximum number of the non-zero elements allowed if job[0]=0.

Thank you.

Zhen_Z_Intel · ‎10-07-2016

Hi Mukundhan,

The nzmax also could be the length of input array acsr . You do not need to worry about the matrix is upper /lower triangular, you should make sure the correctness of three array variation of csr format which are stored in acsr, ja, ia. Lean more information, please refer to:

Sparse BLAS CSR Matrix Storage Format

Here's an example of how to convert csr to coo format which is described in ${MKLROOT}/examples/examples_core_c/spblasc/source/?converters.c

Hope this information would be helpful to you. Thank you.

Best regards,
Fiona

Mukundhan_S_ · ‎10-10-2016

Hello Fiona,

Thank you much for the quick response. However, i was asking about the upper and lower triangular matrix in the output coo format, not the csr.

I am able to use mkl_Xcsrcoo to convert from csr to coo, but the resulting coo output contains the upper triangular matrix and how do i get a generic/full sparse storage matrix in coo.?

Please advise.

mecej4 · ‎10-10-2016

I do not understand what the difficulty is. Show us an example where you input an unsymmetric matrix in CSC format and convert to COO format, with the COO data indicating a zero lower triangle when the CSC data contains entries in both lower and upper triangles.

C
      program X_CSC2COO
      IMPLICIT NONE
      integer m,n,lda,nzmax,nnz,nn,mn,k
      parameter ( m=4, n=4,lda=4, nzmax=8)
      real*8 Acsr(nzmax), Acoo(nzmax)
      integer AI(m+1), AJ(nzmax), IR(nzmax), JC(nzmax)
      data Acsr / 5.d0, 8.d0, 9.d0, 2.d0, 3.d0, 6.d0, 1.d0, 4.d0/
      data AI / 1, 3, 5, 7, 9/
      data AJ / 1, 2, 1, 2, 3, 4, 3, 4/

      integer i,j,info
      integer nr,ibase1,ibase2,locat
      integer job(8)

      print *, 'EXAMPLE PROGRAM FOR CSC to COO CONVERSION'

      nnz=ai(n+1)-1
      job(1)=0
      job(2)=1
      job(3)=1
      job(5)=nzmax
      job(6)=3

      call mkl_dcsrcoo (job,n, Acsr, AJ,AI,nnz,Acoo, ir,jc,info)
      if (info.ne.0) stop 'info /= 0 from csrcoo'

      write(*,'(2i3,F7.0)')(ir(k),jc(k),Acoo(k),k=1,nnz)

      end

Mukundhan_S_ · ‎10-11-2016

mecej4 wrote:

I do not understand what the difficulty is. Show us an example where you input an unsymmetric matrix in CSC format and convert to COO format, with the COO data indicating a zero lower triangle when the CSC data contains entries in both lower and upper triangles.

C
      program X_CSC2COO
      IMPLICIT NONE
      integer m,n,lda,nzmax,nnz,nn,mn,k
      parameter ( m=4, n=4,lda=4, nzmax=8)
      real*8 Acsr(nzmax), Acoo(nzmax)
      integer AI(m+1), AJ(nzmax), IR(nzmax), JC(nzmax)
      data Acsr / 5.d0, 8.d0, 9.d0, 2.d0, 3.d0, 6.d0, 1.d0, 4.d0/
      data AI / 1, 3, 5, 7, 9/
      data AJ / 1, 2, 1, 2, 3, 4, 3, 4/

      integer i,j,info
      integer nr,ibase1,ibase2,locat
      integer job(8)

      print *, 'EXAMPLE PROGRAM FOR CSC to COO CONVERSION'

      nnz=ai(n+1)-1
      job(1)=0
      job(2)=1
      job(3)=1
      job(5)=nzmax
      job(6)=3

      call mkl_dcsrcoo (job,n, Acsr, AJ,AI,nnz,Acoo, ir,jc,info)
      if (info.ne.0) stop 'info /= 0 from csrcoo'

      write(*,'(2i3,F7.0)')(ir(k),jc(k),Acoo(k),k=1,nnz)

      end

Well, the problem is, i have a symmetric csr matrix (upper since c format) and i would like to get it converted to generic coo format.

Thank you for your time, please advise if this is possible or any tweaks to get it easily. I have a program to convert it but it's of the O(n^2)

Zhen_Z_Intel · ‎10-11-2016

Hi Mukundhan,

I think the complexity should be O(2*n) which n means number of non-zero elements. The complexity is determined by the number of non-zero elements you have in input matrix. Seems there's no easily way to convert csr to coo. I wonder do you have any further calculation for sparse matrix after converting into coo format. Normally, the csr format could support more calculation methods than coo. What I recommend is, keep using csr or dss format to process and convert to coo format at the end.

Best regards,
Fiona

mecej4 · ‎10-11-2016

As Fiona Z. advised, what you are asking for has the undesirable effect of erasing the advantages that come with a symmetric matrix. If you wish to go ahead with this, here is an easy way of augmenting the COO arrays that MKL returned. You will have to make sure that the COO arrays are large enough to hold the lower triangle that you are going to add. Assuming that IR, JC and ACOO are dimensioned with upper bound NZMAXC, this code will do the task in O(n_nz) operations.

      ka=nnz
      do k=1,nnz
         i=ir(k); j=jc(k)
         if(i.eq.j)cycle
         ka=ka+1
         if(ka.gt.nzmaxc)STOP 'COO array sizes insufficient'
         ir(ka)=j; jc(ka)=i; Acoo(ka)=Acoo(k)   ! reflection in diagonal
      end do
      nnz=ka                                    ! update count of non-zeros

Mukundhan_S_ · ‎10-11-2016

Fiona Z. (Intel) wrote:

Hi Mukundhan,

I think the complexity should be O(2*n) which n means number of non-zero elements. The complexity is determined by the number of non-zero elements you have in input matrix. Seems there's no easily way to convert csr to coo. I wonder do you have any further calculation for sparse matrix after converting into coo format. Normally, the csr format could support more calculation methods than coo. What I recommend is, keep using csr or dss format to process and convert to coo format at the end.

Best regards,
Fiona

Hi Fiona,

yes, this is an input for a newly developed distributed memory sparse direct solver where it stores/requires full coo matrix for factorization. In my conversion program, the order is of n^2 (n- no. of non-zero's), so i wanted to find any other optimal methods or use mkl, as the non-zeros elements can easily reach a billion in our simulations.

Thank you.

Mukundhan_S_ · ‎10-11-2016

mecej4 wrote:

As Fiona Z. advised, what you are asking for has the undesirable effect of erasing the advantages that come with a symmetric matrix. If you wish to go ahead with this, here is an easy way of augmenting the COO arrays that MKL returned. You will have to make sure that the COO arrays are large enough to hold the lower triangle that you are going to add. Assuming that IR, JC and ACOO are dimensioned with upper bound NZMAXC, this code will do the task in O(n_nz) operations.
      ka=nnz
      do k=1,nnz
         i=ir(k); j=jc(k)
         if(i.eq.j)cycle
         ka=ka+1
         if(ka.gt.nzmaxc)STOP 'COO array sizes insufficient'
         ir(ka)=j; jc(ka)=i; Acoo(ka)=Acoo(k)   ! reflection in diagonal
      end do
      nnz=ka                                    ! update count of non-zeros

Hi Mecej,

Thank you.!. It does augment, but it will not appear in a sorted order and that is the reason my program is of O(n^2, n =nonz) where i have put a nested for/do loop to sort the elements. A full matrix/generic storage format matrix will have appear as below

0,0
0,1
0,2
0,5
1,0 (lower)
1,1
1,5
2,0 (lower)
2,2 
and so on.

Let me know if you have any other optimal suggestion please.

mecej4 · ‎10-11-2016

You are adding requirements piece by piece. This time, I want to know what your final objectives are, and why you require the COO data to be sorted in a particular way, before offering more suggestions.

Sorting is usually a N lg N operation, not N². Again, without knowing typical values of N and the reasons for sorting, I do not want to suggest changes at this point.

Mukundhan_S_ · ‎10-12-2016

mecej4 wrote:

You are adding requirements piece by piece. This time, I want to know what your final objectives are, and why you require the COO data to be sorted in a particular way, before offering more suggestions.

Sorting is usually a N lg N operation, not N². Again, without knowing typical values of N and the reasons for sorting, I do not want to suggest changes at this point.

I am sorry to disagree when you meant i am adding piece by piece. General storage format / non-symmetric form itself means the desirability of the matrix is essentially sorted. If not mistaken, here

Due to the input requirement of a third party solver which is a distributed memory (MPI) it involves overlaps/interfaces of boundaries of data between processes and updates them periodically and so on. Hence, the generalized input format is needed at this time, so i am looking a way to optimally make this conversion. Allocating the non-zero's for generalized matrix is always of the form (2*nonz-n) elements.

I have also used a python package namely pysparse to get the generalized form, which is fast enough. Earlier, at that time i was using a python subprocess to launch this from a script, but now i cannot utilize these python packages in our commercial codes, hence i wrote my own naive conversion with an O(nonz^2) . It works but absolutely not efficient.
Thank you for your time again.

mecej4 · ‎10-12-2016

It is not usual to impose an order requirement in the COO representation. In fact, not having such a requirement is one of the main reasons to use this less efficient representation instead of CSR, CSC, etc. The page that you referred to in #11 has the following statement prominently displayed:

values: A real or complex array that contains the non-zero elements of A in any order.

You may see if you can adapt one of the converter routines provided in Sparsekit at http://www-users.cs.umn.edu/~saad/software/SPARSKIT/index.html or http://people.sc.fsu.edu/~jburkardt/f_src/sparsekit/sparsekit.html.

If you have billions of nonzero entries in your matrices, using an O(n_nz²) reordering algorithm is not wise. You need to use a stable sorting algorithm, such as a stable version of quicksort, mergesort or heapsort, all of which have average complexity n_nz lg n_nz.

If you are going to ignore the symmetry, perhaps you can do so even when assembling the CSR representation. Furthermore, why use the CSR representation at all? Why not directly put the matrix elements into a COO representation as they are generated?

You may also ask yourself if, considering the high price to be paid for sorting, you really must have the sorting done. Perhaps there is some aspect of the distributed solver that gives you some control/choice over this. Is COO the only representation accepted by the distributed solver? Sounds dubious.

Mukundhan_S_ · ‎10-13-2016

Hi Mecej4,

Thank you much, appreciated for your suggestions and thoughts. I am unable to use any of the sparsekit routines for my case as i want to convert a symmetric csr to non-symmetric coordinate format. However, i have thought further last eve. and working on another algorithm to do this conversion efficiently (O(nnz)), will post back if successful.

Thank you.

Mukundhan_S_ · ‎10-25-2016

Hi all,

Below is a snippet of the algorithm (to give an idea), tested with million rows/columns square matrix with more than billion of non-zero's, very memory efficient.

It performs a conversion of symmetric csr to non-symmetric coordinate format, also i am partitioning the data for different mpi processes (which you can ignore by modifying the code, as you understand the algorithm) and fortran to c index format.

if (mpirank != 0){
  // allocate two-dimensional vectors on heap through vectors
  std::vector < vector <Int> > sortrow_(n1), sortcol_(n1);
  std::vector < vector <double> > sortval_(n1);
  //loop over nnz (read in from your input file)
  for(Int i=0; i<nnz; i++){
    if ( col>= start+1 and col<=stop+1 and val!=0 ){ //start and stop are an mpi variable 
      //thing for partitioning data in my case
      if ( col != row){ //when indexes are unequal, it stores in a vectors.
        sri = col-1; 
        jxx = row-1 ; //Adding '-1' to the indices to match C format.
	ixx = col-1 ;
	nzvalue = val;
	sortrow_[sri].push_back(ixx);
	sortcol_[sri].push_back(jxx);
     	sortval_[sri].push_back(nzvalue);
      }
      if ( col == row){ //when indexes are equal(diagonal)
        sci = col-1;
	for (Int jj=0 ; jj<sortrow_[sci].size(); ++jj){
          ixx1 = sortrow_[sci][jj];
	  jxx1 = sortcol_[sci][jj];    //pushes the data of the lower triangular matrix
	  nzvalue1 = sortval_[sci][jj];
	  newsources_[mpirank].push_back(ixx1);
	  newtargets_[mpirank].push_back(jxx1);
	  newnzvals_[mpirank].push_back(nzvalue1);
	}
        /* To be more memory efficient, sortrow_[sci], sortcol_[sci],sortval_[sci] can be emptied using erase()
           Will not matter much because you are declaring sortrow_,sortcol_,sortval_ inside here, 
           once you go out of scope, everything is destroyed, so still mem. efficient when you have billions of nnz */
	jxx = row-1 ; //Adding '-1' to the indices to match C format.
	ixx = col-1 ;
	nzvalue = val;
	newsources_[mpirank].push_back(ixx);
	newtargets_[mpirank].push_back(jxx);
     	newnzvals_[mpirank].push_back(nzvalue);  //Here i am constructing a distributed sparse matrix which is efficiently passed into the solver.
        //shall be input as needed. 
      }
      if ( col!= row and row>=start+1){
        ixx = row-1;
	jxx = col-1;
	nzvalue = val;
	newsources_[mpirank].push_back(ixx);
	newtargets_[mpirank].push_back(jxx);
     	newnzvals_[mpirank].push_back(nzvalue); 
      }
    }
  }
}