Solved: Sparse BLAS CSR Matrix Storage Format

kup__benny · ‎07-09-2018

After looking here, I can't understand the compressed row format.

Specifically, I don't understand pointerE (and OK with the rest): it should be the index in the valuesarray that is the last non zero element in each row, if I look on the example they give for zero based indexing, the last non zero element of the first row is -3 and it is the third element in the values array but in zero indexing it should be element #2. The last non zero element of the second row is 5 which is the fifth element in values but in zero indexing it should be #4 so pointerE should be [2,4,7,10,12] but in the example it shows [3,5,8,11,13]

what am I missing here?

Once again I am using zero based indexing so where does this bias come from?

Spencer_P_Intel · ‎07-09-2018

The 4 array compressed sparse row (CSR) format uses the following definitions (0 based indexing):

pointerB = first element of row i in columns and values arrays
pointerE = one past the last element of row i in columns and values arrays
columns = column index value
values = value of matrix

If you are using the most simple case of CSR (which is the 3 array CSR format) with contiguous memory storage of all elements then you can think of pointerE as pointerB+1 (That is, pointerE = pointerB[i+1] ) using pointer arithmetic.

This is typically sufficient for most cases, but there are times when you want to do more complicated operations like pass in a sub-matrix or do a permutation of a matrix rows, in which case using the two row pointers (beginning (B) and one past ending (E) ) makes this possible. Notice that the Intel Math Kernel Library does support the sub-matrix usage, but currently it does guarantee correctness if the pointer arrays are used to pass in a row permutation of the matrix.

For me it helps to think of the pointerB/E arrays as defining half open half closed intervals for the rows:

CSR 3 array:

row i is found on [ pointerB, pointerB[i+1] ) in columns and values arrays

CSR 4 array: (more general)

row i is found on [ pointerB, pointerE ) in columns and values arrays

As you note, it does get a little more complicated with one based indexing. If we use the variable ind=0 or 1 to denote the indexing, then the general form is as follows:

pointerB-ind = first element of row i in columns and values arrays
pointerE-ind = one past the last element of row i in columns and values arrays
columns = ind-based column index value (columns-ind is 0 based column index value)
values = value of matrix

so that

CSR 3 array:

row i is found on [ pointerB-ind, pointerB[i+1]-ind ) in columns and values arrays (with columns stored in base ind fomat)

CSR 4 array: (more general)

row i is found on [ pointerB-ind, pointerE-ind ) in columns and values arrays (with columns stored in base ind fomat)

View solution in original post

Spencer_P_Intel · ‎07-09-2018

The 4 array compressed sparse row (CSR) format uses the following definitions (0 based indexing):

pointerB = first element of row i in columns and values arrays
pointerE = one past the last element of row i in columns and values arrays
columns = column index value
values = value of matrix

If you are using the most simple case of CSR (which is the 3 array CSR format) with contiguous memory storage of all elements then you can think of pointerE as pointerB+1 (That is, pointerE = pointerB[i+1] ) using pointer arithmetic.

This is typically sufficient for most cases, but there are times when you want to do more complicated operations like pass in a sub-matrix or do a permutation of a matrix rows, in which case using the two row pointers (beginning (B) and one past ending (E) ) makes this possible. Notice that the Intel Math Kernel Library does support the sub-matrix usage, but currently it does guarantee correctness if the pointer arrays are used to pass in a row permutation of the matrix.

For me it helps to think of the pointerB/E arrays as defining half open half closed intervals for the rows:

CSR 3 array:

row i is found on [ pointerB, pointerB[i+1] ) in columns and values arrays

CSR 4 array: (more general)

row i is found on [ pointerB, pointerE ) in columns and values arrays

As you note, it does get a little more complicated with one based indexing. If we use the variable ind=0 or 1 to denote the indexing, then the general form is as follows:

pointerB-ind = first element of row i in columns and values arrays
pointerE-ind = one past the last element of row i in columns and values arrays
columns = ind-based column index value (columns-ind is 0 based column index value)
values = value of matrix

so that

CSR 3 array:

row i is found on [ pointerB-ind, pointerB[i+1]-ind ) in columns and values arrays (with columns stored in base ind fomat)

CSR 4 array: (more general)

row i is found on [ pointerB-ind, pointerE-ind ) in columns and values arrays (with columns stored in base ind fomat)

kup__benny · ‎07-10-2018

Thank you Spencer,

Now everything is clear