I have used mkl_dcsrmultcsr in my research. However, it is performing double pass to compute sparse*sparse matrix product. For small size problems, this is not a problem, however for large size problems (e.g. matrices of size ½ billion by ½ billion) this is time consuming and it would be better if MKL can do this multiplication in a single pass in a parallel setup.
Most (if not all) of the sparse*sparse CSR matrix multiplications algorithm use Gustavson Algorithm (ACM 1978) and there is no reason why this algorithm cannot be parallelized and do calculations in a single pass. I understand that the performance of a single pass parallelization would depend on pre-allocating the space non-zero values, which I think can be reasonably given in most situations and even if this does not work the algorithm should be able to adjust the buffer size (if required).
Similarly, it would be useful to only compute lower/upper triangular portion of the output matrix (of course the output matrix have to be symmetric).
Application domain: Statistics, PDE’s, Inverse Problems, Weather Prediction.
After spending considerable amount of time last week I was able to come up with my own single pass parallel solution which is ~ 30 to 35% faster than mkl_dcsrmultcsr. It can also work in a hybrid setup (MPI+Openmp) and gives the output matrix in a sorted form. Any help (if possible ; no compulsions) in optimizing the attached code would be greatly appreciated as for large matrices it can save days worth of work.
30-35% of speedup - thanks for sharing this. Are these any specific input matrixes? what is the typical size, nnz and type of this matrices? what is the CPU you are running this code?
I have tested the code with matrices 1070*1000000 (nzmax = 85534075) multiplied by a matrix 1000000*1070 (nzmax = 85534075) on (I do not know generation info for them):
You can test the code for your own matrices. I have to refactor the code .
thanks Vineet. Your suggestions looks reasonable to implement. I will bring this request to the MKL release board for consideration and may be for the implementation into one of the future versions.