If you attempt to develop your own code for SpMV using different sparse storage formats, you will find that SpMV is straightforward to parallelize with OpenMP for CSR and BSR. For COO and CSC, however, race conditions make the parallelization difficult. After several attempts, which included (a) atomic operations, (b) OpenMP's built-in reduction on the output vector, and (c) my own implementation of a vector reduction with OpenMP, I have found it impossible to speed up SpMV through multithreading when using the COO and CSC formats.
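For reference, here is a minimal sketch of what I mean (my own code, simplified; the array names are just placeholders). With CSR each thread owns whole rows of y, while with COO two threads can hit the same entry of y, so the update has to be atomic:

#include <omp.h>

/* CSR: y = A*x. Each y[i] is written by exactly one thread, so the outer
   loop parallelizes cleanly with OpenMP. */
void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* COO: y = A*x (y must be zeroed beforehand). Different nonzeros can target
   the same y[row[k]], so without the atomic the threads race. */
void spmv_coo(int nnz, const int *row, const int *col,
              const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < nnz; k++) {
        #pragma omp atomic
        y[row[k]] += val[k] * x[col[k]];
    }
}

A CSC kernel has the same scattered writes into y as COO (one column at a time), which is why the same problem shows up there.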
I compared my implementations with the state-of-the-art mkl_sparse_d_mv from MKL. For all the matrices I tested, there is no speedup whatsoever of SpMV with multithreading when using COO. With the CSC format there is a slight speedup, but nothing comparable to the speedup obtained with CSR.
My questions are the following:
1. Is SpMV (i.e., mkl_sparse_d_mv with operation=SPARSE_OPERATION_NON_TRANSPOSE) not parallel when using the COO format?
2. Is SpMV (i.e., mkl_sparse_d_mv with operation=SPARSE_OPERATION_NON_TRANSPOSE) not parallel when using the CSC format?
My impression is that the answer to 1. is yes, it is not parallelized, and that the answer to 2. is no, it is parallelized, but the scaling is not nearly as good as with CSR. I would like definite answers to these questions. If the answer to 2. is indeed no, is it possible to know how SpMV is parallelized in that case? Is it with OpenMP, or maybe TBB?
Hi Nicolas,
Thanks for posting in Intel Communities.
We would like to request that you share the MKL version, OS details, a sample reproducer, and steps to reproduce (if any). This helps us recreate the issue at our end and help you accordingly.
Besides this, could you please share the performance characteristics you observed?
Best Regards,
shanmukh.SS
Hello,
I am using Intel oneAPI 2022 on Linux.
For example, using the ecology1 matrix from the SuiteSparse Matrix Collection, the average runtime of mkl_sparse_d_mv over 100 calls is as follows:
- CSR: 8.1 ms (1 thread), 4.2 ms (2 threads), 1.8 ms (4 threads), 1.2 ms (8 threads), 0.6 ms (16 threads).
- COO: 8.1 ms (1 thread), 8.1 ms (2 threads), 9.1 ms (4 threads), 8.1 ms (8 threads), 9.3 ms (16 threads).
- CSC: 9.3 ms (1 thread), 6.2 ms (2 threads), 4.1 ms (4 threads), 13.6 ms (8 threads), 27.0 ms (16 threads).
I compiled my code as follows: icc -o main -DMKL_ILP64 -I${MKLROOT}/include -fopenmp main.c -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
where the icc version is 2021.6.
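To be precise about the measurement, each number above is obtained roughly like this (a simplified sketch, not the exact code of my benchmark; A is an MKL handle created beforehand in the format under test, and the mkl_sparse_optimize call is optional):

#include <mkl.h>
#include <omp.h>

/* Average time of one mkl_sparse_d_mv call, for a given thread count. */
double average_mv_time(sparse_matrix_t A, const double *x, double *y,
                       int n_calls, int n_threads)
{
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;

    mkl_set_num_threads(n_threads);   /* 1, 2, 4, 8 or 16 in the runs above */
    mkl_sparse_optimize(A);           /* optional analysis/optimize step    */

    double t0 = omp_get_wtime();
    for (int i = 0; i < n_calls; i++) /* n_calls = 100 in the runs above    */
        mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr,
                        x, 0.0, y);
    double t1 = omp_get_wtime();

    return (t1 - t0) / n_calls;       /* average seconds per call           */
}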
Hi Nicolas,
Thanks for sharing the details. Could you please share the sample reproducer as well? It helps us recreate the issue in our environment and help you accordingly. Based on this, we will share the performance details with the concerned team and raise a request to provide support for OpenMP threading for the CSC and COO data formats.
Best Regards,
Shanmukh.SS
Hello,
My question is about the general behavior of mkl_sparse_d_mv; I am not looking for support with a specific matrix, so I am not sure why you need a sample code. But here it is: I have attached the file mkl_test.c. You will have to download the .mtx file for the ecology1 matrix from the SuiteSparse Matrix Collection website.
The code is compiled as follows using icc:
icc -o main -DMKL_ILP64 -I${MKLROOT}/include -fopenmp mkl_test.c -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
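The gist of the attached file is roughly the following (a simplified sketch, not the exact attachment; the .mtx parsing is omitted, and n, nnz, rows, cols, vals stand for whatever the reader fills in — note that with -DMKL_ILP64 the index arrays must be MKL_INT):

#include <mkl.h>

/* Build a COO handle from the triplets read out of ecology1.mtx and let MKL
   convert it to CSR; a CSC handle would be created analogously with
   mkl_sparse_d_create_csc from column-sorted arrays. */
void make_handles(MKL_INT n, MKL_INT nnz,
                  MKL_INT *rows, MKL_INT *cols, double *vals,
                  sparse_matrix_t *A_coo, sparse_matrix_t *A_csr)
{
    mkl_sparse_d_create_coo(A_coo, SPARSE_INDEX_BASE_ZERO,
                            n, n, nnz, rows, cols, vals);

    mkl_sparse_convert_csr(*A_coo, SPARSE_OPERATION_NON_TRANSPOSE, A_csr);
}

The mkl_sparse_d_mv calls themselves are then timed exactly as in the snippet in my previous reply.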
Hi Nicolas,
Thanks for sharing the details. Regarding your initial questions: yes, neither COO nor CSC is threaded.
Thanks for helping us improve our products! We have submitted the feature request to the development team; they will consider it based on multiple factors, including but not limited to the priority and criticality of the feature. Once it is included in an upcoming release, it will be documented in the release notes.
Best Regards,
Shanmukh.SS
Thank you! That's all I wanted to know :-).
