<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sharing sparse_matrix_t struct across threads in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598665#M36136</link>
    <description>&lt;P&gt;I see. I have two more questions:&lt;BR /&gt;1. Is the sparse matrix the same for all problems (i.e. same &lt;EM&gt;rows_start&lt;/EM&gt; and &lt;EM&gt;rows_end&lt;/EM&gt; arrays), as the pseudocode seems to suggest?&lt;BR /&gt;2. Do you already have a working version of the code, where the loop over the problems is done sequentially, but where the calls to &lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; are executed in parallel?&lt;/P&gt;</description>
    <pubDate>Fri, 17 May 2024 19:24:10 GMT</pubDate>
    <dc:creator>noffermans</dc:creator>
    <dc:date>2024-05-17T19:24:10Z</dc:date>
    <item>
      <title>Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598318#M36130</link>
      <description>Hi, is sharing a sparse_matrix_t handle across threads safe? I would like to multiply matrices using mkl_sparse_d_mm in parallel. Thanks</description>
      <pubDate>Thu, 16 May 2024 16:53:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598318#M36130</guid>
      <dc:creator>nacho_libre</dc:creator>
      <dc:date>2024-05-16T16:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598612#M36134</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;This is a good question, but the answer is no: sharing the handle across threads is not safe.&lt;BR /&gt;Out of curiosity, what is the motivation for calling &lt;SPAN&gt;&lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; from multiple threads? The easiest way to exploit parallelism is to use oneMKL's built-in threading, which is enabled by linking against the OpenMP or TBB threading libraries.&lt;BR /&gt;&lt;BR /&gt;Best,&lt;BR /&gt;Nicolas&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2024 14:48:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598612#M36134</guid>
      <dc:creator>noffermans</dc:creator>
      <dc:date>2024-05-17T14:48:42Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598628#M36135</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a short pseudocode sketch of my problem:&lt;/P&gt;&lt;P&gt;// Create a single handle for the big matrix in CSR format&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;mkl_sparse_d_create_csr(handle, c_style_indexing, nrows, ncols, rows_start, rows_end, col_indx, values);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;// Parallel for loop&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;for (problem in problems) {&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; // do stuff and calculate the dense matrix b, which is different for each problem&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; mkl_sparse_d_mm(..., handle, ..., b, output_buffer_c);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; // do other stuff with output_buffer_c&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;As a quick fix, should I instead create a new handle in each thread that references the same rows_start, rows_end, col_indx, and values data? I read in a header file that these arrays aren't modified as long as mkl_sparse_order or mkl_sparse_?_set_values isn't called. I also considered mkl_dcsrmm, but that Sparse BLAS interface is deprecated.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 17 May 2024 15:45:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598628#M36135</guid>
      <dc:creator>nacho_libre</dc:creator>
      <dc:date>2024-05-17T15:45:14Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598665#M36136</link>
      <description>&lt;P&gt;I see. I have two more questions:&lt;BR /&gt;1. Is the sparse matrix the same for all problems (i.e. same &lt;EM&gt;rows_start&lt;/EM&gt; and &lt;EM&gt;rows_end&lt;/EM&gt; arrays), as the pseudocode seems to suggest?&lt;BR /&gt;2. Do you already have a working version of the code, where the loop over the problems is done sequentially, but where the calls to &lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; are executed in parallel?&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2024 19:24:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598665#M36136</guid>
      <dc:creator>noffermans</dc:creator>
      <dc:date>2024-05-17T19:24:10Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598678#M36137</link>
      <description>&lt;P&gt;1. Yes, the sparse matrix components are immutable.&lt;/P&gt;&lt;P&gt;2. It works when the calls to &lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; are made in parallel and these libraries are linked:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;mkl_intel_thread_dll&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;libiomp5md (so threaded operation, I guess)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;mkl_intel_ilp64_dll (I use the 64-bit index interface)&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;I want to be sure I am not misusing the library in a way that would cause problems later on.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2024 21:34:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598678#M36137</guid>
      <dc:creator>nacho_libre</dc:creator>
      <dc:date>2024-05-17T21:34:22Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1599924#M36147</link>
      <description>&lt;P&gt;Hi again,&lt;/P&gt;&lt;P&gt;Apologies for the late reply. Here is our recommendation on how to use the library properly (credit to my colleague&amp;nbsp;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/96895"&gt;@Spencer_P_Intel&lt;/a&gt;&amp;nbsp;for the explanation below).&lt;BR /&gt;&lt;BR /&gt;When user data is provided to a matrix handle, we promise to leave that data untouched unless we are explicitly told, through a call to an API that allows it, that it may be modified. When needed for performance, we may add extra data to the handle. Because of the way that extra data is implemented, the handle is not safe to use simultaneously on separate threads; instead, we provide a threading layer that lets multiple threads collaborate (via TBB or OpenMP) on the requested operation. The best case is therefore to combine multiple input/output vectors into a set of vectors, i.e. a dense matrix system. If that is not possible in your application, then, as long as you avoid APIs that explicitly change user-provided data (such as mkl_sparse_order, format conversions, or mkl_sparse_update_values), it is possible to put the same data arrays into multiple handles and use those separate handles simultaneously on separate threads, with each handle used by a single thread only.&lt;/P&gt;&lt;P&gt;So in your case, you can indeed&amp;nbsp;&lt;SPAN&gt;create a new handle in each thread, as long as you do not call any routine like&amp;nbsp;mkl_sparse_order or mkl_sparse_?_set_values.&amp;nbsp;&lt;BR /&gt;Ideally, though, it might be best for performance to merge all the b arrays for the various problems into a single one, contiguous in memory. 
Coming back to the pseudocode for your example, it might look something like this (assuming OpenMP threading):&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="cpp"&gt;// Create a single handle for the big matrix in CSR format
mkl_sparse_d_create_csr(&amp;handle, c_style_indexing, nrows, ncols, rows_start, rows_end, col_indx, values);

// Parallel for loop to build one big dense matrix B = [b0, b1, ..., bn]
#pragma omp parallel for
for (problem in problems) {
    // do stuff and calculate the dense matrix b, which is different for each problem
    // copy b into its slot: B[index_to_b[problem]] = b
}

// Single call; oneMKL threads the multiplication internally
mkl_sparse_d_mm(..., handle, ..., B, C);

// Parallel for loop to postprocess the output in parallel
#pragma omp parallel for
for (problem in problems) {
    // output_buffer_c = C[index_to_output_buffer_c[problem]]
}&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;Hope this helps.&lt;BR /&gt;&lt;BR /&gt;Best,&lt;BR /&gt;Nicolas&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2024 22:02:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1599924#M36147</guid>
      <dc:creator>noffermans</dc:creator>
      <dc:date>2024-05-22T22:02:34Z</dc:date>
    </item>
  </channel>
</rss>

