<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sharing sparse_matrix_t struct across threads in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598665#M36136</link>
    <description>&lt;P&gt;I see. I have two more questions:&lt;BR /&gt;1. Is the sparse matrix the same for all problems (i.e. same &lt;EM&gt;rows_start&lt;/EM&gt; and &lt;EM&gt;rows_end&lt;/EM&gt; arrays), as the pseudocode seems to suggest?&lt;BR /&gt;2. Do you already have a working version of the code, where the loop over the problems is done sequentially, but where the calls to &lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; are executed in parallel?&lt;/P&gt;</description>
    <pubDate>Fri, 17 May 2024 19:24:10 GMT</pubDate>
    <dc:creator>noffermans</dc:creator>
    <dc:date>2024-05-17T19:24:10Z</dc:date>
    <item>
      <title>Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598318#M36130</link>
      <description>Hi, is sharing a sparse_matrix_t handle across threads safe? I would like to multiply matrices using mkl_sparse_d_mm in parallel. Thanks</description>
      <pubDate>Thu, 16 May 2024 16:53:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598318#M36130</guid>
      <dc:creator>nacho_libre</dc:creator>
      <dc:date>2024-05-16T16:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598612#M36134</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;This is a good question, but the answer is no: sharing the handle across threads is not safe.&lt;BR /&gt;Out of curiosity, what is the motivation for calling &lt;SPAN&gt;&lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; from multiple threads? The easiest way to exploit parallelism is to use oneMKL's built-in threading, which is enabled by linking against the OpenMP or TBB threading libraries.&lt;BR /&gt;&lt;BR /&gt;Best,&lt;BR /&gt;Nicolas&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2024 14:48:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598612#M36134</guid>
      <dc:creator>noffermans</dc:creator>
      <dc:date>2024-05-17T14:48:42Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598628#M36135</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a short pseudocode sketch of my problem:&lt;/P&gt;&lt;P&gt;// Create a single handle for the big matrix in CSR format&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;mkl_sparse_d_create_csr(handle, c_style_indexing, nrows, ncols, rows_start, rows_end, col_indx, values);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;// Parallel for loop&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;for (problem in problems) {&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; // do stuff and calculate the dense matrix b, which is different for each problem&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; mkl_sparse_d_mm(..., handle, ..., b, output_buffer_c);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; // do other stuff with output_buffer_c&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;As a quick fix, should I instead create a new handle in each thread that references the same rows_start, rows_end, col_indx, and values data? I read in a header file that these arrays aren't modified as long as mkl_sparse_order or mkl_sparse_?_set_values isn't called. I also considered mkl_dcsrmm, but that Sparse BLAS interface is deprecated.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 17 May 2024 15:45:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598628#M36135</guid>
      <dc:creator>nacho_libre</dc:creator>
      <dc:date>2024-05-17T15:45:14Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598665#M36136</link>
      <description>&lt;P&gt;I see. I have two more questions:&lt;BR /&gt;1. Is the sparse matrix the same for all problems (i.e. same &lt;EM&gt;rows_start&lt;/EM&gt; and &lt;EM&gt;rows_end&lt;/EM&gt; arrays), as the pseudocode seems to suggest?&lt;BR /&gt;2. Do you already have a working version of the code, where the loop over the problems is done sequentially, but where the calls to &lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; are executed in parallel?&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2024 19:24:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598665#M36136</guid>
      <dc:creator>noffermans</dc:creator>
      <dc:date>2024-05-17T19:24:10Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598678#M36137</link>
      <description>&lt;P&gt;1. Yes, the sparse matrix components are immutable.&lt;/P&gt;&lt;P&gt;2. It works when the calls to &lt;EM&gt;mkl_sparse_d_mm&lt;/EM&gt; are made in parallel and these libraries are linked:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;mkl_intel_thread_dll&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;libiomp5md (so threaded operation, I guess)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;mkl_intel_ilp64_dll (I use the 64-bit index interface)&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;I want to be sure I am not misusing the library in a way that would cause problems later on.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2024 21:34:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1598678#M36137</guid>
      <dc:creator>nacho_libre</dc:creator>
      <dc:date>2024-05-17T21:34:22Z</dc:date>
    </item>
    <item>
      <title>Re: Sharing sparse_matrix_t struct across threads</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1599924#M36147</link>
      <description>&lt;P&gt;Hi again,&lt;/P&gt;&lt;P&gt;Apologies for the late reply. Here is our recommendation on how to use the library properly (credit to my colleague&amp;nbsp;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/96895"&gt;@Spencer_P_Intel&lt;/a&gt;&amp;nbsp;for the explanation below).&lt;BR /&gt;&lt;BR /&gt;When user data is provided to a matrix handle, we promise to leave that data untouched unless we are explicitly told, through a call to an API that allows it, that it may be modified. When needed for performance, we may add extra data to the handle. Because of the way that extra data is implemented, the handle is not safe to use simultaneously on separate threads; instead, we provide a threading layer that lets multiple threads collaborate (via TBB or OpenMP) on the requested operation. The best case is therefore to combine multiple input/output vectors into a set of vectors, i.e. a dense matrix system. If that is not possible in your application, then, as long as you avoid APIs that explicitly change user-provided data (such as mkl_sparse_order, format conversions, or mkl_sparse_update_values), it is possible to put the same data arrays into multiple handles and use those separate handles simultaneously on separate threads, with each handle used by a single thread only.&lt;/P&gt;&lt;P&gt;So in your case, you can indeed&amp;nbsp;&lt;SPAN&gt;create a new handle in each thread, as long as you do not call any routine like&amp;nbsp;mkl_sparse_order or mkl_sparse_?_set_values.&amp;nbsp;&lt;BR /&gt;Ideally, though, it might be best for performance to merge all the b arrays for the various problems into a single one, contiguous in memory. 
Coming back to the pseudocode for your example, it might look something like this (assuming OpenMP threading):&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="cpp"&gt;// Create a single handle for the big matrix in CSR format
mkl_sparse_d_create_csr(&amp;handle, c_style_indexing, nrows, ncols, rows_start, rows_end, col_indx, values);

// Parallel for loop to build one big dense matrix B = [b0, b1, ..., bn]
#pragma omp parallel for
for (problem in problems) {
    // do stuff and calculate the dense matrix b, which is different for each problem
    // copy b into its slot: B[index_to_b[problem]] = b
}

// Single call; oneMKL threads the multiplication internally
mkl_sparse_d_mm(..., handle, ..., B, C);

// Parallel for loop to postprocess the output in parallel
#pragma omp parallel for
for (problem in problems) {
    // output_buffer_c = C[index_to_output_buffer_c[problem]]
}&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;Hope this helps.&lt;BR /&gt;&lt;BR /&gt;Best,&lt;BR /&gt;Nicolas&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2024 22:02:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Sharing-sparse-matrix-t-struct-across-threads/m-p/1599924#M36147</guid>
      <dc:creator>noffermans</dc:creator>
      <dc:date>2024-05-22T22:02:34Z</dc:date>
    </item>
  </channel>
</rss>

