<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Parallel algorithm used by mkl_?csrmultcsr in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956915#M15554</link>
    <description>&lt;P&gt;I am wondering which algorithm and parallelization strategy is utilized in &lt;A href="http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-0358EDA6-0D62-458B-9841-207D25202AAD.htm"&gt;mkl_?csrmultcsr&lt;/A&gt;? Does this function scale well on Intel Xeon Phi architecture?&lt;/P&gt;</description>
    <pubDate>Sun, 23 Mar 2014 05:07:52 GMT</pubDate>
    <dc:creator>kadir</dc:creator>
    <dc:date>2014-03-23T05:07:52Z</dc:date>
    <item>
      <title>Parallel algorithm used by mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956915#M15554</link>
      <description>&lt;P&gt;I am wondering which algorithm and parallelization strategy is utilized in &lt;A href="http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-0358EDA6-0D62-458B-9841-207D25202AAD.htm"&gt;mkl_?csrmultcsr&lt;/A&gt;? Does this function scale well on Intel Xeon Phi architecture?&lt;/P&gt;</description>
      <pubDate>Sun, 23 Mar 2014 05:07:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956915#M15554</guid>
      <dc:creator>kadir</dc:creator>
      <dc:date>2014-03-23T05:07:52Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956916#M15555</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;csrmultcsr uses a simple parallelization strategy (row-wise over the first matrix).&lt;/P&gt;

&lt;P&gt;Scaling depends heavily on the matrix structures and sizes, and scalability is normally bound by the bandwidth of the memory subsystem.&lt;/P&gt;

&lt;P&gt;We would expect better performance on Xeon Phi than on Xeon, but scalability would normally be quite modest because memory bandwidth is exhausted fairly quickly.&lt;/P&gt;
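
&lt;P&gt;To illustrate what a row-wise scheme over the first matrix can look like, here is a minimal Python sketch of a row-by-row CSR product (the accumulation scheme usually attributed to Gustavson). It is not MKL's implementation, which this thread does not describe; the function names csr_row_product and csr_multiply are hypothetical, and the sketch assumes zero-based indices and CSR matrices given as (row_ptr, col_idx, values) tuples.&lt;/P&gt;

```python
# Illustrative sketch only: not MKL code. All names here are hypothetical.
# A CSR matrix is a tuple (row_ptr, col_idx, values) with zero-based indices.

def csr_row_product(i, a, b):
    """Return (cols, vals) for row i of C = A * B."""
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    acc = {}  # sparse accumulator: column index -> partial sum
    for p in range(a_ptr[i], a_ptr[i + 1]):
        k = a_col[p]  # nonzero A[i, k] selects row k of B
        for q in range(b_ptr[k], b_ptr[k + 1]):
            j = b_col[q]
            acc[j] = acc.get(j, 0.0) + a_val[p] * b_val[q]
    cols = sorted(acc)
    return cols, [acc[j] for j in cols]


def csr_multiply(a, b):
    """Multiply two CSR matrices one output row at a time."""
    n_rows = len(a[0]) - 1
    c_ptr, c_col, c_val = [0], [], []
    # Each output row depends only on row i of A and on B, so a threaded
    # version could hand contiguous ranges of i to different threads.
    for i in range(n_rows):
        cols, vals = csr_row_product(i, a, b)
        c_col.extend(cols)
        c_val.extend(vals)
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]]
A = ([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0])
C = csr_multiply(A, B)
```

&lt;P&gt;Every row of the result reads scattered rows of the second matrix, which is consistent with the point above that memory bandwidth, rather than arithmetic, tends to limit scaling.&lt;/P&gt;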

&lt;P&gt;--Vipin&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 24 Mar 2014 09:16:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956916#M15555</guid>
      <dc:creator>VipinKumar_E_Intel</dc:creator>
      <dc:date>2014-03-24T09:16:02Z</dc:date>
    </item>
    <item>
      <title>I suppose that a chunk of</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956917#M15556</link>
      <description>&lt;P&gt;I suppose that a &lt;STRONG&gt;chunk&lt;/STRONG&gt; of rows of the first matrix is assigned to a thread. Is it possible to set chunk size and OpenMP's scheduling policy?&amp;nbsp; Are there any other user-supplied parameters to reduce run time of &lt;A href="http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-0358EDA6-0D62-458B-9841-207D25202AAD.htm"&gt;mkl_?csrmultcsr&lt;/A&gt; on MIC? (Parameters other than OMP_NUM_THREADS and KMP_AFFINITY)&lt;/P&gt;</description>
      <pubDate>Sun, 13 Apr 2014 05:18:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956917#M15556</guid>
      <dc:creator>kadir</dc:creator>
      <dc:date>2014-04-13T05:18:50Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956918#M15557</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;mkl_?csrmultcsr uses the following parallelization strategy for non-transposed multiplication of two CSR matrices: the first matrix is divided into chunks with a roughly equal number of rows, and each chunk is assigned to a thread. Because the number of chunks equals the number of threads, the chunk size cannot be set by the user from outside MKL.&lt;/P&gt;
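
&lt;P&gt;As a rough illustration of the partitioning described above, the following Python sketch splits the rows into as many contiguous chunks as there are threads and counts the nonzeros of the first matrix that land in each chunk. The helper names row_chunks and nnz_per_chunk are hypothetical, the exact split MKL uses is not specified here, and the nonzero count is only a proxy for the real work per chunk.&lt;/P&gt;

```python
# Illustrative sketch only: not MKL code. Helper names are hypothetical.

def row_chunks(n_rows, n_threads):
    """Split range(n_rows) into n_threads contiguous, near-equal chunks.

    Returns a list of (start, end) pairs with end exclusive.
    """
    base, extra = divmod(n_rows, n_threads)
    # The first `extra` chunks get one additional row each.
    return [(t * base + min(t, extra), (t + 1) * base + min(t + 1, extra))
            for t in range(n_threads)]


def nnz_per_chunk(row_ptr, chunks):
    """Count stored nonzeros per chunk from a zero-based CSR row pointer."""
    return [row_ptr[end] - row_ptr[start] for start, end in chunks]


# Six rows, three threads: row 0 holds 5 nonzeros, every other row holds 1.
row_ptr = [0, 5, 6, 7, 8, 9, 10]
chunks = row_chunks(6, 3)
work = nnz_per_chunk(row_ptr, chunks)
```

&lt;P&gt;In this example every chunk has two rows, yet the chunks hold 6, 2 and 2 nonzeros, so an equal-rows split can leave one thread with most of the work when the sparsity pattern is skewed.&lt;/P&gt;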

&lt;P&gt;Could you please describe your use case for this function (matrix sizes, sparsity pattern, input parameters, etc.)?&lt;/P&gt;

&lt;P&gt;We can then take a look at the test case and suggest possible steps for further tuning.&lt;/P&gt;

&lt;P&gt;--Vipin&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2014 07:31:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-algorithm-used-by-mkl-csrmultcsr/m-p/956918#M15557</guid>
      <dc:creator>VipinKumar_E_Intel</dc:creator>
      <dc:date>2014-04-14T07:31:31Z</dc:date>
    </item>
  </channel>
</rss>

