<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance of mkl_?csrmultcsr in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871912#M8598</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/431061"&gt;jaewonj&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Thanks Sergey.&lt;BR /&gt;&lt;BR /&gt;Yes, I called mkl_dcsrmultcsr with trans = 'N'.&lt;BR /&gt;&lt;BR /&gt;I think it's a brilliant idea. I need mkl_dcsrmultcsr to perform sparse matrix triple product. And with the result matrix I only perform sparse-dense level 2 operation, so the output matrix  needs not be sorted.&lt;BR /&gt;&lt;BR /&gt;Jaewon&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I thought I would mention that I would also be interested in an option to provide unsorted output.&lt;BR /&gt;&lt;BR /&gt;Another curiosity on the performance front: I am calling mkl_dcsrmultcsr with both trans='N' and trans='T'. While I was expecting a bit of a performance difference, what I got back was a bit of a shock.&lt;BR /&gt;&lt;BR /&gt;Performing my own matrix transpose and then calling mkl_dcsrmultcsr completed in 0.07s (including both functions).&lt;BR /&gt;&lt;BR /&gt;But calling mkl_dcsrmultcsr with 'T' completed in 3.562s.&lt;BR /&gt;&lt;BR /&gt;If I do the multiply without the transpose, it takes 0.015s.&lt;BR /&gt;&lt;BR /&gt;Is that the difference I should be expecting? Has anyone else seen similar results?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Ian&lt;BR /&gt;</description>
    <pubDate>Tue, 27 Oct 2009 23:11:59 GMT</pubDate>
    <dc:creator>Ian_Fraser</dc:creator>
    <dc:date>2009-10-27T23:11:59Z</dc:date>
    <item>
      <title>Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871908#M8594</link>
      <description>Though they say it's difficult to implement a multi-threaded sparse-sparse level 3 function that scales well for all sparsity patterns, in most cases I see a 3 to 4 times speedup when I run mkl_?csrmultcsr on my 8-core machine. However, I found some matrices for which mkl_dcsrmultcsr shows extremely poor performance (mkl_dcsrmultcsr can be about 50 times slower than a simple three-for-loop implementation). Will this issue be addressed in MKL 10.2 update 2?&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;&lt;BR /&gt;Jaewon&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://www.cise.ufl.edu/research/sparse/matrices/Hamm/memplus.html" target="_blank"&gt;http://www.cise.ufl.edu/research/sparse/matrices/Hamm/memplus.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;- 17758-by-17758 with 126,150 nonzeros&lt;BR /&gt;- fill-in ratio: 5,121,784 / 126,150 = 40.6&lt;BR /&gt;&lt;BR /&gt;On an Intel Xeon E5410 @ 2.33GHz (8 cores) running 64-bit Vista, I need 5.42 seconds to perform the sparse-sparse level 3 multiplication using mkl_dcsrmultcsr, while I only need 0.12 seconds with a simple three-for-loop implementation (not threaded).&lt;BR /&gt;</description>
      <pubDate>Mon, 14 Sep 2009 01:00:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871908#M8594</guid>
      <dc:creator>jaewonj</dc:creator>
      <dc:date>2009-09-14T01:00:54Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871909#M8595</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;BR /&gt;Jaewon, &lt;BR /&gt;&lt;BR /&gt;Thanks for the report. We will have further check about the performance problem. &lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Chao&lt;/DIV&gt;
&lt;BR /&gt;</description>
      <pubDate>Mon, 14 Sep 2009 05:25:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871909#M8595</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2009-09-14T05:25:51Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871910#M8596</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;P&gt;Hello Jaewon,&lt;/P&gt;
&lt;P&gt;Thank you very much for your report.&lt;/P&gt;
&lt;P&gt;According to the timing results, I assume that the routine is called with trans='n'. Is that right?&lt;/P&gt;
&lt;P align="left"&gt;The routine is planned as a supporting routine for PARDISO and, as you know, PARDISO requires that the column indices in each row be sorted in increasing order. So the routine you tested also sorts the elements in the result matrix so that PARDISO can be called immediately after this matrix-matrix multiply without any additional computational work. Unfortunately, this data preparation for PARDISO and some checking of the output matrix sometimes take a significant amount of time.&lt;/P&gt;
&lt;P&gt;Of course, we will look at the performance issue you reported and we will do our best to further improve the performance of this routine. &lt;BR /&gt;&lt;BR /&gt;The other alternative for improving performance is to introduce an option to turn off sorting in the output matrix. Please let us know if you need this kind of functionality (I mean unsorted output column indices and value arrays) for this routine.&lt;/P&gt;
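&lt;P&gt;To illustrate the sorted-column requirement described above, here is a minimal sketch in Python. It is not the MKL implementation: the helper name sort_csr_rows and the zero-based row pointer layout are assumptions made only for this illustration. It orders the column indices within each CSR row and moves the values along with them.&lt;/P&gt;
&lt;PRE&gt;
```python
def sort_csr_rows(row_ptr, col_idx, values):
    """Return copies of col_idx and values with each row ordered by column.

    Assumes a zero-based row_ptr of length n_rows + 1 (illustration only).
    """
    sorted_cols = list(col_idx)
    sorted_vals = list(values)
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Pair each column index with its value so they move together.
        pairs = sorted(zip(col_idx[start:end], values[start:end]),
                       key=lambda pair: pair[0])
        for offset, (col, val) in enumerate(pairs):
            sorted_cols[start + offset] = col
            sorted_vals[start + offset] = val
    return sorted_cols, sorted_vals


# Hypothetical 2x3 matrix: row 0 stores columns 2, 0; row 1 stores columns 1, 0.
cols, vals = sort_csr_rows([0, 2, 4], [2, 0, 1, 0], [5.0, 1.0, 7.0, 3.0])
```
&lt;/PRE&gt;
&lt;P&gt;A caller that only needs sparse matrix-vector products on the result can usually skip this step, which is the motivation for an option that leaves the output unsorted.&lt;/P&gt;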
&lt;P&gt;All the best&lt;/P&gt;
Sergey</description>
      <pubDate>Tue, 15 Sep 2009 05:32:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871910#M8596</guid>
      <dc:creator>Sergey_K_Intel1</dc:creator>
      <dc:date>2009-09-15T05:32:35Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871911#M8597</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93645"&gt;Sergey Kuznetsov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;
&lt;P&gt;Hello Jaewon,&lt;/P&gt;
&lt;P&gt;Thank you very much for your report.&lt;/P&gt;
&lt;P&gt;According to the timing results, I assume that the routine is called with trans='n'. Is that right?&lt;/P&gt;
&lt;P align="left"&gt;The routine is planned as a supporting routine for PARDISO and, as you know, PARDISO requires that the column indices in each row be sorted in increasing order. So the routine you tested also sorts the elements in the result matrix so that PARDISO can be called immediately after this matrix-matrix multiply without any additional computational work. Unfortunately, this data preparation for PARDISO and some checking of the output matrix sometimes take a significant amount of time.&lt;/P&gt;
&lt;P&gt;Of course, we will look at the performance issue you reported and we will do our best to further improve the performance of this routine. &lt;BR /&gt;&lt;BR /&gt;The other alternative for improving performance is to introduce an option to turn off sorting in the output matrix. Please let us know if you need this kind of functionality (I mean unsorted output column indices and value arrays) for this routine.&lt;/P&gt;
&lt;P&gt;All the best&lt;/P&gt;
Sergey&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Thanks Sergey.&lt;BR /&gt;&lt;BR /&gt;Yes, I called mkl_dcsrmultcsr with trans = 'N'.&lt;BR /&gt;&lt;BR /&gt;I think it's a brilliant idea. I need mkl_dcsrmultcsr to perform a sparse matrix triple product, and I only perform sparse-dense level 2 operations on the result matrix, so the output matrix need not be sorted.&lt;BR /&gt;&lt;BR /&gt;Jaewon&lt;BR /&gt;</description>
      <pubDate>Tue, 15 Sep 2009 22:43:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871911#M8597</guid>
      <dc:creator>jaewonj</dc:creator>
      <dc:date>2009-09-15T22:43:09Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871912#M8598</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/431061"&gt;jaewonj&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Thanks Sergey.&lt;BR /&gt;&lt;BR /&gt;Yes, I called mkl_dcsrmultcsr with trans = 'N'.&lt;BR /&gt;&lt;BR /&gt;I think it's a brilliant idea. I need mkl_dcsrmultcsr to perform sparse matrix triple product. And with the result matrix I only perform sparse-dense level 2 operation, so the output matrix  needs not be sorted.&lt;BR /&gt;&lt;BR /&gt;Jaewon&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I thought I would mention that I would also be interested in an option to provide unsorted output.&lt;BR /&gt;&lt;BR /&gt;Another curiosity on the performance front: I am calling mkl_dcsrmultcsr with both trans='N' and trans='T'. While I was expecting a bit of a performance difference, what I got back was a bit of a shock.&lt;BR /&gt;&lt;BR /&gt;Performing my own matrix transpose and then calling mkl_dcsrmultcsr completed in 0.07s (including both functions).&lt;BR /&gt;&lt;BR /&gt;But calling mkl_dcsrmultcsr with 'T' completed in 3.562s.&lt;BR /&gt;&lt;BR /&gt;If I do the multiply without the transpose, it takes 0.015s.&lt;BR /&gt;&lt;BR /&gt;Is that the difference I should be expecting? Has anyone else seen similar results?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Ian&lt;BR /&gt;</description>
      <pubDate>Tue, 27 Oct 2009 23:11:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871912#M8598</guid>
      <dc:creator>Ian_Fraser</dc:creator>
      <dc:date>2009-10-27T23:11:59Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871913#M8599</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/446133"&gt;Ian Fraser&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;I thought I would mention that I also be interested in an option to provide unsorted output. &lt;BR /&gt;&lt;BR /&gt;Another curiosity on the performance front, I am calling mkl_dcsrmultcsr with both trans='N' and trans='T'. While I was expecting a bit a peformance difference, what I got back was a bit of a shock.&lt;BR /&gt;&lt;BR /&gt;Performing my own matrix transpose call and then calling mkl_dcsrmultcsr completed in 0.07s (incl both functions)&lt;BR /&gt;&lt;BR /&gt;But calling mkl_csrmultcsr with 'T" completed in 3.562s&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;If do the multiply without the transpose it takes 0.015s&lt;BR /&gt;&lt;BR /&gt;Is that the difference I should be expecting? Has anyone else seen similar results?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Ian&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Dear Ian,&lt;BR /&gt;&lt;BR /&gt;As I mentioned before, the routine does mandatory sorting of column indices in the output matrix since it was designed as a supporting routine for DSS/PARDISO. PARDISO requires that the column indices in each row be sorted in increasing order. Since A^T*B and A*B are different sparse matrices, the cost of sorting is different, and the difference in time can be explained by the different cost of sorting. The computational complexity of the sort varies: it might be O(nnz) in a good case, where nnz is the number of non-zeros in the output matrix, or O(nnz*nnz) in the worst case.&lt;BR /&gt;&lt;BR /&gt;A version with additional switches for turning off sorting in the output matrix is under development.&lt;BR /&gt;&lt;BR /&gt;All the best&lt;BR /&gt;Sergey</description>
      <pubDate>Thu, 29 Oct 2009 06:56:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871913#M8599</guid>
      <dc:creator>Sergey_K_Intel1</dc:creator>
      <dc:date>2009-10-29T06:56:20Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871914#M8600</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93645"&gt;Sergey Kuznetsov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Dear Ian,&lt;BR /&gt;&lt;BR /&gt;As I mentioned before, the routine does mandatory sorting of column indices in the output matrix since it was designed as a supporting routine for DSS/PARDISO. PARDISO requires that the column indices in each row be sorted in increasing order. Since A^T*B and A*B are different sparse matrices, the cost of sorting is different, and the difference in time can be explained by the different cost of sorting. The computational complexity of the sort varies: it might be O(nnz) in a good case, where nnz is the number of non-zeros in the output matrix, or O(nnz*nnz) in the worst case.&lt;BR /&gt;&lt;BR /&gt;A version with additional switches for turning off sorting in the output matrix is under development.&lt;BR /&gt;&lt;BR /&gt;All the best&lt;BR /&gt;Sergey&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Thanks again Sergey. That does make sense. &lt;BR /&gt;&lt;BR /&gt;Though it's kind of off-topic for this particular thread, I am curious whether a zero-based indexing option is in the works for ?csrmultcsr. I use zero-based indexing exclusively for my matrices, except for this particular function, which is less than ideal because of the extra passes over the index arrays to change them to one-based. &lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Ian&lt;BR /&gt;</description>
      <pubDate>Thu, 29 Oct 2009 20:59:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871914#M8600</guid>
      <dc:creator>Ian_Fraser</dc:creator>
      <dc:date>2009-10-29T20:59:06Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of mkl_?csrmultcsr</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871915#M8601</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/446133"&gt;Ian Fraser&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Thanks again Sergey. That does make sense. &lt;BR /&gt;&lt;BR /&gt;Though it's kind of off-topic for this particular thread, I am curious whether a zero-based indexing option is in the works for ?csrmultcsr. I use zero-based indexing exclusively for my matrices, except for this particular function, which is less than ideal because of the extra passes over the index arrays to change them to one-based. &lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Ian&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Dear Ian,&lt;BR /&gt;&lt;BR /&gt;Please file a feature request through premier.intel.com for a zero-based indexing option for ?csrmultcsr. &lt;BR /&gt;&lt;BR /&gt;Thanks in advance&lt;BR /&gt;All the best&lt;BR /&gt;Sergey</description>
      <pubDate>Fri, 30 Oct 2009 07:13:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-mkl-csrmultcsr/m-p/871915#M8601</guid>
      <dc:creator>Sergey_K_Intel1</dc:creator>
      <dc:date>2009-10-30T07:13:35Z</dc:date>
    </item>
  </channel>
</rss>

