<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Threaded SVD using MKL/LAPACK in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836560#M6086</link>
    <description>Hello,&lt;BR /&gt;&lt;DIV&gt;I am trying to optimize software that uses the Intel MKL to perform an SVD of a large complex-valued matrix, using calls to the zgesvd() LAPACK driver routine. The &lt;A href="http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_userguide_lnx/MKL_UG_managing_performance/Threaded_Routines.htm"&gt;MKL documentation&lt;/A&gt; states that the ?gesvd routines "make effective use of parallelism", and two of the three computational routines used by zgesvd are listed as threaded (?gebrd and ?bdsqr, but not ?ungbr).&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;I profiled my program using VTune Amplifier XE. &lt;A href="http://i.imgur.com/bswmM.png"&gt;Here&lt;/A&gt; is a screenshot of the VTune timeline, with the 2 calls to zgesvd() marked by user events set through the User Event API. I notice that most of the time in each call to the SVD routine is spent with only a single active thread. The rest of the time, all 8 threads are engaged.&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;I have several questions regarding this:&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;1) Does this behaviour (sub-optimal utilization of threads) seem normal? Is there anything I can do to improve it?&lt;BR /&gt;
2) Can I get any improvement by calling the computational routines directly to compute the SVD (i.e. calling ?gebrd, ?bdsqr and ?ungbr), instead of using the driver routine?&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;Thanks in advance for your time.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;P.S. If you need more specific information about my code, please let me know.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;P.P.S. There are 7 threads that are suspended during the entire call to the SVD routine (rows 2-8 in the screenshot). These are used by another section of the algorithm and (ideally) should be combined with the threads used by the SVD. However, having these extra threads suspended should not affect the SVD computation, as far as I know.&lt;/DIV&gt;</description>
    <pubDate>Thu, 01 Sep 2011 18:24:54 GMT</pubDate>
    <dc:creator>catalogue126</dc:creator>
    <dc:date>2011-09-01T18:24:54Z</dc:date>
    <item>
      <title>Threaded SVD using MKL/LAPACK</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836560#M6086</link>
      <description>Hello,&lt;BR /&gt;&lt;DIV&gt;I am trying to optimize software that uses the Intel MKL to perform an SVD of a large complex-valued matrix, using calls to the zgesvd() LAPACK driver routine. The &lt;A href="http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_userguide_lnx/MKL_UG_managing_performance/Threaded_Routines.htm"&gt;MKL documentation&lt;/A&gt; states that the ?gesvd routines "make effective use of parallelism", and two of the three computational routines used by zgesvd are listed as threaded (?gebrd and ?bdsqr, but not ?ungbr).&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;I profiled my program using VTune Amplifier XE. &lt;A href="http://i.imgur.com/bswmM.png"&gt;Here&lt;/A&gt; is a screenshot of the VTune timeline, with the 2 calls to zgesvd() marked by user events set through the User Event API. I notice that most of the time in each call to the SVD routine is spent with only a single active thread. The rest of the time, all 8 threads are engaged.&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;I have several questions regarding this:&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;1) Does this behaviour (sub-optimal utilization of threads) seem normal? Is there anything I can do to improve it?&lt;BR /&gt;
2) Can I get any improvement by calling the computational routines directly to compute the SVD (i.e. calling ?gebrd, ?bdsqr and ?ungbr), instead of using the driver routine?&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;Thanks in advance for your time.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;P.S. If you need more specific information about my code, please let me know.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;P.P.S. There are 7 threads that are suspended during the entire call to the SVD routine (rows 2-8 in the screenshot). These are used by another section of the algorithm and (ideally) should be combined with the threads used by the SVD. However, having these extra threads suspended should not affect the SVD computation, as far as I know.&lt;/DIV&gt;</description>
      <pubDate>Thu, 01 Sep 2011 18:24:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836560#M6086</guid>
      <dc:creator>catalogue126</dc:creator>
      <dc:date>2011-09-01T18:24:54Z</dc:date>
    </item>
    <item>
      <title>Threaded SVD using MKL/LAPACK</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836561#M6087</link>
      <description>&lt;P&gt;The SVD is approximately BLAS Level 2 plus BLAS Level 3 work.&lt;BR /&gt;BLAS Level 2: a single active thread.&lt;BR /&gt;BLAS Level 3: 8 active threads.&lt;BR /&gt;The part of the operation on the bidiagonal matrix: a single active thread.&lt;BR /&gt;Though on new processors BLAS Level 2 is partially parallelized.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2011 09:32:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836561#M6087</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2011-09-02T09:32:26Z</dc:date>
    </item>
    <item>
      <title>Threaded SVD using MKL/LAPACK</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836562#M6088</link>
      <description>&lt;DIV&gt;Thanks for your reply.&lt;/DIV&gt;&lt;DIV&gt;&amp;gt;&amp;gt;&amp;gt;Though on new processors BLAS Level 2 is partially parallelized.&lt;/DIV&gt;&lt;DIV&gt;Do you know if that includes the Xeon E5450?&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Sep 2011 15:30:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836562#M6088</guid>
      <dc:creator>catalogue126</dc:creator>
      <dc:date>2011-09-07T15:30:48Z</dc:date>
    </item>
    <item>
      <title>Threaded SVD using MKL/LAPACK</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836563#M6089</link>
      <description>&lt;DIV&gt;&amp;gt;&amp;gt;Do you know if that includes the Xeon E5450?&lt;BR /&gt;&lt;BR /&gt;Yes. But the parallel algorithms still have to be implemented; that is a question for the developers. For example, in diagonalization algorithms, parallel algorithms for BLAS Level 2 have been implemented. My own interests are in diagonalization:&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/forums/showthread.php?t=77331&amp;amp;o=d&amp;amp;s=lr"&gt;http://software.intel.com/en-us/forums/showthread.php?t=77331&amp;amp;o=d&amp;amp;s=lr&lt;/A&gt;&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/forums/showthread.php?t=76595&amp;amp;o=d&amp;amp;s=lr"&gt;http://software.intel.com/en-us/forums/showthread.php?t=76595&amp;amp;o=d&amp;amp;s=lr&lt;/A&gt;&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/forums/showthread.php?t=73653&amp;amp;o=d&amp;amp;s=lr"&gt;http://software.intel.com/en-us/forums/showthread.php?t=73653&amp;amp;o=d&amp;amp;s=lr&lt;/A&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Sep 2011 19:56:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836563#M6089</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2011-09-07T19:56:41Z</dc:date>
    </item>
    <item>
      <title>Threaded SVD using MKL/LAPACK</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836564#M6090</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Have you had a chance to try the MKL 11.2 release? The SVD functions have improved significantly:&lt;BR /&gt;
	&lt;A href="https://software.intel.com/en-us/articles/significant-performance-improvment-of-symmetric-eigensolvers-and-svd-in-intel-mkl-112"&gt;https://software.intel.com/en-us/articles/significant-performance-improvment-of-symmetric-eigensolvers-and-svd-in-intel-mkl-112&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Feel free to let us know if you have any feedback on this.&lt;/P&gt;

&lt;P&gt;Regards,&lt;BR /&gt;
	Chao&lt;/P&gt;</description>
      <pubDate>Mon, 13 Oct 2014 03:21:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Threaded-SVD-using-MKL-LAPACK/m-p/836564#M6090</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2014-10-13T03:21:42Z</dc:date>
    </item>
  </channel>
</rss>

