<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Laura, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174832#M28850</link>
    <description>&lt;P&gt;Laura,&lt;/P&gt;

&lt;P&gt;I thought about your issue a little more and wanted to point out another option that, again, probably doesn't work for your case but is worth mentioning. :) In the MKL 2019 Beta (now generally available), we implemented a way to output the Schur complement in sparse (CSR) format instead of full dense format. This is available for the SMP version of the sparse solver (Intel MKL PARDISO) but not yet for the cluster (MPI) version (Cluster Sparse Solver). It helps if the Schur complement is relatively sparse, which is not guaranteed. The documentation is at &lt;A href="https://software.intel.com/en-us/mkl-developer-reference-c-2019-beta-pardiso-export"&gt;https://software.intel.com/en-us/mkl-developer-reference-c-2019-beta-pardiso-export&lt;/A&gt;. See the usage example at the bottom of that page. The key differences are the following:&lt;/P&gt;

&lt;P&gt;Use iparm[36-1] = -1 or -2 for sparse output instead of 1 or 2 for dense output, depending on whether you only want the Schur complement or also want the full LU factorization to be available.&lt;/P&gt;

&lt;P&gt;If the sparse output format (&amp;lt; 0) is chosen, the number of nonzeros (nnz) in the Schur complement is written to iparm[36-1] after the reordering step. You can use this value to allocate properly sized CSR arrays for the Schur complement. You pass these arrays to PARDISO with the pardiso_export() function; after the factorization stage is called, the arrays are filled.&lt;/P&gt;

&lt;P&gt;As I said, this probably doesn't help you, since you are using the fully distributed cluster version and your problem is likely much larger than a single node can handle. But if your problem could fit on a single node (for instance with 56 cores, or 44 cores, etc.), this is a much better solution.&lt;/P&gt;

&lt;P&gt;Best,&lt;/P&gt;

&lt;P&gt;Spencer&lt;/P&gt;</description>
    <pubDate>Mon, 07 May 2018 22:31:26 GMT</pubDate>
    <dc:creator>Spencer_P_Intel</dc:creator>
    <dc:date>2018-05-07T22:31:26Z</dc:date>
    <item>
      <title>cluster_sparse_solver Schur complement - how to distribute the Schur matrix?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174830#M28848</link>
      <description>&lt;P&gt;We need to find the Schur complement matrix of a sparse matrix A of form&lt;/P&gt;

&lt;P&gt;A&lt;SUB&gt;11&lt;/SUB&gt;&amp;nbsp; &amp;nbsp; A&lt;SUB&gt;12&lt;/SUB&gt;&lt;/P&gt;

&lt;P&gt;A&lt;SUB&gt;21&lt;/SUB&gt;&amp;nbsp; &amp;nbsp; A&lt;SUB&gt;22&lt;/SUB&gt;&lt;/P&gt;

&lt;P&gt;That is, we want the Schur block defined by S = A&lt;SUB&gt;22&lt;/SUB&gt; - A&lt;SUB&gt;21&lt;/SUB&gt; A&lt;SUB&gt;11&lt;/SUB&gt;&lt;SUP&gt;-1&lt;/SUP&gt; A&lt;SUB&gt;12&lt;/SUB&gt;, which the recent sparse solver update can compute. In cluster_sparse_solver, S is stored as a dense matrix.&lt;/P&gt;

&lt;P&gt;Our problem is that S is too large to store on a single compute node, so we would like it to be distributed across all compute nodes. We can distribute the input matrix A using the current interface, but the Schur matrix S is always returned to MPI process 0. Is there an option to have it returned distributed in MKL 2018 Update 2?&lt;/P&gt;

&lt;P&gt;If there is no option, we know we can work around the problem by partitioning A&lt;SUB&gt;22&lt;/SUB&gt; further and finding the Schur complement matrix of each section. However, this would appear to require a lot of repeated calculations. Is there some way to save intermediate calculations involving A&lt;SUB&gt;11&lt;/SUB&gt; so that these calculations don't need repeating for subsequent Schur calculations?&lt;/P&gt;

&lt;P&gt;Thank you,&lt;/P&gt;

&lt;P&gt;Laura&lt;/P&gt;</description>
      <pubDate>Wed, 25 Apr 2018 15:48:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174830#M28848</guid>
      <dc:creator>Laura_S_3</dc:creator>
      <dc:date>2018-04-25T15:48:55Z</dc:date>
    </item>
    <item>
      <title>Laura,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174831#M28849</link>
      <description>&lt;P&gt;Laura,&lt;/P&gt;

&lt;P&gt;As currently implemented in MKL 2018 Update 2, the Schur complement S is always assembled on the master MPI process. There is no option to have it returned distributed. The Schur complement is computed as a by-product of the general LU factorization, which is done in parallel, so returning it distributed could be possible in the future, but at the moment it is not supported. We will look into it.&lt;/P&gt;

&lt;P&gt;I don't really see any viable way to avoid recomputing data. One idea is to split the matrix into its four blocks, each stored as a CSR matrix, and construct the Schur complement yourself. That is, use the cluster sparse solver to solve&lt;/P&gt;

&lt;P&gt;A_11 X = A_12&lt;/P&gt;

&lt;P&gt;but this requires A_12 to be stored in dense format, because it is passed as the right-hand side, which is probably undesirable. If that is acceptable, you would still have to implement your own distributed SpM*M multiplication (A_21 X) and sparse + dense addition (A_22 - A_21 X). The Intel MKL Inspector-Executor Sparse BLAS has an (OpenMP or TBB) parallel implementation of SpM*M (&lt;A href="https://software.intel.com/en-us/mkl-developer-reference-c-mkl-sparse-mm"&gt;https://software.intel.com/en-us/mkl-developer-reference-c-mkl-sparse-mm&lt;/A&gt;) but not a distributed MPI version. It could be used within each MPI process, but you would have to do the communication yourself. Likewise, the sparse + dense addition is fairly simple if you accumulate into the dense matrix, but the MPI communication would again have to be implemented yourself.&lt;/P&gt;
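
&lt;P&gt;To make the block approach concrete, here is a minimal pure-Python sketch, not MKL code. It factors A_11 once and then builds S = A_22 - A_21 X one column block at a time, so each block could in principle live on a different MPI process. The helper names (lu_factor, lu_solve, schur_columns) and the tiny dense matrices are hypothetical and only for illustration; a real code would use the cluster sparse solver for the solves and MPI for the communication.&lt;/P&gt;

```python
from fractions import Fraction


def lu_factor(a):
    """Return (lu, perm) for a square matrix, using partial pivoting."""
    n = len(a)
    lu = [[Fraction(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0:
            raise ValueError("A_11 is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b for one right-hand-side column, reusing the factors."""
    n = len(lu)
    y = [Fraction(b[perm[i]]) for i in range(n)]
    for i in range(n):              # forward substitution with unit-diagonal L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):    # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def schur_columns(factors, a12, a21, a22, cols):
    """Return the requested columns of S = A22 - A21 * inv(A11) * A12."""
    lu, perm = factors
    out = {}
    for c in cols:
        # Solve A11 x = A12[:, c]; only this column is held densely.
        x = lu_solve(lu, perm, [row[c] for row in a12])
        out[c] = [a22[i][c] - sum(a21[i][k] * x[k] for k in range(len(x)))
                  for i in range(len(a22))]
    return out


# Illustrative 2x2 blocks (made-up numbers).
A11 = [[4, 1], [2, 3]]
A12 = [[1, 0], [0, 2]]
A21 = [[1, 2], [0, 1]]
A22 = [[5, 0], [0, 6]]

factors = lu_factor(A11)                                # done once
part_a = schur_columns(factors, A12, A21, A22, [0])     # e.g. on one process
part_b = schur_columns(factors, A12, A21, A22, [1])     # e.g. on another
print(part_a, part_b)
```

&lt;P&gt;The point of the sketch is only that the factorization of A_11 is computed once and reused for every column block, while each block of S is formed independently of the others.&lt;/P&gt;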

&lt;P&gt;I realize this is inconvenient, but it is what is available at the moment.&lt;/P&gt;

&lt;P&gt;Best,&lt;/P&gt;

&lt;P&gt;Spencer&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 21:05:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174831#M28849</guid>
      <dc:creator>Spencer_P_Intel</dc:creator>
      <dc:date>2018-05-07T21:05:40Z</dc:date>
    </item>
    <item>
      <title>Laura,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174832#M28850</link>
      <description>&lt;P&gt;Laura,&lt;/P&gt;

&lt;P&gt;I thought about your issue a little more and wanted to point out another option that, again, probably doesn't work for your case but is worth mentioning. :) In the MKL 2019 Beta (now generally available), we implemented a way to output the Schur complement in sparse (CSR) format instead of full dense format. This is available for the SMP version of the sparse solver (Intel MKL PARDISO) but not yet for the cluster (MPI) version (Cluster Sparse Solver). It helps if the Schur complement is relatively sparse, which is not guaranteed. The documentation is at &lt;A href="https://software.intel.com/en-us/mkl-developer-reference-c-2019-beta-pardiso-export"&gt;https://software.intel.com/en-us/mkl-developer-reference-c-2019-beta-pardiso-export&lt;/A&gt;. See the usage example at the bottom of that page. The key differences are the following:&lt;/P&gt;

&lt;P&gt;Use iparm[36-1] = -1 or -2 for sparse output instead of 1 or 2 for dense output, depending on whether you only want the Schur complement or also want the full LU factorization to be available.&lt;/P&gt;

&lt;P&gt;If the sparse output format (&amp;lt; 0) is chosen, the number of nonzeros (nnz) in the Schur complement is written to iparm[36-1] after the reordering step. You can use this value to allocate properly sized CSR arrays for the Schur complement. You pass these arrays to PARDISO with the pardiso_export() function; after the factorization stage is called, the arrays are filled.&lt;/P&gt;

&lt;P&gt;As I said, this probably doesn't help you, since you are using the fully distributed cluster version and your problem is likely much larger than a single node can handle. But if your problem could fit on a single node (for instance with 56 cores, or 44 cores, etc.), this is a much better solution.&lt;/P&gt;
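
&lt;P&gt;The "get nnz first, then allocate, then fill" pattern can be sketched in plain Python. This does not call MKL or pardiso_export(); the helper names (count_nnz, allocate_csr, fill_csr), the zero-based indexing and the small example matrix are assumptions made purely for illustration. It only shows how the three CSR arrays are sized: n + 1 row pointers, and nnz column indices and values.&lt;/P&gt;

```python
def count_nnz(dense):
    """Count nonzero entries; stands in for the nnz value reported by the solver."""
    return sum(1 for row in dense for v in row if v != 0)


def allocate_csr(n_rows, nnz):
    """Allocate the three CSR arrays: row pointers, column indices, values."""
    return [0] * (n_rows + 1), [0] * nnz, [0.0] * nnz


def fill_csr(dense, ia, ja, a):
    """Fill preallocated CSR arrays (zero-based indexing assumed)."""
    pos = 0
    for i, row in enumerate(dense):
        ia[i] = pos                 # where row i starts in ja and a
        for j, v in enumerate(row):
            if v != 0:
                ja[pos] = j
                a[pos] = v
                pos += 1
    ia[len(dense)] = pos            # final entry equals nnz


# Made-up 3x3 Schur complement with 5 nonzeros.
S = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]

nnz = count_nnz(S)                  # known before the arrays exist
ia, ja, a = allocate_csr(len(S), nnz)
fill_csr(S, ia, ja, a)              # arrays are filled in a later step
print(ia, ja, a)
```

&lt;P&gt;In the real workflow the count comes from iparm[36-1] after reordering and the fill happens inside the factorization stage; the sketch only mirrors the order of the steps.&lt;/P&gt;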

&lt;P&gt;Best,&lt;/P&gt;

&lt;P&gt;Spencer&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 22:31:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cluster-sparse-solver-Schur-complement-how-to-distribute-the/m-p/1174832#M28850</guid>
      <dc:creator>Spencer_P_Intel</dc:creator>
      <dc:date>2018-05-07T22:31:26Z</dc:date>
    </item>
  </channel>
</rss>

