<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Emond. what version of MKL do in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147401#M26818</link>
    <description>&lt;P&gt;Emond, which version of MKL do you use? Is it MKL 2017 or 2019?&amp;nbsp;&lt;/P&gt;

&lt;P&gt;If you want to use the Direct Sparse Solver for Clusters, you need to link with the MPI-based libraries. Using the&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;-mkl=parallel option links with the SMP version of Intel PARDISO. Please refer to the &lt;A href="https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor"&gt;MKL Link Line Advisor&lt;/A&gt; to see how to link when you need to use the Direct Sparse Solver for Clusters.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Nevertheless,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;what scalability results do you observe?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 24 Oct 2018 02:27:14 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2018-10-24T02:27:14Z</dc:date>
    <item>
      <title>Direct Sparse Solver for Clusters poor scaling</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147400#M26817</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;We are currently developing a distributed version of our C++ finite element program. We planned to use&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;the Intel Direct Sparse Solver for Clusters, but we cannot reach good scalability with our settings. The matrix is assumed to be nonsymmetric and is built in the DCSR format.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;The test case is a simple thermal diffusion problem on a square grid. Problem sizes ranging from 1M to 25M DOF have been tested with many combinations of MPI processes and OpenMP threads (usually with 1 MPI process per node or per socket). The memory allocated during the factorization phase scales down, but we observed only a small speed-up in running time.&lt;/P&gt;

&lt;P&gt;Specifically, we observed the following behaviors:&lt;/P&gt;

&lt;P&gt;- Symbolic factorization benefits from more MPI processes but is not affected by the number of threads.&lt;/P&gt;

&lt;P&gt;- Factorization scales with the number of OpenMP threads and sometimes with the number of MPI processes.&lt;/P&gt;

&lt;P&gt;- Most of the time, the results show no significant gain in the &lt;SPAN style="font-size: 13.008px;"&gt;solving phase&amp;nbsp;&lt;/SPAN&gt;for either type of parallelization.&lt;/P&gt;

&lt;P&gt;I must be doing something wrong, but I &lt;SPAN style="font-size: 13.008px;"&gt;can't&amp;nbsp;&lt;/SPAN&gt;seem to find the solution to the problem.&lt;/P&gt;

&lt;P&gt;Thanks a lot for any advice&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;The following iparm variables are used:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;iparm(0) = 1;&lt;/P&gt;

&lt;P&gt;iparm(1) = 10;&lt;/P&gt;

&lt;P&gt;iparm(7) = 2;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;iparm(9) = 13;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;iparm(10) = 1;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;iparm(12) = 1;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;iparm(34) = 1;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;iparm(39) = 2;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;iparm(40,41) = first and last line of local matrix&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;The code is compiled with the 2017 Intel compiler and Intel MPI. The compilation flags used are: -O3 -qopenmp -mkl=parallel and&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Oct 2018 23:48:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147400#M26817</guid>
      <dc:creator>Emond__Guillaume</dc:creator>
      <dc:date>2018-10-22T23:48:21Z</dc:date>
    </item>
    <item>
      <title>Emond. what version of MKL do</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147401#M26818</link>
      <description>&lt;P&gt;Emond, which version of MKL do you use? Is it MKL 2017 or 2019?&amp;nbsp;&lt;/P&gt;

&lt;P&gt;If you want to use the Direct Sparse Solver for Clusters, you need to link with the MPI-based libraries. Using the&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;-mkl=parallel option links with the SMP version of Intel PARDISO. Please refer to the &lt;A href="https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor"&gt;MKL Link Line Advisor&lt;/A&gt; to see how to link when you need to use the Direct Sparse Solver for Clusters.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Nevertheless,&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;what scalability results do you observe?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 02:27:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147401#M26818</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-10-24T02:27:14Z</dc:date>
    </item>
    <item>
      <title>I am using MKL 2017</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147402#M26819</link>
      <description>&lt;P&gt;I am using MKL 2017&lt;/P&gt;

&lt;P&gt;To link with the MPI and MKL libraries, I use these linker flags:&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;-lmkl -lmkl_intel_lp64 -lmkl_core -lmkl_blacs_intelmpi_lp64 -mkl_scalapack_lp64 -lpthread -lm -ldl -lmpi&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I just realized it's not exactly the same as what the MKL Link Line Advisor suggests. I will check whether it changes anything.&lt;/P&gt;

&lt;P&gt;The attached figure shows our typical running times for a 6.5M DOF problem in the analysis, factorization and solving phases.&amp;nbsp; N&amp;nbsp; &amp;amp; T are respectively the number of nodes (1 process per node) and the number of threads per node.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 21:06:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147402#M26819</guid>
      <dc:creator>Emond__Guillaume</dc:creator>
      <dc:date>2018-10-24T21:06:56Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147403#M26820</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;It seems the linking options were not the problem, because the same issues still occur with the exact options taken from the Link Line Advisor.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 20:30:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147403#M26820</guid>
      <dc:creator>Emond__Guillaume</dc:creator>
      <dc:date>2018-10-29T20:30:20Z</dc:date>
    </item>
    <item>
      <title>Could we ask you to try the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147404#M26821</link>
      <description>&lt;P&gt;Could we ask you to try&amp;nbsp;the latest MKL 2019 and check whether the scalability problem is similar? Or please give us a reproducer with these input data so we can check the problem on our side. Thanks, Gennady&lt;/P&gt;</description>
      <pubDate>Fri, 02 Nov 2018 01:15:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147404#M26821</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-11-02T01:15:37Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147405#M26822</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I noticed this post is about a year old and wonder what the outcome was. I have just submitted a help request ticket on a similar issue with the 2019 version of MKL. I am not seeing any monotonic scaling with either MPI or OMP (except MPI=1). See the attached report. The solver itself blends perfectly with our code, and I am keeping my fingers crossed.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Endel&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2019 23:48:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Direct-Sparse-Solver-for-Clusters-poor-scaling/m-p/1147405#M26822</guid>
      <dc:creator>iarve__endel</dc:creator>
      <dc:date>2019-11-07T23:48:51Z</dc:date>
    </item>
  </channel>
</rss>

