<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic No speedup of cluster_sparse_solver beyond 32 cpus in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082933#M22872</link>
    <description>&lt;P&gt;My cluster has 16 cpus/node. My matrix is symmetric positive definite, about 2 million by 2 million in size, with ~4 million non-zero entries. My factorization times are:&lt;/P&gt;

&lt;P&gt;16 cpus - 84 seconds&lt;/P&gt;

&lt;P&gt;32 cpus - 44 seconds&lt;/P&gt;

&lt;P&gt;48 cpus - 48 seconds ?!&lt;/P&gt;

&lt;P&gt;The factorization takes longer with 48 cpus compared to 32 cpus.&lt;/P&gt;

&lt;P&gt;I have tried with a smaller matrix and get the same results. There is no speedup beyond 32 cpus. Is this a known limitation of cluster_sparse_solver or a problem with my cluster? If it is a cluster problem, any suggestions on how I can narrow down the bottleneck?&lt;/P&gt;</description>
    <pubDate>Fri, 11 Nov 2016 03:58:04 GMT</pubDate>
    <dc:creator>Ferris_H_</dc:creator>
    <dc:date>2016-11-11T03:58:04Z</dc:date>
    <item>
      <title>No speedup of cluster_sparse_solver beyond 32 cpus</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082933#M22872</link>
      <description>&lt;P&gt;My cluster has 16 cpus/node. My matrix is symmetric positive definite, about 2 million by 2 million in size, with ~4 million non-zero entries. My factorization times are:&lt;/P&gt;

&lt;P&gt;16 cpus - 84 seconds&lt;/P&gt;

&lt;P&gt;32 cpus - 44 seconds&lt;/P&gt;

&lt;P&gt;48 cpus - 48 seconds ?!&lt;/P&gt;

&lt;P&gt;The factorization takes longer with 48 cpus compared to 32 cpus.&lt;/P&gt;
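
&lt;P&gt;As a rough sketch of what these numbers imply, speedup and parallel efficiency relative to the 16-cpu run can be computed from the three timings above. The function name scaling_table is hypothetical and used only for this illustration.&lt;/P&gt;

&lt;PRE&gt;
```python
# Factorization times reported above: cpu count -> seconds.
timings = {16: 84.0, 32: 44.0, 48: 48.0}


def scaling_table(timings):
    """Return (cpus, seconds, speedup, efficiency) rows relative to the smallest run."""
    base_cpus = min(timings)
    base_time = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        seconds = timings[cpus]
        # Speedup: how many times faster than the baseline run.
        speedup = base_time / seconds
        # Efficiency: achieved speedup divided by the ideal speedup cpus / base_cpus.
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, seconds, speedup, efficiency))
    return rows


for cpus, seconds, speedup, efficiency in scaling_table(timings):
    print(f"{cpus:3d} cpus  {seconds:5.1f} s  speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
```
&lt;/PRE&gt;

&lt;P&gt;With these timings, efficiency is about 0.95 on 32 cpus and about 0.58 on 48 cpus.&lt;/P&gt;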

&lt;P&gt;I have tried with a smaller matrix and get the same results. There is no speedup beyond 32 cpus. Is this a known limitation of cluster_sparse_solver or a problem with my cluster? If it is a cluster problem, any suggestions on how I can narrow down the bottleneck?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2016 03:58:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082933#M22872</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-11-11T03:58:04Z</dc:date>
    </item>
    <item>
      <title>Ferris, could you check the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082934#M22873</link>
      <description>&lt;P&gt;Ferris, could you check the scalability with a larger problem size?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2016 09:17:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082934#M22873</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2016-11-11T09:17:32Z</dc:date>
    </item>
    <item>
      <title>Unfortunately, I do not have</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082935#M22874</link>
      <description>&lt;P&gt;Unfortunately, I do not have any larger matrices to test. The size I am testing is around the largest I would see in my area. Are there any public benchmark matrices I could download to test? If not, I can create an example code that reads in my matrix for you to test on your cluster.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2016 15:31:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082935#M22874</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-11-11T15:31:04Z</dc:date>
    </item>
    <item>
      <title>I created an example file</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082936#M22875</link>
      <description>&lt;P&gt;I created an example file that can reproduce the issue. Download cl_solver_sym_sp_0_based_c.c from here:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0"&gt;https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Edit every occurrence of *.txt to point to the path where the files are on your system.&lt;/P&gt;

&lt;P&gt;The ia, ja, a, and b data are all here as text files:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0"&gt;https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Curious what kind of performance improvement you get when running with MPI on 16, 32, 48, and 72 cpus!&lt;/P&gt;</description>
      <pubDate>Mon, 21 Nov 2016 16:21:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082936#M22875</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-11-21T16:21:42Z</dc:date>
    </item>
    <item>
      <title>Ferris, do you have access to</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082937#M22876</link>
      <description>&lt;P&gt;Ferris, do you have access to a 64-core system? I currently do not. If you do, could you please try it and give us the results? The scalability may be different if the number of nodes is a power of 2.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Nov 2016 11:05:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082937#M22876</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2016-11-22T11:05:01Z</dc:date>
    </item>
    <item>
      <title>Quote:Gennady F. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082938#M22877</link>
      <description>&lt;BLOCKQUOTE&gt;Gennady F. (Intel) wrote:&lt;BR /&gt;

&lt;P&gt;Ferris, do you have access to a 64-core system? I currently do not. If you do, could you please try it and give us the results? The scalability may be different if the number of nodes is a power of 2.&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Hi Gennady,&lt;/P&gt;

&lt;P&gt;As requested, I solved my model on a larger 4-node, 60-core cluster with 15 cores per node. Below are the factorization times:&lt;/P&gt;

&lt;P&gt;15 cores - 70 seconds&lt;/P&gt;

&lt;P&gt;30 cores - 41 seconds&lt;/P&gt;

&lt;P&gt;45 cores - 42 seconds&lt;/P&gt;

&lt;P&gt;60 cores - 36 seconds&lt;/P&gt;

&lt;P&gt;So it seems there is some improvement when the number of nodes is 4, but with 3 nodes the solve time is the same as with 2 nodes. Does the number of nodes always have to be a power of 2? Or could there be some problem with my cluster?&lt;/P&gt;
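
&lt;P&gt;One rough way to frame the question is to fit a simple model, t(n) = serial + parallel / n, to the 15-core and 30-core timings and compare its predictions with the 45-core and 60-core measurements. This is only a sketch under an assumed Amdahl-style model; nothing in this thread says cluster_sparse_solver follows it, and the helper name fit_two_point is hypothetical.&lt;/P&gt;

&lt;PRE&gt;
```python
def fit_two_point(n1, t1, n2, t2):
    """Solve t = serial + parallel / n from two (cores, seconds) measurements."""
    parallel = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    serial = t1 - parallel / n1
    return serial, parallel


# Factorization times reported above: core count -> seconds.
measured = {15: 70.0, 30: 41.0, 45: 42.0, 60: 36.0}

# Assumption: the two smallest runs are used to fit the model.
serial, parallel = fit_two_point(15, measured[15], 30, measured[30])

for cores in sorted(measured):
    predicted = serial + parallel / cores
    print(f"{cores:2d} cores  measured {measured[cores]:5.1f} s  model {predicted:5.1f} s")
```
&lt;/PRE&gt;

&lt;P&gt;For these timings the fit gives a fixed cost of 12 seconds, and the model predicts about 31 seconds on 45 cores and 26.5 seconds on 60 cores, versus the 42 and 36 seconds measured. Under this assumed model, the gap would point to a cost that grows with the node count, such as communication or load imbalance, rather than to a fixed serial part.&lt;/P&gt;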

</description>
      <pubDate>Tue, 29 Nov 2016 03:34:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/No-speedup-of-cluster-sparse-solver-beyond-32-cpus/m-p/1082938#M22877</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-11-29T03:34:22Z</dc:date>
    </item>
  </channel>
</rss>

