<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The 11.3 update 4 has been in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077373#M22631</link>
    <description>&lt;P&gt;MKL 11.3 Update 4 was released last week. Please check whether the problem still occurs on your side. Thanks.&lt;/P&gt;</description>
    <pubDate>Mon, 26 Sep 2016 04:13:39 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2016-09-26T04:13:39Z</dc:date>
    <item>
      <title>Pardiso solver much slower when using MPI?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077362#M22620</link>
      <description>&lt;P&gt;My cluster has 24 CPUs per node with 256 GB of RAM and InfiniBand. MPICH, MVAPICH2, Open MPI, and Intel MPI (impi) are all installed.&lt;/P&gt;

&lt;P&gt;I studied the example cl_solver_sym_sp_0_based_c.c in cluster_sparse_solverc/source and compiled it using:&lt;/P&gt;

&lt;P&gt;make libintel64 example=cl_solver_sym_sp_0_based_c&lt;/P&gt;

&lt;P&gt;It runs fine, but the matrix is too small to measure performance, so I modified the example to read a 3 million by 3 million matrix from a text file. When I run it without MPI, using just:&lt;/P&gt;

&lt;P&gt;./cl_solver_sym_sp_0_based_c&lt;/P&gt;

&lt;P&gt;It solves quickly and factors the matrix in 30 seconds. 'top' shows CPU usage reaching 2400%.&lt;/P&gt;

&lt;P&gt;If I run it under mpirun or mpiexec with -np 24 ./cl_solver_sym_sp_0_based_c , the factorization takes nearly 10X longer! 'top' shows each process using 100% CPU.&lt;/P&gt;

&lt;P&gt;I think I am doing something wrong with mpirun/mpiexec. I would expect it to give the same factorization time as running the executable directly. I also tried changing the OMP_NUM_THREADS variable, but nothing seemed to improve the factorization time. Here is part of my shell history:&lt;/P&gt;

&lt;P&gt;&amp;nbsp; 926&amp;nbsp; mpiexec -np2 /cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 927&amp;nbsp; mpiexec -np 2 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 928&amp;nbsp; module avail&lt;BR /&gt;
	&amp;nbsp; 929&amp;nbsp; module lad mvapich2-2.1rc2-intel-16.0&lt;BR /&gt;
	&amp;nbsp; 930&amp;nbsp; module load mvapich2-2.1rc2-intel-16.0&lt;BR /&gt;
	&amp;nbsp; 931&amp;nbsp; mpiexec -np 2 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 932&amp;nbsp; mpdboot&lt;BR /&gt;
	&amp;nbsp; 933&amp;nbsp; mpiexec -np 2 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 934&amp;nbsp; export OMP_NUM_THREADS=1&lt;BR /&gt;
	&amp;nbsp; 935&amp;nbsp; mpiexec -np 12 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 936&amp;nbsp; export OMP_NUM_THREADS=24&lt;BR /&gt;
	&amp;nbsp; 937&amp;nbsp; mpiexec -np 1 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 938&amp;nbsp; mpirun -V&lt;BR /&gt;
	&amp;nbsp; 939&amp;nbsp; mpirun -np 1 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 940&amp;nbsp; export OMP_NUM_THREADS=4&lt;BR /&gt;
	&amp;nbsp; 941&amp;nbsp; mpirun -np 6 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 942&amp;nbsp; export OMP_NUM_THREADS=6&lt;BR /&gt;
	&amp;nbsp; 943&amp;nbsp; mpirun -np 4 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 944&amp;nbsp; mpiexec -np 4 ./cl_solver_sym_sp_0_based_c.exe&lt;BR /&gt;
	&amp;nbsp; 945&amp;nbsp; mpiexec -np 1 ./cl_solver_sym_sp_0_based_c.exe&lt;/P&gt;
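
&lt;P&gt;The history above tries a few rank and thread combinations by hand. A minimal sketch of enumerating them systematically follows, assuming the goal is to keep ranks times threads equal to the 24 cores of one node. The function name rank_thread_layouts is hypothetical, and the printed commands only mirror the form used in the history; nothing here is specific to MKL.&lt;/P&gt;

```python
def rank_thread_layouts(cores_per_node):
    """Return every (mpi_ranks, omp_threads) pair whose product is cores_per_node."""
    layouts = []
    for ranks in range(1, cores_per_node + 1):
        if cores_per_node % ranks == 0:
            layouts.append((ranks, cores_per_node // ranks))
    return layouts


if __name__ == "__main__":
    # 24 cores per node comes from the post; the exe name matches the history.
    for ranks, threads in rank_thread_layouts(24):
        print(f"export OMP_NUM_THREADS={threads}; "
              f"mpiexec -np {ranks} ./cl_solver_sym_sp_0_based_c.exe")
```

&lt;P&gt;Timing each printed command with the same matrix would show whether any layout approaches the 30 second non-MPI factorization.&lt;/P&gt;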

</description>
      <pubDate>Fri, 29 Jul 2016 16:50:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077362#M22620</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-07-29T16:50:30Z</dc:date>
    </item>
    <item>
      <title>An example is worth a</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077363#M22621</link>
      <description>&lt;P&gt;An example is worth a thousand words, so here are my example files!&lt;/P&gt;

&lt;P&gt;cl_solver_sym_sp_0_based_c.c: change every occurrence of *.txt to the path where the files are on your system.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0" target="_blank"&gt;https://www.dropbox.com/s/ndkzi9zojxuh1xo/cl_solver_sym_sp_0_based_c.c?dl=0&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;ia, ja, a, and b data in text files:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0" target="_blank"&gt;https://www.dropbox.com/s/3dkhbillyso03kc/ia_ja_a_b_data.tar.gz?dl=0&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I am curious what kind of performance improvement you get when running with MPI on 12, 24, 48, and 72 CPUs!&lt;/P&gt;

</description>
      <pubDate>Fri, 29 Jul 2016 17:05:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077363#M22621</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-07-29T17:05:58Z</dc:date>
    </item>
    <item>
      <title>Hi Ferris.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077364#M22622</link>
      <description>&lt;P&gt;Hi Ferris.&lt;/P&gt;

&lt;P&gt;That's really strange behaviour. Can I ask you to set msglvl to 1 and post the output here?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 10:25:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077364#M22622</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2016-08-01T10:25:10Z</dc:date>
    </item>
    <item>
      <title>I am attaching the output for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077365#M22623</link>
      <description>&lt;P&gt;I am attaching the output of the non-MPI run with msglvl=1. Today, when I try to run with MPI, I get errors like:&lt;/P&gt;

&lt;P&gt;[hussaf@cforge200 cluster_sparse_solverc]$ mpiexec -np 12 ./cl_solver_sym_sp_0_based_c.exe &amp;gt; out.txt&lt;BR /&gt;
	[cforge200:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)&lt;/P&gt;

&lt;P&gt;Reordering completed ... rank 1 in job 2&amp;nbsp; cforge200_35175&amp;nbsp;&amp;nbsp; caused collective abort of all ranks&lt;BR /&gt;
	&amp;nbsp; exit status of rank 1: killed by signal 9&lt;/P&gt;

&lt;P&gt;[hussaf@cforge200 cluster_sparse_solverc]$ module load mvapich2-2.1rc2-intel-16.0&lt;BR /&gt;
	[hussaf@cforge200 cluster_sparse_solverc]$ mpirun -V&lt;BR /&gt;
	Intel(R) MPI Library for Linux* OS, Version 5.1.3 Build 20160120 (build id: 14053)&lt;BR /&gt;
	Copyright (C) 2003-2016, Intel Corporation. All rights reserved.&lt;BR /&gt;
	[hussaf@cforge200 cluster_sparse_solverc]$ mpiexec -V&lt;BR /&gt;
	Intel(R) MPI Library for Linux* OS, 64-bit applications, Version 5.1.3&amp;nbsp; Build 20160120&lt;BR /&gt;
	Copyright (C) 2003-2015 Intel Corporation.&amp;nbsp; All rights reserved.&lt;/P&gt;
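
&lt;P&gt;Because the -V banner above is what identifies the launcher actually found on PATH, a small check can flag a launcher other than the expected one. This is a sketch only: the function names are hypothetical, the match is a plain substring test against the banner text quoted above, and other MPI implementations may print nothing useful for -V.&lt;/P&gt;

```python
import shutil
import subprocess


def mpi_flavor(banner):
    """Classify a launcher from its -V banner (heuristic substring match)."""
    if "Intel(R) MPI Library" in banner:
        return "intel-mpi"
    return "unknown"


def launcher_banner(launcher="mpirun"):
    """Return (resolved path, -V output), or (None, '') if not on PATH."""
    path = shutil.which(launcher)
    if path is None:
        return None, ""
    result = subprocess.run([path, "-V"], capture_output=True, text=True)
    return path, result.stdout + result.stderr


if __name__ == "__main__":
    path, banner = launcher_banner("mpirun")
    print(path, mpi_flavor(banner))
```

&lt;P&gt;Running this after each "module load" shows which launcher the shell resolves and whether its banner matches the MPI library the executable was built against.&lt;/P&gt;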

</description>
      <pubDate>Mon, 01 Aug 2016 13:59:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077365#M22623</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-08-01T13:59:00Z</dc:date>
    </item>
    <item>
      <title>A little progress. If I do:</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077366#M22624</link>
      <description>&lt;P&gt;A little progress. If I do:&lt;/P&gt;

&lt;P&gt;mpirun -np 1 ./cl_solver_sym_sp_0_based_c.exe&lt;/P&gt;

&lt;P&gt;Then it completes in about the same time as the non-MPI run (./cl_solver_sym_sp_0_based_c.exe) and appears to use 24 threads.&lt;/P&gt;

&lt;P&gt;Now I want to test this on two hosts, so my hostfile looks like:&lt;/P&gt;

&lt;P&gt;cforge200:24&lt;BR /&gt;
	cforge201:24&lt;/P&gt;

&lt;P&gt;When I execute:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;mpirun -np 2 -hostfile /home/hussaf/intel/cluster_sparse_solverc/hostfile ./cl_solver_sym_sp_0_based_c.exe&lt;/P&gt;

&lt;P&gt;It runs everything on one node, creating both MPI processes on cforge200. The solve time is the same as in the previous cases. How can I get it to run on two hosts using all 48 CPUs?&lt;/P&gt;
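
&lt;P&gt;As a way to make the intended layout explicit, here is a sketch that plans one MPI rank per host with one OpenMP thread per core. That layout is an assumption, not something confirmed in this thread, and the function name is hypothetical. The hostfile text uses the same host:count format as the file shown above; whether a given launcher then spreads the ranks across the hosts depends on the MPI implementation and its options, which this sketch does not address.&lt;/P&gt;

```python
def one_rank_per_host_plan(hosts, cores_per_host):
    """Plan a hybrid run: one MPI rank per host, one thread per core (assumed)."""
    return {
        "num_ranks": len(hosts),
        "omp_num_threads": cores_per_host,
        "total_cores": len(hosts) * cores_per_host,
        # Same host:count line format as the hostfile in the post.
        "hostfile": "".join(f"{host}:{cores_per_host}\n" for host in hosts),
    }


plan = one_rank_per_host_plan(["cforge200", "cforge201"], 24)
print(plan["num_ranks"], plan["omp_num_threads"], plan["total_cores"])
print(plan["hostfile"], end="")
```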

</description>
      <pubDate>Mon, 01 Aug 2016 14:56:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077366#M22624</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-08-01T14:56:00Z</dc:date>
    </item>
    <item>
      <title>I made some more progress.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077367#M22625</link>
      <description>&lt;P&gt;I made some more progress. Instead of -hostfile, I had to use -machinefile. So my command is:&lt;/P&gt;

&lt;P&gt;mpirun -np 2 -env OMP_NUM_THREADS=24 -machinefile ./hostfile ./cl_solver_sym_sp_0_based_c.exe&lt;/P&gt;

&lt;P&gt;I am attaching the output of this run with msglvl=1. As you can see, the solve takes nearly 8X longer than on one node without MPI! Any suggestions for how to debug further?&lt;/P&gt;
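
&lt;P&gt;For repeating this experiment with different rank and thread counts, the command can be assembled in a script. The sketch below reuses only the flags that appear in the command above (-np, -env, -machinefile) in exactly the form written there; the function name build_mpirun_cmd is hypothetical, and no claim is made that every launcher accepts this syntax.&lt;/P&gt;

```python
import shlex


def build_mpirun_cmd(num_ranks, omp_threads, machinefile, exe):
    """Build the argument list for the mpirun invocation used in this post."""
    return [
        "mpirun",
        "-np", str(num_ranks),
        "-env", f"OMP_NUM_THREADS={omp_threads}",
        "-machinefile", machinefile,
        exe,
    ]


cmd = build_mpirun_cmd(2, 24, "./hostfile", "./cl_solver_sym_sp_0_based_c.exe")
print(shlex.join(cmd))
```

&lt;P&gt;Passing the list to subprocess.run and timing it would give comparable numbers across configurations.&lt;/P&gt;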

</description>
      <pubDate>Mon, 01 Aug 2016 16:08:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077367#M22625</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-08-01T16:08:20Z</dc:date>
    </item>
    <item>
      <title>No idea. I see such results</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077368#M22626</link>
      <description>&lt;P&gt;No idea. I see results like this on clusters with a poor network, but you wrote that InfiniBand is used. I am away from my cluster right now, but I will download and run your test case tomorrow when I am back in the office and check the results on my side, OK?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Alex&lt;/P&gt;

</description>
      <pubDate>Mon, 01 Aug 2016 18:00:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077368#M22626</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2016-08-01T18:00:39Z</dc:date>
    </item>
    <item>
      <title>I figured out my issue. I was</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077369#M22627</link>
      <description>&lt;P&gt;I figured out my issue. I was using the mpirun from mvapich2-2.1rc2-intel-16.0. When I used the Intel MPI mpirun, the problem solved quickly. I am now facing a new issue: I can only solve on 1 or 2 compute nodes. If I try to use 3 or more compute nodes, I get an error. I will start a new thread on that to avoid confusion!&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 18:51:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077369#M22627</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-08-01T18:51:24Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077370#M22628</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Is the matrix the same?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2016 02:16:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077370#M22628</guid>
      <dc:creator>Alexander_K_Intel2</dc:creator>
      <dc:date>2016-08-02T02:16:08Z</dc:date>
    </item>
    <item>
      <title>Yes, matrix is the same. I</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077371#M22629</link>
      <description>&lt;P&gt;Yes, the matrix is the same. I will start a new forum post that describes the issue and how to reproduce it.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2016 14:16:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077371#M22629</guid>
      <dc:creator>Ferris_H_</dc:creator>
      <dc:date>2016-08-02T14:16:16Z</dc:date>
    </item>
    <item>
      <title>We see the problem with the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077372#M22630</link>
      <description>&lt;P&gt;We see the problem with the current version, MKL 11.3.3, but it has been fixed in the upcoming Update 4, which we plan to release soon. We will let you know when that release happens.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 04:26:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077372#M22630</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2016-08-03T04:26:26Z</dc:date>
    </item>
    <item>
      <title>The 11.3 update 4 has been</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077373#M22631</link>
      <description>&lt;P&gt;MKL 11.3 Update 4 was released last week. Please check whether the problem still occurs on your side. Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Sep 2016 04:13:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Pardiso-solver-much-slower-when-using-MPI/m-p/1077373#M22631</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2016-09-26T04:13:39Z</dc:date>
    </item>
  </channel>
</rss>

