<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic "I just tried this with Intel" in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048866#M4374</link>
    <description>&lt;P&gt;Overriding works with Intel MPI 5.1.3.181.&lt;/P&gt;

&lt;P&gt;I just tried this with Intel MPI 5.1.3.181. It seems "I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable" is no longer ignored. When this variable is set, the SLURM process placement is overridden by "-ppn" or "-perhost".&lt;/P&gt;</description>
    <pubDate>Thu, 12 May 2016 11:02:09 GMT</pubDate>
    <dc:creator>Nico_Mittenzwey</dc:creator>
    <dc:date>2016-05-12T11:02:09Z</dc:date>
    <item>
      <title>Intel MPI, perhost, and SLURM: Can I override SLURM?</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048863#M4371</link>
      <description>&lt;P&gt;All,&lt;/P&gt;

&lt;P&gt;(Note: I'm also asking this on the slurm-dev list.)&lt;/P&gt;

&lt;P&gt;I'm hoping you can help me with a question. I'm on a cluster that uses SLURM, and let's say I ask for two 28-core Haswell nodes to run on interactively and I get them. Great, so my environment now has entries like:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;SLURM_NTASKS_PER_NODE=28
SLURM_TASKS_PER_NODE=28(x2)
SLURM_JOB_CPUS_PER_NODE=28(x2)
SLURM_CPUS_ON_NODE=28
&lt;/PRE&gt;
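
&lt;P&gt;(For reference, the full set of these can be listed from inside the job with a quick pipeline; nothing Intel-MPI-specific here, just standard tools:)&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;# List every SLURM-provided environment variable, sorted by name
env | grep '^SLURM_' | sort
&lt;/PRE&gt;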

&lt;P&gt;Now, let's run a simple HelloWorld on, say, 48 processors (and pipe the output through sort to see things a bit better):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;(1047) $ mpirun -np 48 -print-rank-map ./helloWorld.exe | sort -k2 -g
srun.slurm: cluster configuration lacks support for cpu binding
(borgj102:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj105:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)
Process    0 of   48 is on borgj102
Process    1 of   48 is on borgj102
Process    2 of   48 is on borgj102
Process    3 of   48 is on borgj102
Process    4 of   48 is on borgj102
Process    5 of   48 is on borgj102
Process    6 of   48 is on borgj102
Process    7 of   48 is on borgj102
Process    8 of   48 is on borgj102
Process    9 of   48 is on borgj102
Process   10 of   48 is on borgj102
Process   11 of   48 is on borgj102
Process   12 of   48 is on borgj102
Process   13 of   48 is on borgj102
Process   14 of   48 is on borgj102
Process   15 of   48 is on borgj102
Process   16 of   48 is on borgj102
Process   17 of   48 is on borgj102
Process   18 of   48 is on borgj102
Process   19 of   48 is on borgj102
Process   20 of   48 is on borgj102
Process   21 of   48 is on borgj102
Process   22 of   48 is on borgj102
Process   23 of   48 is on borgj102
Process   24 of   48 is on borgj102
Process   25 of   48 is on borgj102
Process   26 of   48 is on borgj102
Process   27 of   48 is on borgj102
Process   28 of   48 is on borgj105
Process   29 of   48 is on borgj105
Process   30 of   48 is on borgj105
Process   31 of   48 is on borgj105
Process   32 of   48 is on borgj105
Process   33 of   48 is on borgj105
Process   34 of   48 is on borgj105
Process   35 of   48 is on borgj105
Process   36 of   48 is on borgj105
Process   37 of   48 is on borgj105
Process   38 of   48 is on borgj105
Process   39 of   48 is on borgj105
Process   40 of   48 is on borgj105
Process   41 of   48 is on borgj105
Process   42 of   48 is on borgj105
Process   43 of   48 is on borgj105
Process   44 of   48 is on borgj105
Process   45 of   48 is on borgj105
Process   46 of   48 is on borgj105
Process   47 of   48 is on borgj105
&lt;/PRE&gt;

&lt;P&gt;As you can see, the first 28 processes are on node 1, and the last 20 are on node 2. Okay. Now, I want to do some load balancing, so I want 24 on each. In the past, I always used -perhost and it worked, but now:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;(1048) $ mpirun -np 48 -perhost 24 -print-rank-map ./helloWorld.exe | sort -k2 -g
srun.slurm: cluster configuration lacks support for cpu binding
(borgj102:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj105:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)
Process    0 of   48 is on borgj102
Process    1 of   48 is on borgj102
Process    2 of   48 is on borgj102
Process    3 of   48 is on borgj102
Process    4 of   48 is on borgj102
Process    5 of   48 is on borgj102
Process    6 of   48 is on borgj102
Process    7 of   48 is on borgj102
Process    8 of   48 is on borgj102
Process    9 of   48 is on borgj102
Process   10 of   48 is on borgj102
Process   11 of   48 is on borgj102
Process   12 of   48 is on borgj102
Process   13 of   48 is on borgj102
Process   14 of   48 is on borgj102
Process   15 of   48 is on borgj102
Process   16 of   48 is on borgj102
Process   17 of   48 is on borgj102
Process   18 of   48 is on borgj102
Process   19 of   48 is on borgj102
Process   20 of   48 is on borgj102
Process   21 of   48 is on borgj102
Process   22 of   48 is on borgj102
Process   23 of   48 is on borgj102
Process   24 of   48 is on borgj102
Process   25 of   48 is on borgj102
Process   26 of   48 is on borgj102
Process   27 of   48 is on borgj102
Process   28 of   48 is on borgj105
Process   29 of   48 is on borgj105
Process   30 of   48 is on borgj105
Process   31 of   48 is on borgj105
Process   32 of   48 is on borgj105
Process   33 of   48 is on borgj105
Process   34 of   48 is on borgj105
Process   35 of   48 is on borgj105
Process   36 of   48 is on borgj105
Process   37 of   48 is on borgj105
Process   38 of   48 is on borgj105
Process   39 of   48 is on borgj105
Process   40 of   48 is on borgj105
Process   41 of   48 is on borgj105
Process   42 of   48 is on borgj105
Process   43 of   48 is on borgj105
Process   44 of   48 is on borgj105
Process   45 of   48 is on borgj105
Process   46 of   48 is on borgj105
Process   47 of   48 is on borgj105
&lt;/PRE&gt;

&lt;P&gt;Huh. No change: it's still 28 and 20. Do you know if there is a way to "override" what appears to be SLURM beating the -perhost flag? I suppose there is that srun.slurm warning being thrown, but that is usually a warning tied to "tasks-per-core" sorts of manipulations.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2015 14:26:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048863#M4371</guid>
      <dc:creator>Matt_Thompson</dc:creator>
      <dc:date>2015-04-30T14:26:28Z</dc:date>
    </item>
    <item>
      <title>Oh, and since I forgot, I'm</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048864#M4372</link>
      <description>&lt;P&gt;Oh, and since I forgot, I'm running Intel MPI 5.0.3.048. Sorry!&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2015 14:27:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048864#M4372</guid>
      <dc:creator>Matt_Thompson</dc:creator>
      <dc:date>2015-04-30T14:27:13Z</dc:date>
    </item>
    <item>
      <title>Addendum,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048865#M4373</link>
      <description>&lt;P&gt;Addendum: Per an admin here at NASA on the SLURM List:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;I'm pretty confident in saying this is entirely in Intel MPI land:

aknister@borgj157:~&amp;gt; I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=enable mpiexec.hydra -np 48 -ppn 24 -print-rank-map /bin/true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj164:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

aknister@borgj157:~&amp;gt; I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable mpiexec.hydra -np 48 -ppn 24 -print-rank-map /bin/true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23)
(borgj164:24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

However, if a machinefile argument is passed to mpiexec.hydra (which mpirun does by default) 
the I_MPI_JOB_RESPECT_PROCESS_PLACEMENT variable isn't respected (see below). 
Maybe we need an I_MPI_JOB_RESPECT_I_MPI_JOB_RESPECT_PROCESS_PLACEMENT_VARIABLE variable.

aknister@borgj157:~&amp;gt; I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=enable mpiexec.hydra -machinefile $PBS_NODEFILE -np 48 -ppn 24 --print-rank-map true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj164:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)

aknister@borgj157:~&amp;gt; I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable mpiexec.hydra -machinefile $PBS_NODEFILE -np 48 -ppn 24 --print-rank-map true
(borgj157:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
(borgj164:28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47)
&lt;/PRE&gt;

&lt;P&gt;Does anyone here at Intel know how to get mpirun to respect this so -ppn can work with SLURM?&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2015 12:39:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048865#M4373</guid>
      <dc:creator>Matt_Thompson</dc:creator>
      <dc:date>2015-05-04T12:39:09Z</dc:date>
    </item>
    <item>
      <title>I just tried this with Intel</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048866#M4374</link>
      <description>&lt;P&gt;Overriding works with Intel MPI 5.1.3.181.&lt;/P&gt;

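&lt;P&gt;For example, to force 24 ranks per node across 48 ranks (re-using the binary name and rank counts from the earlier posts in this thread; adapt them to your own job):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;# Tell Intel MPI to ignore the SLURM-provided process placement...
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable
# ...so that -ppn (or -perhost) takes effect again
mpirun -np 48 -ppn 24 -print-rank-map ./helloWorld.exe
&lt;/PRE&gt;
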
&lt;P&gt;I just tried this with Intel MPI 5.1.3.181. It seems "I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=disable" is no longer ignored. When this variable is set, the SLURM process placement is overridden by "-ppn" or "-perhost".&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 11:02:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-perhost-and-SLURM-Can-I-override-SLURM/m-p/1048866#M4374</guid>
      <dc:creator>Nico_Mittenzwey</dc:creator>
      <dc:date>2016-05-12T11:02:09Z</dc:date>
    </item>
  </channel>
</rss>

