<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Intel mpirun error - AI workload in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-mpirun-error-AI-workload/m-p/1130909#M5636</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I tried to run one of my workload models for training on a CentOS cluster for MPI analysis. The commands I used and the resulting error are shown below. I would appreciate your help in resolving the issue.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Commands used&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;mpiexec -ppn 1 -- &lt;/STRONG&gt;./scripts/run_intelcaffe.sh --hostfile ~/mpd.hosts --solver models/intel_optimized_models/multinode/resnet50_8nodes_2s/solver.prototxt --network tcp --netmask enp175s0 --benchmark mpi&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;mpirun -ppn 1 -l amplxe-cl -collect hotspots -k sampling-mode=hw -result-dir results -- &lt;/STRONG&gt;./scripts/run_intelcaffe.sh --hostfile ~/mpd.hosts --solver models/intel_optimized_models/multinode/resnet50_8nodes_2s/solver.prototxt --network tcp --netmask enp175s0 --benchmark mpi&lt;/P&gt;&lt;P&gt;I keep getting the following error:&lt;/P&gt;&lt;P&gt;===================================================================================&lt;BR /&gt;= &amp;nbsp; BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES&lt;BR /&gt;= &amp;nbsp; RANK 26 PID 72362 RUNNING AT node001&lt;BR /&gt;= &amp;nbsp; EXIT STATUS: 255&lt;BR /&gt;===================================================================================&lt;/P&gt;&lt;P&gt;===================================================================================&lt;BR /&gt;= &amp;nbsp; BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES&lt;BR /&gt;= &amp;nbsp; RANK 27 PID 72363 RUNNING AT node001&lt;BR /&gt;= &amp;nbsp; KILLED BY SIGNAL: 9 (Killed)&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 19 Jul 2019 04:38:58 GMT</pubDate>
    <dc:creator>Naveen_T_Intel</dc:creator>
    <dc:date>2019-07-19T04:38:58Z</dc:date>
    <item>
      <title>Intel mpirun error - AI workload</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-mpirun-error-AI-workload/m-p/1130909#M5636</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I tried to run one of my workload models for training on a CentOS cluster for MPI analysis. The commands I used and the resulting error are shown below. I would appreciate your help in resolving the issue.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Commands used&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;mpiexec -ppn 1 -- &lt;/STRONG&gt;./scripts/run_intelcaffe.sh --hostfile ~/mpd.hosts --solver models/intel_optimized_models/multinode/resnet50_8nodes_2s/solver.prototxt --network tcp --netmask enp175s0 --benchmark mpi&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;mpirun -ppn 1 -l amplxe-cl -collect hotspots -k sampling-mode=hw -result-dir results -- &lt;/STRONG&gt;./scripts/run_intelcaffe.sh --hostfile ~/mpd.hosts --solver models/intel_optimized_models/multinode/resnet50_8nodes_2s/solver.prototxt --network tcp --netmask enp175s0 --benchmark mpi&lt;/P&gt;&lt;P&gt;I keep getting the following error:&lt;/P&gt;&lt;P&gt;===================================================================================&lt;BR /&gt;= &amp;nbsp; BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES&lt;BR /&gt;= &amp;nbsp; RANK 26 PID 72362 RUNNING AT node001&lt;BR /&gt;= &amp;nbsp; EXIT STATUS: 255&lt;BR /&gt;===================================================================================&lt;/P&gt;&lt;P&gt;===================================================================================&lt;BR /&gt;= &amp;nbsp; BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES&lt;BR /&gt;= &amp;nbsp; RANK 27 PID 72363 RUNNING AT node001&lt;BR /&gt;= &amp;nbsp; KILLED BY SIGNAL: 9 (Killed)&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jul 2019 04:38:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-mpirun-error-AI-workload/m-p/1130909#M5636</guid>
      <dc:creator>Naveen_T_Intel</dc:creator>
      <dc:date>2019-07-19T04:38:58Z</dc:date>
    </item>
    <item>
      <title>Hi, Thallam. Do I get it</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-mpirun-error-AI-workload/m-p/1130910#M5637</link>
      <description>&lt;P&gt;Hi, Thallam. Do I understand correctly that you get the crash without amplxe-cl too?&lt;/P&gt;&lt;P&gt;Let's check whether the issue is in the environment or in mpirun by running a simpler test, such as:&lt;/P&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;mpiexec.hydra -n 2 IMB-MPI1 Barrier&lt;/PRE&gt;

&lt;P&gt;Also try supplying the -n &amp;lt;process_count&amp;gt; argument, in case your scheduler provides the node list to mpiexec.&lt;/P&gt;
&lt;P&gt;A run with the -v option and I_MPI_DEBUG=10 will produce a more detailed log, which you can post here.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2019 06:48:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-mpirun-error-AI-workload/m-p/1130910#M5637</guid>
      <dc:creator>Maksim_B_Intel</dc:creator>
      <dc:date>2019-07-23T06:48:02Z</dc:date>
    </item>
  </channel>
</rss>

