<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Your system may be crashing in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145924#M5985</link>
    <description>&lt;P&gt;Your system may be crashing the hwloc library. Please try&amp;nbsp;with I_MPI_HYDRA_TOPOLIB=ipl.&lt;/P&gt;</description>
    <pubDate>Tue, 14 May 2019 14:10:08 GMT</pubDate>
    <dc:creator>Maksim_B_Intel</dc:creator>
    <dc:date>2019-05-14T14:10:08Z</dc:date>
    <item>
      <title>parallel_studio_xe_2019_update3_cluster_edition.tgz (mpiexec.hydra - floating point exception)</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145923#M5984</link>
      <description>&lt;P&gt;After installing:&lt;/P&gt;&lt;P&gt;parallel_studio_xe_2019_update3_cluster_edition.tgz&lt;/P&gt;&lt;P&gt;on any of the following OSes:&lt;/P&gt;&lt;P&gt;CentOS 7.6 / RHEL 7.6 / RHEL 8.0&lt;/P&gt;&lt;P&gt;and sourcing the corresponding env file:&lt;/P&gt;&lt;P&gt;source /opt/intel/bin/compilervars.sh -arch intel64 -platform linux&lt;/P&gt;&lt;P&gt;the following simple mpirun command:&lt;/P&gt;&lt;P&gt;mpirun -ppn 1 -n 1 -hosts localhost hostname&lt;/P&gt;&lt;P&gt;fails with the following error:&lt;/P&gt;&lt;P&gt;/opt/intel/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpirun: line 103: 63486 Floating point exception mpiexec.hydra "$@" 0&amp;lt;&amp;amp;0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is anybody else experiencing the same issue?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 14 May 2019 14:06:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145923#M5984</guid>
      <dc:creator>Matteo_Guglielmi</dc:creator>
      <dc:date>2019-05-14T14:06:01Z</dc:date>
    </item>
    <item>
      <title>Your system may be crashing</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145924#M5985</link>
      <description>&lt;P&gt;Your system may be crashing the hwloc library. Please try&amp;nbsp;with I_MPI_HYDRA_TOPOLIB=ipl.&lt;/P&gt;</description>
      <pubDate>Tue, 14 May 2019 14:10:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145924#M5985</guid>
      <dc:creator>Maksim_B_Intel</dc:creator>
      <dc:date>2019-05-14T14:10:08Z</dc:date>
    </item>
    <item>
      <title>setting I_MPI_HYDRA_TOPOLIB</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145925#M5986</link>
      <description>&lt;P&gt;setting&amp;nbsp;I_MPI_HYDRA_TOPOLIB to ipl:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;export I_MPI_HYDRA_TOPOLIB=ipl&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;does not change anything in terms of strace:&lt;/P&gt;&lt;P&gt;strace mpiexec.hydra -ppn 1 -n 1 -hosts localhost hostname&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;open("/home/dalco/.mpiexec.conf", O_RDONLY) = -1 ENOENT (No such file or directory)&lt;BR /&gt;open("/home/dalco/mpiexec.conf", O_RDONLY) = -1 ENOENT (No such file or directory)&lt;BR /&gt;openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3&lt;BR /&gt;getdents(3, /* 145 entries */, 32768) &amp;nbsp; = 4544&lt;BR /&gt;getdents(3, /* 0 entries */, 32768) &amp;nbsp; &amp;nbsp; = 0&lt;BR /&gt;close(3) &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= 0&lt;BR /&gt;uname({sysname="Linux", nodename="dalcosrv", ...}) = 0&lt;BR /&gt;sched_getaffinity(0, 128, [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]) = 128&lt;BR /&gt;--- SIGFPE {si_signo=SIGFPE, si_code=FPE_INTDIV, si_addr=0x4429b6} ---&lt;BR /&gt;+++ killed by SIGFPE +++&lt;BR /&gt;Floating point exception&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On a lab machine provided by intel for testing:&lt;/P&gt;&lt;P&gt;cat /etc/os-release&amp;nbsp;&lt;BR /&gt;NAME="Clear Linux OS"&lt;BR /&gt;VERSION=1&lt;BR /&gt;ID=clear-linux-os&lt;BR /&gt;ID_LIKE=clear-linux-os&lt;BR /&gt;VERSION_ID=29400&lt;BR /&gt;PRETTY_NAME="Clear Linux OS"&lt;BR 
/&gt;ANSI_COLOR="1;35"&lt;BR /&gt;HOME_URL="https://clearlinux.org"&lt;BR /&gt;SUPPORT_URL="https://clearlinux.org"&lt;BR /&gt;BUG_REPORT_URL="mailto:dev@lists.clearlinux.org"&lt;BR /&gt;PRIVACY_POLICY_URL="http://www.intel.com/privacy"&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;the same parallel studio installation runs smoothly:&lt;/P&gt;&lt;P&gt;mpiexec.hydra -ppn 1 -n 1 -hosts localhost hostname&lt;BR /&gt;clxap1.lab.internal&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;### here is a successful strace command on the lab machine ###&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;getdents64(3&amp;lt;/sys/devices/system/cpu&amp;gt;, /* 211 entries */, 32768) = 6656&lt;BR /&gt;getdents64(3&amp;lt;/sys/devices/system/cpu&amp;gt;, /* 0 entries */, 32768) = 0&lt;BR /&gt;close(3&amp;lt;/sys/devices/system/cpu&amp;gt;) &amp;nbsp; &amp;nbsp; &amp;nbsp; = 0&lt;BR /&gt;uname({sysname="Linux", nodename="clxap1.lab.internal", ...}) = 0&lt;BR /&gt;sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]) = 40&lt;BR /&gt;openat(AT_FDCWD, "/", O_RDONLY|O_DIRECTORY) = 3&amp;lt;/&amp;gt;&lt;BR /&gt;fcntl(3&amp;lt;/&amp;gt;, F_GETFD) &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= 0&lt;BR /&gt;fcntl(3&amp;lt;/&amp;gt;, F_SETFD, FD_CLOEXEC) &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= 0&lt;BR /&gt;faccessat(3&amp;lt;/&amp;gt;, "sys/bus/cpu/devices/cpu0/topology/thread_siblings", R_OK) = 0&lt;BR /&gt;faccessat(3&amp;lt;/&amp;gt;, "sys/bus/node/devices/node0/cpumap", R_OK) = 0&lt;BR /&gt;uname({sysname="Linux", nodename="clxap1.lab.internal", ...}) = 0&lt;BR /&gt;openat(AT_FDCWD, 
"/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 4&amp;lt;/sys/devices/system/cpu/online&amp;gt;&lt;BR /&gt;...&lt;/P&gt;</description>
      <pubDate>Tue, 14 May 2019 14:21:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145925#M5986</guid>
      <dc:creator>Matteo_Guglielmi</dc:creator>
      <dc:date>2019-05-14T14:21:00Z</dc:date>
    </item>
    <item>
      <title>Ok, start</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145926#M5987</link>
      <description>&lt;P&gt;OK, start gdb with:&lt;/P&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;gdb --args mpiexec.hydra -ppn 1 -n 1 -hosts localhost hostname&lt;/PRE&gt;

&lt;P&gt;Type &lt;EM&gt;run&lt;/EM&gt; to start the command, and when gdb reports the floating-point exception, type &lt;EM&gt;bt&lt;/EM&gt; to print a backtrace.&lt;/P&gt;
&lt;P&gt;What is the output?&lt;/P&gt;
&lt;P&gt;Also, I don't see the hardware it fails on mentioned anywhere.&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2019 10:14:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/parallel-studio-xe-2019-update3-cluster-edition-tgz-mpiexec/m-p/1145926#M5987</guid>
      <dc:creator>Maksim_B_Intel</dc:creator>
      <dc:date>2019-05-15T10:14:24Z</dc:date>
    </item>
  </channel>
</rss>

