<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: running on MPI cluster in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439830#M10162</link>
    <description>&lt;P&gt;Is&amp;nbsp;&lt;EM class="sub_section_element_selectors"&gt;"/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" located on all systems in your cluster?&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM class="sub_section_element_selectors"&gt;Jim Dempsey&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 19 Dec 2022 18:40:10 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2022-12-19T18:40:10Z</dc:date>
    <item>
      <title>running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439584#M10155</link>
      <description>&lt;P&gt;Hello everyone.&lt;BR /&gt;&lt;BR /&gt;I installed a Linux cluster (Ubuntu 20.04.2 LTS) following this &lt;A href="https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/" target="_self"&gt;web note&lt;/A&gt;.&lt;BR /&gt;I finally got a working setup with gfortran and &lt;STRIKE&gt;openmpi&lt;/STRIKE&gt; mpich.&lt;BR /&gt;The test program:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program hello_mpi
USE MPI_f08
implicit none
integer num_procs, namelen,id
character *(MPI_MAX_PROCESSOR_NAME) procs_name

call MPI_INIT ()
call MPI_COMM_RANK (MPI_COMM_WORLD, id)
call MPI_COMM_SIZE (MPI_COMM_WORLD, num_procs)
call MPI_GET_PROCESSOR_NAME (procs_name, namelen)

write(*,'(A24,I2,A4,I2,A14,A15)') "Hello world from process", id, " of ", num_procs, &amp;amp;
" processes on ", procs_name

call MPI_FINALIZE ()
end program&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;! compilation command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpifort -I /usr/lib/x86_64-linux-gnu/mpich/include/ MPI_test.f90&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;! launch command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpirun -np 20 -hosts master,slave1 ./a.out&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;.....&lt;/P&gt;
&lt;P&gt;Hello world from process 2 of 20 processes on master &lt;BR /&gt;Hello world from process 1 of 20 processes on master &lt;BR /&gt;Hello world from process18 of 20 processes on slave1 &lt;BR /&gt;Hello world from process19 of 20 processes on slave1&lt;/P&gt;
&lt;P&gt;........&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now I am moving to the Intel compiler with oneAPI. I installed the toolkit on the four nodes, followed the install process given by Intel, and added this command to the .bashrc of all nodes:&lt;BR /&gt;source /opt/intel/oneapi/setvars.sh&lt;/P&gt;
&lt;P&gt;With the same program :&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;EM&gt;! compilation command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpiifort MPI_test.f90&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;! launch command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpirun -np 20 -hosts master,slave1 ./a.out&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;and I got :&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Image PC Routine Line Source &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libpthread-2.31.s 00007F50EDB343C0 Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;librxm-fi.so 00007F5021F1A856 Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;....&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;librxm-fi.so 00007F5021F1D9C7 Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libmpi.so.12.0.0 00007F50EE196EAE Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;....&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libmpi.so.12.0.0 00007F50EDFFCD1B MPI_Init Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libmpifort.so.12. 00007F50EF53C816 mpi_init_f08_ Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;a.out 00000000004041EB Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;a.out 000000000040419D Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libc-2.31.so 00007F50ED8050B3 __libc_start_main Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;a.out 00000000004040BE Unknown Unknown Unknown&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;=================================================================&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;= RANK 17 PID 18095 RUNNING AT slave1&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;= KILLED BY SIGNAL: 9 (Killed)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;=================================================================&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Note that the program is OK as long as it stays on a single node.&lt;BR /&gt;I was wondering whether&amp;nbsp;&lt;EM&gt;source /opt/intel/oneapi/setvars.sh&amp;nbsp;&lt;/EM&gt;is actually executed when MPI launches processes over ssh, since ssh runs without an interactive bash session (see the quick check sketched below). If so, how can we solve that?&lt;BR /&gt;Maybe the issue is something else.&lt;BR /&gt;&lt;BR /&gt;Some help would be appreciated.&lt;BR /&gt;Thank you,&lt;BR /&gt;Alexandre&lt;/P&gt;
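&lt;P&gt;For reference, here is a minimal way one could check this (a sketch only; the host name slave1 is just reused from the commands above):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Does a NON-interactive ssh session see the oneAPI environment?
ssh slave1 'echo I_MPI_ROOT=$I_MPI_ROOT; which mpiexec; which hydra_pmi_proxy'

# If these come back empty: the default Ubuntu ~/.bashrc returns early for
# non-interactive shells, so a "source /opt/intel/oneapi/setvars.sh" line added
# at the bottom of .bashrc is never reached over plain ssh; it would have to go
# above that early-return check (or into a file non-interactive shells do read).&lt;/LI-CODE&gt;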
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 16:30:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439584#M10155</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-19T16:30:13Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439587#M10156</link>
      <description>&lt;P&gt;Additional tests that left me even more lost than before...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;This following one is OK.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave1 ./a.out&lt;BR /&gt;Hello world from process 0 of 2 processes on master&lt;BR /&gt;Hello world from process 1 of 2 processes on slave1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Those following ones are not OK.&lt;/STRONG&gt;&lt;BR /&gt;&lt;EM&gt;mpirun -n 4 -ppn 1 -hosts master,slave1,slave2,slave3 ./a.out &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 4 processes on master &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 2 of 4 processes on slave2 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 3 of 4 processes on slave3 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 4 processes on slave1 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Abort(810114063) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Finalize(220)...............: MPI_Finalize failed&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Finalize(164)...............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPID_Finalize(1716)..............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_mpi_finalize_hook(1760): &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Reduce_intra_binomial(149)..: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIC_Send(129)...................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPID_Send(888)...................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_send_unsafe(203)...........: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_send_normal(252).......: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_send_handler_vni(496)..: OFI tagged send failed (ofi_impl.h:496:MPIDI_OFI_send_handler_vni:Network is unreachable)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave2 ./a.out&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 2 processes on master &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 2 processes on slave2 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Abort(810114063) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:&lt;/EM&gt;&lt;BR /&gt;&lt;I&gt;......&lt;/I&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_send_handler_vni(496)..: OFI tagged send failed (ofi_impl.h:496:MPIDI_OFI_send_handler_vni:Network is unreachable)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave3 ./a.out&lt;BR /&gt;Hello world from process 1 of 2 processes on slave3 &lt;BR /&gt;Abort(810114063) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:&lt;BR /&gt;...........&lt;BR /&gt;MPIDI_OFI_send_handler_vni(496)..: OFI tagged send failed (ofi_impl.h:496:MPIDI_OFI_send_handler_vni:Network is unreachable)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This seems to rule out the setvars.sh problem, since some tests do reach the end.&lt;BR /&gt;It says the network is unreachable, but the ssh connections were set up with ssh-keygen to link (see the quick check of the ssh links sketched below):&lt;BR /&gt;master &amp;lt;---&amp;gt; slave1&lt;BR /&gt;master &amp;lt;---&amp;gt; slave2&lt;/P&gt;
&lt;P&gt;master &amp;lt;---&amp;gt; slave3&lt;/P&gt;
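&lt;P&gt;For what it is worth, here is a quick check of the ssh links one could run (a sketch only, reusing the same host names; ssh-copy-id assumes the key pair already created with ssh-keygen):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# From the launching node (master): passwordless ssh to every slave
for h in slave1 slave2 slave3; do
    ssh-copy-id "$h"      # installs the existing public key on $h
    ssh "$h" hostname     # must print the host name without asking for a password
done&lt;/LI-CODE&gt;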
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OK, now I am lost, with no idea what is happening...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Dec 2022 21:33:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439587#M10156</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-18T21:33:49Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439669#M10157</link>
      <description>&lt;P&gt;I am not all that familiar with the mechanics of MPI, but if I understand it correctly, from conversations with colleagues, MPI "environments" are specific to the compiler that you used. Could it be that the mpirun/mpiexec commands or the MPI background process are the GCC versions and not the Intel versions? That might explain the dramatic failure you observe.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 08:06:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439669#M10157</guid>
      <dc:creator>Arjen_Markus</dc:creator>
      <dc:date>2022-12-19T08:06:28Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439676#M10158</link>
      <description>&lt;P&gt;Thank you for your reply.&lt;BR /&gt;According to my tests, the choice of wrapper (mpirun or mpiexec) does not affect the results.&lt;BR /&gt;If the environment were the problem, how could I get a successful Intel test at all, even if only with 2 processes on master/slave1? On top of that, the test with one process on each node is OK until finalization.&lt;/P&gt;
&lt;P&gt;If I run the test in verbose mode, I first see that the environment looks OK (?).&lt;/P&gt;
&lt;P&gt;mpiifort MPI_test.f90&lt;BR /&gt;mpirun -np 20 -v -hosts master,slave1 ./a.out&lt;BR /&gt;&lt;EM&gt;[mpiexec@master] Launch arguments: /opt/intel/oneapi/mpi/2021.8.0//bin//hydra_bstrap_proxy --upstream-host&amp;nbsp;&lt;STRONG&gt;master&lt;/STRONG&gt;&amp;nbsp;--upstream-port 33591 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/oneapi/mpi/2021.8.0//bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/oneapi/mpi/2021.8.0//bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[mpiexec@master] Launch arguments: /usr/bin/ssh -q -x&amp;nbsp;&lt;STRONG&gt;slave1&lt;/STRONG&gt;&amp;nbsp;/opt/intel/oneapi/mpi/2021.8.0//bin//hydra_bstrap_proxy --upstream-host master --upstream-port 33591 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/oneapi/mpi/2021.8.0//bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /opt/intel/oneapi/mpi/2021.8.0//bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Moreover, I was wondering whether MPI_FINALIZE() contains a barrier in mpich (I made a mistake in the first message: my working setup is gcc + mpich) but not in Intel MPI. Adding a barrier after the print and before finalization does not help, but it reveals a failure in the only test that was working:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpiifort MPI_test.f90&lt;BR /&gt;mpiexec -n 2 -ppn 1 -hosts master,slave1 ./a.out&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 2 processes on slave1 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 2 processes on master&lt;/EM&gt;&lt;BR /&gt;... and the program hangs forever.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Here is the verbose output of the test that crashes immediately (with node slave2):&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -v -hosts master,slave2 ./a.out&lt;/P&gt;
&lt;P&gt;... environment stuff .....&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get_maxes&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get_appnum&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=appnum appnum=0&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get_my_kvsname&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=my_kvsname kvsname=kvs_16360_0&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get kvsname=kvs_16360_0 key=PMI_process_mapping&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get_maxes&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get_appnum&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=appnum appnum=0&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get_my_kvsname&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=my_kvsname kvsname=kvs_16360_0&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get kvsname=kvs_16360_0 key=PMI_process_mapping&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=put kvsname=kvs_16360_0 key=bc-1 value=mpi#0200A7DB0A2A003E0000000000000000$&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=put_result rc=0 msg=&lt;STRONG&gt;success&lt;/STRONG&gt;&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=put kvsname=kvs_16360_0 key=bc-0 value=mpi#02008F8B0AB80E970000000000000000$&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=put_result rc=0 msg=&lt;STRONG&gt;success&lt;/STRONG&gt;&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get kvsname=kvs_16360_0 key=bc-0&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=get_result rc=0 msg=success value=mpi#02008F8B0AB80E970000000000000000$&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get kvsname=kvs_16360_0 key=bc-1&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200A7DB0A2A003E0000000000000000$&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get kvsname=kvs_16360_0 key=bc-0&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=get_result rc=0 msg=success value=mpi#02008F8B0AB80E970000000000000000$&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get kvsname=kvs_16360_0 key=bc-1&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200A7DB0A2A003E0000000000000000$&lt;BR /&gt;Hello world from process 1 of 2 processes on slave2 &lt;BR 
/&gt;Hello world from process 0 of 2 processes on master &lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;... memory location stuff ....&lt;/P&gt;
&lt;P&gt;It seems the communication is OK at start-up, but not at the end of the run.&lt;BR /&gt;What could it be?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 08:41:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439676#M10158</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-19T08:41:31Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439764#M10159</link>
      <description>&lt;P&gt;Alexandre,&lt;/P&gt;
&lt;P&gt;There was a post on this forum with a similar issue (which I am unable to locate). The issue involved an incompatibility amongst fabric selection(s). I think Ron Green provided the answer.&lt;/P&gt;
&lt;P&gt;You might want to experiment with fabric selections starting with the older generation methods.&lt;/P&gt;
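&lt;P&gt;For example, something along these lines could be tried (an untested sketch; the interface name eth0 is a placeholder for whatever interface your nodes actually use):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Force the fabric / provider explicitly and raise the debug level
mpiexec -n 2 -ppn 1 -hosts master,slave1 \
        -env I_MPI_DEBUG=5 \
        -env I_MPI_FABRICS=shm:ofi \
        -env I_MPI_OFI_PROVIDER=tcp ./a.out

# Pin the network interface used by the launcher and the tcp provider
mpiexec -n 2 -ppn 1 -hosts master,slave1 \
        -env I_MPI_HYDRA_IFACE=eth0 \
        -env FI_TCP_IFACE=eth0 ./a.out&lt;/LI-CODE&gt;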
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 14:49:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439764#M10159</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2022-12-19T14:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439775#M10160</link>
      <description>&lt;P&gt;Moved this MPI question over to the oneAPI HPC Toolkit Forum. That's the best source for MPI information.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 16:59:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439775#M10160</guid>
      <dc:creator>Barbara_P_Intel</dc:creator>
      <dc:date>2022-12-19T16:59:27Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439800#M10161</link>
      <description>&lt;P&gt;The good news is that I am learning a lot. The bad news is that it doesn't seem to change anything.&lt;BR /&gt;&lt;BR /&gt;First, debug levels 3 and higher do not work: the code freezes.&lt;BR /&gt;Level 2 provides the fabric information.&lt;BR /&gt;&lt;BR /&gt;mpiexec -n 2 -ppn 1 -hosts master,slave3 &lt;STRONG&gt;-env I_MPI_DEBUG=2&lt;/STRONG&gt; ./a.out&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): library kind: release&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;/EM&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;[0] MPI startup(): libfabric provider: tcp;ofi_rxm&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" not found&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 2 processes on master &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 2 processes on slave3 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;I also tried to change the fabric environment, but it doesn't help at all (I tried the options available according to Intel: shm, shm:ofi, ofi). See also the provider listing sketched below the log.&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave3 -env I_MPI_DEBUG=2 &lt;STRONG&gt;-env I_MPI_FABRICS=shm:ofi&lt;/STRONG&gt;&amp;nbsp;./a.out&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;&lt;STRONG&gt;[0] MPI startup(): libfabric provider: tcp;ofi_rxm&lt;/STRONG&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" not found&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"&lt;/FONT&gt;&lt;/P&gt;
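&lt;P&gt;As a side check, one could also list which providers libfabric actually detects on each node (a sketch only; fi_info is the standard libfabric utility and is normally on the PATH once setvars.sh has run, but its availability may vary):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# List the libfabric providers visible on this node
fi_info -l

# Show details for the tcp provider only
fi_info -p tcp

# Repeat the same check on a remote node
ssh slave3 'fi_info -l'&lt;/LI-CODE&gt;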
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Maybe the red lines are a problem?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 16:56:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439800#M10161</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-19T16:56:58Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439830#M10162</link>
      <description>&lt;P&gt;Is&amp;nbsp;&lt;EM class="sub_section_element_selectors"&gt;"/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" located on all systems in your cluster?&lt;/EM&gt;&lt;/P&gt;
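&lt;P&gt;A quick way to check (a sketch only; host names as used earlier in the thread):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Look for both tuning files on every node
for h in master slave1 slave2 slave3; do
    echo "== $h =="
    ssh "$h" ls -l /opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat \
                   /opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat
done&lt;/LI-CODE&gt;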
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM class="sub_section_element_selectors"&gt;Jim Dempsey&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 18:40:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439830#M10162</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2022-12-19T18:40:10Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439930#M10163</link>
      <description>&lt;P&gt;The file &lt;EM class="sub_section_element_selectors"&gt;tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat&amp;nbsp;&lt;/EM&gt;is absent on all systems.&lt;BR /&gt;But its friend&amp;nbsp;&lt;SPAN&gt;tuning_skx_shm-ofi_tcp-ofi-rxm.dat is present on all of them.&lt;BR /&gt;The oneAPI toolkit was installed on the 4 nodes: the master from the online installer, the 3 slaves from the offline one. Maybe I should install the very same toolkit everywhere; I will try. But there is worse...&lt;BR /&gt;&lt;BR /&gt;During testing, I performed this run:&lt;BR /&gt;mpifort MPI_test.f90&lt;BR /&gt;mpirun -n 20 -hosts master,slave2 ./a.out&lt;BR /&gt;&lt;BR /&gt;And the installation I thought was good (gfortran + mpich) &lt;STRONG&gt;has a problem too&lt;/STRONG&gt;. It appears when I put the barrier call&lt;BR /&gt;(call MPI_BARRIER(MPI_COMM_WORLD)) after the hello world message.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The hello world output looks OK:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Hello world from process 1 of 20 processes on master&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 4 of 20 processes on master&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;........&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process18 of 20 processes on slave2&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process16 of 20 processes on slave2&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;But the barrier fails...&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Fatal error in PMPI_Barrier: Unknown error class, error stack:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Barrier(289).....................: MPI_Barrier(comm=MPI_COMM_WORLD) failed&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Barrier(275).....................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_impl(175)................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_auto(110)..........: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_smp(43)............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_impl(175)................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_auto(110)..........: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_dissemination(49)..: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDU_Complete_posted_with_error(1137): Process failed&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_smp(59)............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Bcast_impl(310)..................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Bcast_intra_auto(223)............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Bcast_intra_binomial(182)........: Failure during collective&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Fatal error in PMPI_Barrier: Unknown error class, error stack:&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;.....&lt;/EM&gt;&lt;/P&gt;
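&lt;P&gt;Since both MPI stacks now fail in cross-node collectives, one more thing worth checking is the plain TCP path between every pair of nodes (a sketch only; the point-to-point connections opened by the tcp provider go directly from node to node and do not pass through ssh):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Which interfaces and addresses does each node expose?
for h in master slave1 slave2 slave3; do
    echo "== $h =="
    ssh "$h" ip -brief address
done

# Can every node reach every other node by name?
for src in master slave1 slave2 slave3; do
    for dst in master slave1 slave2 slave3; do
        ssh "$src" ping -c 1 -W 2 "$dst"
    done
done&lt;/LI-CODE&gt;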
&lt;P&gt;With the barrier call, but staying within a single node, the program is OK.&lt;BR /&gt;Maybe Intel MPI is more sensitive than gfortran and mpich, which is why the problem shows up first with the Intel configuration. FYI, the firewalls were taken down. I do not own the router that connects the four nodes, so I now suspect the router. Is that possible?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 00:48:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439930#M10163</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-20T00:48:18Z</dc:date>
    </item>
  </channel>
</rss>

