<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unexpected DAPL event 0x4003 in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Unexpected-DAPL-event-0x4003/m-p/1125259#M5541</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I am trying to start an MPI job across two nodes with the following setup.&lt;/P&gt;

&lt;P&gt;I have two nodes, workstation1 and workstation2.&lt;BR /&gt;
	I can ssh from workstation1 (10.0.0.1) to workstation2 (10.0.0.2) without a password; I have already set up RSA keys.&lt;BR /&gt;
	I can ssh from both workstation1 and workstation2 to themselves without a password.&lt;BR /&gt;
	I can ping from 10.0.0.1 to 10.0.0.2 and from 10.0.0.2 to 10.0.0.1.&lt;/P&gt;
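
&lt;P&gt;These prerequisites can be re-checked from each node with a short script. This is a minimal sketch that assumes the OpenSSH client and the host names from the /etc/hosts files below; check_node and check_all are hypothetical helper names.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;
#!/bin/bash
# Re-check passwordless ssh and ping from the current node.
# ASSUMPTION: OpenSSH client; host names resolve via /etc/hosts.
check_node() {
    local host=$1
    # BatchMode=yes makes ssh fail instead of prompting for a password
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "ssh $host: OK"
    else
        echo "ssh $host: FAILED"
        return 1
    fi
    # Send a single echo request and wait at most 2 s for the reply
    if ping -q -c 1 -W 2 "$host"; then
        echo "ping $host: OK"
    else
        echo "ping $host: FAILED"
        return 1
    fi
}

# Check every host given as an argument; non-zero exit if any check fails
check_all() {
    local rc=0 host
    for host in "$@"; do
        check_node "$host" || rc=1
    done
    return "$rc"
}
&lt;/PRE&gt;

&lt;P&gt;For example, run check_all workstation1 workstation2 on each of the two nodes.&lt;/P&gt;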

&lt;P&gt;workstation1 and workstation2 are connected via Mellanox InfiniBand.&lt;BR /&gt;
	I'm running Intel(R) MPI Library, Version 2017 Update 2&amp;nbsp; Build 20170125&lt;BR /&gt;
	I've installed MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64&lt;/P&gt;
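
&lt;P&gt;The InfiniBand side can be sanity-checked on each node as well. This is a minimal sketch assuming the MLNX_OFED tools ofed_info and ibv_devinfo are installed and that the DAPL provider list is in /etc/dat.conf (overridable through DAT_OVERRIDE; the path may differ on your distribution). check_ib_stack is a hypothetical helper name.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;
#!/bin/bash
# Quick look at the InfiniBand/DAPL stack on one node.
# ASSUMPTIONS: MLNX_OFED utilities are installed; DAPL reads its provider
# list from $DAT_OVERRIDE if set, otherwise from /etc/dat.conf.
check_ib_stack() {
    local provider=${1:-ofa-v2-mlx4_0-1}
    local datconf=${DAT_OVERRIDE:-/etc/dat.conf}

    # Installed OFED version
    ofed_info -s || echo "ofed_info not available"
    # Port state should be PORT_ACTIVE and link_layer InfiniBand
    ibv_devinfo | grep -E 'hca_id|state|link_layer' || echo "ibv_devinfo not available"

    # The provider named in I_MPI_DAPL_PROVIDER_LIST must appear in dat.conf
    if grep -q "^$provider[[:space:]]" "$datconf"; then
        echo "$provider found in $datconf"
    else
        echo "$provider NOT found in $datconf"
        return 1
    fi
}
&lt;/PRE&gt;

&lt;P&gt;On each node, check_ib_stack ofa-v2-mlx4_0-1 should report the port as PORT_ACTIVE and the provider as found.&lt;/P&gt;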

&lt;P&gt;workstation1 /etc/hosts:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;127.0.0.1    localhost
10.0.0.1    workstation1

# The following lines are desirable for IPv6 capable hosts
#::1     ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters

# mpi nodes
10.0.0.2 workstation2&lt;/PRE&gt;

&lt;P&gt;-------------------------------------------------------------&lt;BR /&gt;
	workstation2 /etc/hosts:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;127.0.0.1    localhost
10.0.0.2    workstation2

# The following lines are desirable for IPv6 capable hosts
#::1     ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters

#mpi nodes
10.0.0.1 workstation1&lt;/PRE&gt;

&lt;P&gt;--------------------------------------------------------------&lt;BR /&gt;
	Here is my application start script (application names and parameters simplified):&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;#!/bin/bash
export PATH=$PATH:$PWD:/opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/intel64/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$I_MPI_ROOT/intel64/lib:../program1/bin:../program2/bin
export I_MPI_FABRICS=dapl:dapl
export I_MPI_DEBUG=6
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1

# -genv I_MPI_ADJUST_BCAST "9" works around a known Intel MPI 2017 bug (MPI_Bcast hang on large user-defined datatypes).
# More details: &lt;A href="https://software.intel.com/en-us/articles/intel-mpi-library-2017-known-issue-mpi-bcast-hang-on-large-user-defined-datatypes" target="_blank"&gt;https://software.intel.com/en-us/articles/intel-mpi-library-2017-known-issue-mpi-bcast-hang-on-large-user-defined-datatypes&lt;/A&gt;

mpirun -l -genv I_MPI_ADJUST_BCAST "9" -genv I_MPI_PIN_DOMAIN omp \
: -n 1 -host 10.0.0.1 ../program1/bin/program1 master stitching stitching \
: -n 1 -host 10.0.0.2 ../program1/bin/program1 slave dissemination \
: -n 1 -host 10.0.0.1 ../program1/bin/program2 param1 param2

&lt;/PRE&gt;
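
&lt;P&gt;Because a single missing trailing backslash splits such a command into separate shell commands, the same launch can also be written with a bash array. This is a sketch, not a required change. It reuses the simplified paths and arguments from the script above and writes the first MPMD section without a leading ':' (global options first, then sections separated by ':'). It also adds a hypothetical DRY_RUN switch that prints the command instead of running it. launch_job is a hypothetical helper name.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;
#!/bin/bash
# Same MPMD launch as in the script above, built as a bash array so that
# no line-continuation backslash can be forgotten.
# DRY_RUN=1 prints the command instead of running it (hypothetical switch).
launch_job() {
    local -a cmd=(
        mpirun -l
        -genv I_MPI_ADJUST_BCAST 9
        -genv I_MPI_PIN_DOMAIN omp
        -n 1 -host 10.0.0.1 ../program1/bin/program1 master stitching stitching
        : -n 1 -host 10.0.0.2 ../program1/bin/program1 slave dissemination
        : -n 1 -host 10.0.0.1 ../program1/bin/program2 param1 param2
    )
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
&lt;/PRE&gt;

&lt;P&gt;After the export lines, DRY_RUN=1 launch_job shows the exact command line, and launch_job starts the job.&lt;/P&gt;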

&lt;P&gt;-------------------------------------------&lt;/P&gt;

&lt;P&gt;The application starts on both nodes with export I_MPI_FABRICS=tcp:tcp, but with dapl:dapl it fails with the following error:&lt;/P&gt;

&lt;P&gt;Output:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;0] [0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 2  Build 20170125 (id: 16752)
[0] [0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
[0] [0] MPI startup(): Multi-threaded optimized library
[0] [0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[1] [1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[2] [2] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1
[0] [0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[0] [0] MPI startup(): dapl data transfer mode
[1] [1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[2] [2] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] [1] MPI startup(): dapl data transfer mode
[2] [2] MPI startup(): dapl data transfer mode
[0] [0:10.0.0.1] unexpected DAPL event 0x4003
[0] Fatal error in PMPI_Init_thread: Internal MPI error!, error stack:
[0] MPIR_Init_thread(805): fail failed
[0] MPID_Init(1831)......: channel initialization failed
[0] MPIDI_CH3_Init(147)..: fail failed
[0] (unknown)(): Internal MPI error!
[1] [1:10.0.0.2] unexpected DAPL event 0x4003
[1] Fatal error in PMPI_Init_thread: Internal MPI error!, error stack:
[1] MPIR_Init_thread(805): fail failed
[1] MPID_Init(1831)......: channel initialization failed
[1] MPIDI_CH3_Init(147)..: fail failed
[1] (unknown)(): Internal MPI error!

&lt;/PRE&gt;
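
&lt;P&gt;The hexadecimal code in "unexpected DAPL event 0x4003" is a DAT event number. The lookup helper below is a minimal sketch that assumes the values of the DAT 2.0 event enum (DAT_EVENT_NUMBER in the DAPL dat.h header). Please check them against the header installed with your DAPL package before relying on the result. dapl_event_name is a hypothetical helper, not part of Intel MPI or DAPL.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;
#!/bin/bash
# Map a DAPL/DAT event code (as printed in "unexpected DAPL event 0x....")
# to its symbolic name.
# ASSUMPTION: values from the DAT 2.0 DAT_EVENT_NUMBER enum; verify locally.
dapl_event_name() {
    local code
    # Normalize the code, e.g. 0x04003 and 0x4003 both become 0x4003
    code=$(printf '0x%x' "$1") || return 1
    case "$code" in
        0x4001) echo "DAT_CONNECTION_EVENT_ESTABLISHED" ;;
        0x4002) echo "DAT_CONNECTION_EVENT_PEER_REJECTED" ;;
        0x4003) echo "DAT_CONNECTION_EVENT_NON_PEER_REJECTED" ;;
        0x4004) echo "DAT_CONNECTION_EVENT_ACCEPT_COMPLETION_ERROR" ;;
        0x4005) echo "DAT_CONNECTION_EVENT_DISCONNECTED" ;;
        0x4006) echo "DAT_CONNECTION_EVENT_BROKEN" ;;
        0x4007) echo "DAT_CONNECTION_EVENT_TIMED_OUT" ;;
        0x4008) echo "DAT_CONNECTION_EVENT_UNREACHABLE" ;;
        *)      echo "unknown DAT event $code"; return 1 ;;
    esac
}
&lt;/PRE&gt;

&lt;P&gt;For example, dapl_event_name 0x4003 prints the name that, under this assumption, corresponds to the event reported by ranks 0 and 1.&lt;/P&gt;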

&lt;P&gt;Do you have any idea what could cause this? Note that with dapl on a single node, the application starts fine on each computer separately (i.e., with -host 10.0.0.1 for all processes on workstation1 and no processes on 10.0.0.2).&lt;/P&gt;</description>
    <pubDate>Fri, 15 Sep 2017 06:55:55 GMT</pubDate>
    <dc:creator>sayginify</dc:creator>
    <dc:date>2017-09-15T06:55:55Z</dc:date>
    <item>
      <title>Unexpected DAPL event 0x4003</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Unexpected-DAPL-event-0x4003/m-p/1125259#M5541</link>
      <pubDate>Fri, 15 Sep 2017 06:55:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Unexpected-DAPL-event-0x4003/m-p/1125259#M5541</guid>
      <dc:creator>sayginify</dc:creator>
      <dc:date>2017-09-15T06:55:55Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Unexpected-DAPL-event-0x4003/m-p/1125260#M5542</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;


&lt;P&gt;This could be an internal MPI library issue or something else entirely. Could you share the output of a few tests? That would help isolate the issue:&lt;/P&gt;

&lt;P&gt;1) Run a simple "mpirun -n 1 -host 10.0.0.1 hostname : -n 1 -host 10.0.0.2 hostname"&lt;/P&gt;

&lt;P&gt;2) Build the "test.c"&amp;nbsp;example provided with Intel MPI (in the installation directory under the test directory) and run that:&lt;/P&gt;

&lt;P&gt;$ mpicc test.c -o impi_test&lt;/P&gt;

&lt;P&gt;$ mpirun -n 1 -host 10.0.0.1 ./impi_test : -n 1 -host 10.0.0.2 ./impi_test&lt;/P&gt;

&lt;P&gt;This will help determine whether this is a startup issue, as it appears to be, or something related to the MPMD setup you are running.&lt;/P&gt;
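
&lt;P&gt;Both tests can also be run in one go. This is a minimal sketch assuming mpivars.sh from the same Intel MPI installation has been sourced (so mpicc, mpirun and I_MPI_ROOT are consistent) and that test.c is located at $I_MPI_ROOT/test/test.c. run_isolation_tests is a hypothetical helper name.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;
#!/bin/bash
# Run the two isolation tests in sequence.
# ASSUMPTION: mpivars.sh has been sourced, so I_MPI_ROOT is set.
run_isolation_tests() {
    local h1=${1:-10.0.0.1} h2=${2:-10.0.0.2}

    echo "== Test 1: hostname on both nodes (process startup only) =="
    mpirun -n 1 -host "$h1" hostname : -n 1 -host "$h2" hostname || return 1

    echo "== Test 2: Intel MPI test.c =="
    if [ -z "${I_MPI_ROOT:-}" ]; then
        echo "I_MPI_ROOT is not set; source mpivars.sh first"
        return 1
    fi
    mpicc "$I_MPI_ROOT/test/test.c" -o impi_test || return 1
    # ASSUMPTION: the working directory is shared between the nodes, or
    # impi_test has been copied to the same path on the second node
    mpirun -n 1 -host "$h1" ./impi_test : -n 1 -host "$h2" ./impi_test
}
&lt;/PRE&gt;

&lt;P&gt;Run it with the same I_MPI_FABRICS=dapl:dapl setting exported as in your script, so that test 2 exercises the failing fabric.&lt;/P&gt;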

&lt;P&gt;Also, is your system configured for IPv4 or IPv6?&lt;/P&gt;
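
&lt;P&gt;If you are not sure, the address families on the interface used between the nodes can be listed as follows. This is a minimal sketch assuming the iproute2 ip tool and an IPoIB interface named ib0 (pass another name otherwise). show_addr_families is a hypothetical helper name.&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;
#!/bin/bash
# List the IPv4/IPv6 addresses configured on one interface.
# ASSUMPTION: the IPoIB interface is ib0 unless another name is given.
show_addr_families() {
    local dev=${1:-ib0}
    echo "== IPv4 on $dev =="
    ip -o -4 addr show dev "$dev" || return 1
    echo "== IPv6 on $dev =="
    ip -o -6 addr show dev "$dev"
    # Which interface actually carries the 10.0.0.x addresses used above
    echo "== Interfaces with a 10.0.0.0/24 address =="
    ip -o -4 addr show to 10.0.0.0/24
}
&lt;/PRE&gt;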

&lt;P&gt;Regards,&lt;BR /&gt;
	Carlos&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2017 16:28:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Unexpected-DAPL-event-0x4003/m-p/1125260#M5542</guid>
      <dc:creator>Carlos_R_Intel</dc:creator>
      <dc:date>2017-09-28T16:28:34Z</dc:date>
    </item>
  </channel>
</rss>

