Intel® oneAPI HPC Toolkit

Intel MPI unable to use 'ofa' fabric with Mellanox OFED on ConnectX-4 Lx EN ethernet cards

Jacob_H_
Beginner

I have a system where I am unable to get Intel MPI to use the 'ofa' fabric with Mellanox OFED over ConnectX-4 Lx EN Ethernet cards, and I have exhausted every means I know of to get it to work. I'd appreciate any input to help get this working.

Relevant info:

  • Operating System is CentOS 7.3
  • Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405 (id: 17193)
  • NIC is a single Mellanox ConnectX-4 Lx EN Ethernet-only card (RoCE v1 and v2 supported) with 2x 25GbE ports
    •  lspci | grep Mell
      05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
      05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    •  ibv_devinfo
      hca_id:	mlx5_1
      	transport:			InfiniBand (0)
      	fw_ver:				14.18.2000
      	node_guid:			248a:0703:008d:6237
      	sys_image_guid:			248a:0703:008d:6236
      	vendor_id:			0x02c9
      	vendor_part_id:			4117
      	hw_ver:				0x0
      	board_id:			MT_2420110034
      	phys_port_cnt:			1
      	Device ports:
      		port:	1
      			state:			PORT_ACTIVE (4)
      			max_mtu:		4096 (5)
      			active_mtu:		1024 (3)
      			sm_lid:			0
      			port_lid:		0
      			port_lmc:		0x00
      			link_layer:		Ethernet
      
      hca_id:	mlx5_0
      	transport:			InfiniBand (0)
      	fw_ver:				14.18.2000
      	node_guid:			248a:0703:008d:6236
      	sys_image_guid:			248a:0703:008d:6236
      	vendor_id:			0x02c9
      	vendor_part_id:			4117
      	hw_ver:				0x0
      	board_id:			MT_2420110034
      	phys_port_cnt:			1
      	Device ports:
      		port:	1
      			state:			PORT_ACTIVE (4)
      			max_mtu:		4096 (5)
      			active_mtu:		1024 (3)
      			sm_lid:			0
      			port_lid:		0
      			port_lmc:		0x00
      			link_layer:		Ethernet
  • Mellanox OFED driver, output of 'ofed_info' is:
    • MLNX_OFED_LINUX-4.0-2.0.0.1 (OFED-4.0-2.0.0):
    • Verified RDMA is working via the perftest utilities (e.g., ib_write_bw, ib_write_lat)
  • Specified that ofa fabric should be used via the environment variable I_MPI_FABRICS=shm:ofa
    • Verified that using the 'tcp' and 'dapl' fabrics works (I had to manually add RoCE entries to dat.conf to get dapl to work; a sample entry is sketched after the log below)
    • $ env | grep I_MPI
      I_MPI_FABRICS=shm:ofa
      I_MPI_HYDRA_DEBUG=on
      I_MPI_DEBUG=6
      I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
  • Output when I try to run my code with 'ofa' specified as the fabric for Intel MPI:
    • mpirun -n 1 ./server
      host: sp1.muskrat.local
      
      ==================================================================================================
      mpiexec options:
      ----------------
        Base path: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/
        Launcher: ssh
        Debug level: 1
        Enable X: -1
      
        Global environment:
        -------------------
          I_MPI_PERHOST=allcores
          LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib
          MKLROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mkl
          MANPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/usr/local/share/man:/usr/share/man:/opt/ibutils/share/man
          I_MPI_DEBUG_HYDRA=0
          XDG_SESSION_ID=216
          HOSTNAME=sp1.muskrat.local
          SELINUX_ROLE_REQUESTED=
          IPPROOT=/opt/intel/psxe_runtime_2017.4.196/linux/ipp
          SHELL=/bin/bash
          TERM=xterm-256color
          HISTSIZE=1000
          I_MPI_FABRICS=shm:ofa
          SSH_CLIENT=192.168.1.2 33072 22
          LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin
          SELINUX_USE_CURRENT_RANGE=
          SSH_TTY=/dev/pts/1
          MIC_LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic
          USER=jrhemst
          LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
          I_MPI_MPIRUN=mpirun
          MIC_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic
          CPATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/include:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/include:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/include:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/include:
          NLSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N
          PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/jrhemst/.local/bin:/home/jrhemst/bin
          MAIL=/var/spool/mail/jrhemst
          TBBROOT=/opt/intel/psxe_runtime_2017.4.196/linux/tbb
          PWD=/home/jrhemst/mpi_test
          I_MPI_HYDRA_DEBUG=on
          XMODIFIERS=@im=none
          EDITOR=vim
          LANG=en_US.UTF-8
          MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
          LOADEDMODULES=
          SELINUX_LEVEL_REQUESTED=
          DAALROOT=/opt/intel/psxe_runtime_2017.4.196/linux/daal
          HISTCONTROL=ignoredups
          HOME=/home/jrhemst
          SHLVL=2
          I_MPI_DEBUG=6
          LOGNAME=jrhemst
          SSH_CONNECTION=192.168.1.2 33072 192.168.1.113 22
          CLASSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/daal.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar
          MODULESHOME=/usr/share/Modules
          LESSOPEN=||/usr/bin/lesspipe.sh %s
          XDG_RUNTIME_DIR=/run/user/1338400006
          I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
          BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
      }
          _=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/mpiexec.hydra
      
        Hydra internal environment:
        ---------------------------
          MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
          GFORTRAN_UNBUFFERED_PRECONNECTED=y
          I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8
          DAPL_NETWORK_PROCESS_NUM=1
      
        Intel(R) MPI Library specific variables:
        ----------------------------------------
          I_MPI_PERHOST=allcores
          I_MPI_DEBUG_HYDRA=0
          I_MPI_FABRICS=shm:ofa
          I_MPI_MPIRUN=mpirun
          I_MPI_HYDRA_DEBUG=on
          I_MPI_DEBUG=6
          I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
          I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8
      
      
          Proxy information:
          *********************
            [1] proxy: sp1.muskrat.local (16 cores)
            Exec list: ./server (1 processes); 
      
      
      ==================================================================================================
      
      [mpiexec@sp1.muskrat.local] Timeout set to -1 (-1 means infinite)
      [mpiexec@sp1.muskrat.local] Got a control port string of sp1.muskrat.local:33538
      
      Proxy launch args: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/pmi_proxy --control-port sp1.muskrat.local:33538 --debug --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1079489000 --usize -2 --proxy-id 
      
      Arguments being passed to proxy 0:
      --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname sp1.muskrat.local --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 1 --auto-cleanup 1 --pmi-kvsname kvs_22910_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 49 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib' 'MKLROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mkl' 'MANPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/usr/local/share/man:/usr/share/man:/opt/ibutils/share/man' 'I_MPI_DEBUG_HYDRA=0' 'XDG_SESSION_ID=216' 'HOSTNAME=sp1.muskrat.local' 'SELINUX_ROLE_REQUESTED=' 'IPPROOT=/opt/intel/psxe_runtime_2017.4.196/linux/ipp' 'SHELL=/bin/bash' 'TERM=xterm-256color' 'HISTSIZE=1000' 'I_MPI_FABRICS=shm:ofa' 'SSH_CLIENT=192.168.1.2 33072 22' 'LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin' 'SELINUX_USE_CURRENT_RANGE=' 'SSH_TTY=/dev/pts/1' 'MIC_LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic' 'USER=jrhemst' 
'LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:' 'I_MPI_MPIRUN=mpirun' 'MIC_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic' 'CPATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/include:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/include:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/include:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/include:' 'NLSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N' 'PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/jrhemst/.local/bin:/home/jrhemst/bin' 'MAIL=/var/spool/mail/jrhemst' 'TBBROOT=/opt/intel/psxe_runtime_2017.4.196/linux/tbb' 'PWD=/home/jrhemst/mpi_test' 'I_MPI_HYDRA_DEBUG=on' 'XMODIFIERS=@im=none' 'EDITOR=vim' 'LANG=en_US.UTF-8' 'MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles' 'LOADEDMODULES=' 'SELINUX_LEVEL_REQUESTED=' 'DAALROOT=/opt/intel/psxe_runtime_2017.4.196/linux/daal' 'HISTCONTROL=ignoredups' 'HOME=/home/jrhemst' 'SHLVL=2' 'I_MPI_DEBUG=6' 'LOGNAME=jrhemst' 'SSH_CONNECTION=192.168.1.2 33072 192.168.1.113 22' 
'CLASSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/daal.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar' 'MODULESHOME=/usr/share/Modules' 'LESSOPEN=||/usr/bin/lesspipe.sh %s' 'XDG_RUNTIME_DIR=/run/user/1338400006' 'I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi' 'BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
      }' '_=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/mpiexec.hydra' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8' 'DAPL_NETWORK_PROCESS_NUM=1' --proxy-core-count 16 --mpi-cmd-env mpirun -n 1 ./server  --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/jrhemst/mpi_test --exec-args 1 ./server 
      
      [mpiexec@sp1.muskrat.local] Launch arguments: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/pmi_proxy --control-port sp1.muskrat.local:33538 --debug --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1079489000 --usize -2 --proxy-id 0 
      [proxy:0:0@sp1.muskrat.local] Start PMI_proxy 0
      [proxy:0:0@sp1.muskrat.local] STDIN will be redirected to 1 fd(s): 17 
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): init
      pmi_version=1 pmi_subversion=1 
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_maxes
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): barrier_in
      
      [proxy:0:0@sp1.muskrat.local] forwarding command (cmd=barrier_in) upstream
      [mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=barrier_in
      [mpiexec@sp1.muskrat.local] PMI response to fd 8 pid 12: cmd=barrier_out
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=barrier_out
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_ranks2hosts
      
      [proxy:0:0@sp1.muskrat.local] PMI response: put_ranks2hosts 26 1
      17 sp1.muskrat.local 0, 
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_appnum
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=appnum appnum=0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_my_kvsname
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=my_kvsname kvsname=kvs_22910_0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_my_kvsname
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=my_kvsname kvsname=kvs_22910_0
      [0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 3  Build 20170405 (id: 17193)
      [0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
      [0] MPI startup(): Multi-threaded optimized library
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): barrier_in
      
      [proxy:0:0@sp1.muskrat.local] forwarding command (cmd=barrier_in) upstream
      [mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=barrier_in
      [mpiexec@sp1.muskrat.local] PMI response to fd 8 pid 12: cmd=barrier_out
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=barrier_out
      [0] MPI startup(): Found 2 IB devices
      [0] MPI startup(): Open 0 IB device: mlx5_1
      [0] MPI startup(): Open 1 IB device: mlx5_0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): put
      kvsname=kvs_22910_0 key=OFA_Init_fail value=1 
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=put_result rc=0 msg=success
      [proxy:0:0@sp1.muskrat.local] forwarding command (cmd=put kvsname=kvs_22910_0 key=OFA_Init_fail value=1) upstream
      [mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=put kvsname=kvs_22910_0 key=OFA_Init_fail value=1
      [0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
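
For reference, since dapl only worked here after manually adding RoCE entries to dat.conf, below is a minimal sketch of what such an entry can look like. This is illustrative only: the netdev name (eth2) is a placeholder rather than an interface from this system, and the provider library path should match whatever your MLNX_OFED install actually ships.

  # Sketch only: an illustrative RoCE (cma-style) entry for /etc/dat.conf.
  # "eth2" is a placeholder; substitute the Ethernet interface that backs the mlx5 port.
  ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""

  # Shell side: point Intel MPI's dapl fabric at that provider explicitly.
  export I_MPI_FABRICS=shm:dapl
  export I_MPI_DAPL_PROVIDER=ofa-v2-cma-roe-eth2

None of this changes the 'ofa' failure above; it just documents the dapl workaround mentioned in the list.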
      

       

Ralf_U_
Beginner

Did you ever get this to work? We would be quite interested in a solution.

Best, Ralf

Stephen_Behling
Beginner

I found that I had to add:

export I_MPI_OFA_NUM_ADAPTERS=1
export I_MPI_OFA_ADAPTER_NAME="mlx5_0"
 

when using nodes with two IB adapters and I_MPI_FABRICS set to the ofa fabric. While DAPL did not have problems, OFA seems to get confused.

I am using Intel MPI 2018.1.163.
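
Putting that together, here is a minimal sketch of a run with the OFA fabric pinned to a single adapter. The adapter name mlx5_0 is taken from the ibv_devinfo output above; adjust it (and the process count/binary) for your own setup.

  # Sketch: pin the OFA fabric to one adapter so it does not trip over the second device.
  export I_MPI_FABRICS=shm:ofa
  export I_MPI_OFA_NUM_ADAPTERS=1
  export I_MPI_OFA_ADAPTER_NAME=mlx5_0
  export I_MPI_DEBUG=6   # debug output should then report the selected data transfer modes

  mpirun -n 1 ./server

If OFA initializes correctly, the I_MPI_DEBUG output should report shm and ofa as the data transfer modes instead of the OFA_Init_fail / "fallback fabric is not enabled" message shown above.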
