Jacob_H_

Intel MPI unable to use 'ofa' fabric with Mellanox OFED on ConnectX-4 Lx EN Ethernet cards

I cannot get Intel MPI to use the 'ofa' fabric with Mellanox OFED over ConnectX-4 Lx EN Ethernet cards, and I have exhausted every approach I know of. I'd appreciate any input on getting this working.

Relevant info:

  • Operating System is CentOS 7.3
  • Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405 (id: 17193)
  • NIC is a single Mellanox ConnectX-4 Lx EN Ethernet-only card (RoCE v1 and v2 supported) with 2x 25 Gb ports
    •  lspci | grep Mell
      05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
      05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    •  ibv_devinfo
      hca_id:	mlx5_1
      	transport:			InfiniBand (0)
      	fw_ver:				14.18.2000
      	node_guid:			248a:0703:008d:6237
      	sys_image_guid:			248a:0703:008d:6236
      	vendor_id:			0x02c9
      	vendor_part_id:			4117
      	hw_ver:				0x0
      	board_id:			MT_2420110034
      	phys_port_cnt:			1
      	Device ports:
      		port:	1
      			state:			PORT_ACTIVE (4)
      			max_mtu:		4096 (5)
      			active_mtu:		1024 (3)
      			sm_lid:			0
      			port_lid:		0
      			port_lmc:		0x00
      			link_layer:		Ethernet
      
      hca_id:	mlx5_0
      	transport:			InfiniBand (0)
      	fw_ver:				14.18.2000
      	node_guid:			248a:0703:008d:6236
      	sys_image_guid:			248a:0703:008d:6236
      	vendor_id:			0x02c9
      	vendor_part_id:			4117
      	hw_ver:				0x0
      	board_id:			MT_2420110034
      	phys_port_cnt:			1
      	Device ports:
      		port:	1
      			state:			PORT_ACTIVE (4)
      			max_mtu:		4096 (5)
      			active_mtu:		1024 (3)
      			sm_lid:			0
      			port_lid:		0
      			port_lmc:		0x00
      			link_layer:		Ethernet
  • Mellanox OFED driver; 'ofed_info' reports:
    • MLNX_OFED_LINUX-4.0-2.0.0.1 (OFED-4.0-2.0.0):
    • Verified that RDMA works using the perftest utilities (e.g. ib_write_bw, ib_write_lat)
  • Selected the ofa fabric via the environment variable I_MPI_FABRICS=shm:ofa
    • Verified that the 'tcp' and 'dapl' fabrics both work (dapl only worked after I manually added RoCE entries to dat.conf)
    • $ env | grep I_MPI
      I_MPI_FABRICS=shm:ofa
      I_MPI_HYDRA_DEBUG=on
      I_MPI_DEBUG=6
      I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
  • Output when I run my code with 'ofa' specified as the Intel MPI fabric:
    • mpirun -n 1 ./server
      host: sp1.muskrat.local
      
      ==================================================================================================
      mpiexec options:
      ----------------
        Base path: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/
        Launcher: ssh
        Debug level: 1
        Enable X: -1
      
        Global environment:
        -------------------
          I_MPI_PERHOST=allcores
          LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib
          MKLROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mkl
          MANPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/usr/local/share/man:/usr/share/man:/opt/ibutils/share/man
          I_MPI_DEBUG_HYDRA=0
          XDG_SESSION_ID=216
          HOSTNAME=sp1.muskrat.local
          SELINUX_ROLE_REQUESTED=
          IPPROOT=/opt/intel/psxe_runtime_2017.4.196/linux/ipp
          SHELL=/bin/bash
          TERM=xterm-256color
          HISTSIZE=1000
          I_MPI_FABRICS=shm:ofa
          SSH_CLIENT=192.168.1.2 33072 22
          LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin
          SELINUX_USE_CURRENT_RANGE=
          SSH_TTY=/dev/pts/1
          MIC_LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic
          USER=jrhemst
          LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
          I_MPI_MPIRUN=mpirun
          MIC_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic
          CPATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/include:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/include:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/include:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/include:
          NLSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N
          PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/jrhemst/.local/bin:/home/jrhemst/bin
          MAIL=/var/spool/mail/jrhemst
          TBBROOT=/opt/intel/psxe_runtime_2017.4.196/linux/tbb
          PWD=/home/jrhemst/mpi_test
          I_MPI_HYDRA_DEBUG=on
          XMODIFIERS=@im=none
          EDITOR=vim
          LANG=en_US.UTF-8
          MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
          LOADEDMODULES=
          SELINUX_LEVEL_REQUESTED=
          DAALROOT=/opt/intel/psxe_runtime_2017.4.196/linux/daal
          HISTCONTROL=ignoredups
          HOME=/home/jrhemst
          SHLVL=2
          I_MPI_DEBUG=6
          LOGNAME=jrhemst
          SSH_CONNECTION=192.168.1.2 33072 192.168.1.113 22
          CLASSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/daal.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar
          MODULESHOME=/usr/share/Modules
          LESSOPEN=||/usr/bin/lesspipe.sh %s
          XDG_RUNTIME_DIR=/run/user/1338400006
          I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
          BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
      }
          _=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/mpiexec.hydra
      
        Hydra internal environment:
        ---------------------------
          MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
          GFORTRAN_UNBUFFERED_PRECONNECTED=y
          I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8
          DAPL_NETWORK_PROCESS_NUM=1
      
        Intel(R) MPI Library specific variables:
        ----------------------------------------
          I_MPI_PERHOST=allcores
          I_MPI_DEBUG_HYDRA=0
          I_MPI_FABRICS=shm:ofa
          I_MPI_MPIRUN=mpirun
          I_MPI_HYDRA_DEBUG=on
          I_MPI_DEBUG=6
          I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
          I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8
      
      
          Proxy information:
          *********************
            [1] proxy: sp1.muskrat.local (16 cores)
            Exec list: ./server (1 processes); 
      
      
      ==================================================================================================
      
      [mpiexec@sp1.muskrat.local] Timeout set to -1 (-1 means infinite)
      [mpiexec@sp1.muskrat.local] Got a control port string of sp1.muskrat.local:33538
      
      Proxy launch args: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/pmi_proxy --control-port sp1.muskrat.local:33538 --debug --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1079489000 --usize -2 --proxy-id 
      
      Arguments being passed to proxy 0:
      --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname sp1.muskrat.local --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 1 --auto-cleanup 1 --pmi-kvsname kvs_22910_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 49 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib' 'MKLROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mkl' 'MANPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/usr/local/share/man:/usr/share/man:/opt/ibutils/share/man' 'I_MPI_DEBUG_HYDRA=0' 'XDG_SESSION_ID=216' 'HOSTNAME=sp1.muskrat.local' 'SELINUX_ROLE_REQUESTED=' 'IPPROOT=/opt/intel/psxe_runtime_2017.4.196/linux/ipp' 'SHELL=/bin/bash' 'TERM=xterm-256color' 'HISTSIZE=1000' 'I_MPI_FABRICS=shm:ofa' 'SSH_CLIENT=192.168.1.2 33072 22' 
'LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin' 'SELINUX_USE_CURRENT_RANGE=' 'SSH_TTY=/dev/pts/1' 'MIC_LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic' 'USER=jrhemst' 'LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:' 'I_MPI_MPIRUN=mpirun' 'MIC_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic' 'CPATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/include:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/include:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/include:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/include:' 'NLSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N' 'PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/jrhemst/.local/bin:/home/jrhemst/bin' 'MAIL=/var/spool/mail/jrhemst' 'TBBROOT=/opt/intel/psxe_runtime_2017.4.196/linux/tbb' 'PWD=/home/jrhemst/mpi_test' 'I_MPI_HYDRA_DEBUG=on' 'XMODIFIERS=@im=none' 'EDITOR=vim' 'LANG=en_US.UTF-8' 
'MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles' 'LOADEDMODULES=' 'SELINUX_LEVEL_REQUESTED=' 'DAALROOT=/opt/intel/psxe_runtime_2017.4.196/linux/daal' 'HISTCONTROL=ignoredups' 'HOME=/home/jrhemst' 'SHLVL=2' 'I_MPI_DEBUG=6' 'LOGNAME=jrhemst' 'SSH_CONNECTION=192.168.1.2 33072 192.168.1.113 22' 'CLASSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/daal.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar' 'MODULESHOME=/usr/share/Modules' 'LESSOPEN=||/usr/bin/lesspipe.sh %s' 'XDG_RUNTIME_DIR=/run/user/1338400006' 'I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi' 'BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
      }' '_=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/mpiexec.hydra' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8' 'DAPL_NETWORK_PROCESS_NUM=1' --proxy-core-count 16 --mpi-cmd-env mpirun -n 1 ./server  --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/jrhemst/mpi_test --exec-args 1 ./server 
      
      [mpiexec@sp1.muskrat.local] Launch arguments: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/pmi_proxy --control-port sp1.muskrat.local:33538 --debug --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1079489000 --usize -2 --proxy-id 0 
      [proxy:0:0@sp1.muskrat.local] Start PMI_proxy 0
      [proxy:0:0@sp1.muskrat.local] STDIN will be redirected to 1 fd(s): 17 
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): init
      pmi_version=1 pmi_subversion=1 
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_maxes
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): barrier_in
      
      [proxy:0:0@sp1.muskrat.local] forwarding command (cmd=barrier_in) upstream
      [mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=barrier_in
      [mpiexec@sp1.muskrat.local] PMI response to fd 8 pid 12: cmd=barrier_out
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=barrier_out
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_ranks2hosts
      
      [proxy:0:0@sp1.muskrat.local] PMI response: put_ranks2hosts 26 1
      17 sp1.muskrat.local 0, 
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_appnum
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=appnum appnum=0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_my_kvsname
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=my_kvsname kvsname=kvs_22910_0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_my_kvsname
      
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=my_kvsname kvsname=kvs_22910_0
      [0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 3  Build 20170405 (id: 17193)
      [0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
      [0] MPI startup(): Multi-threaded optimized library
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): barrier_in
      
      [proxy:0:0@sp1.muskrat.local] forwarding command (cmd=barrier_in) upstream
      [mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=barrier_in
      [mpiexec@sp1.muskrat.local] PMI response to fd 8 pid 12: cmd=barrier_out
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=barrier_out
      [0] MPI startup(): Found 2 IB devices
      [0] MPI startup(): Open 0 IB device: mlx5_1
      [0] MPI startup(): Open 1 IB device: mlx5_0
      [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): put
      kvsname=kvs_22910_0 key=OFA_Init_fail value=1 
      [proxy:0:0@sp1.muskrat.local] PMI response: cmd=put_result rc=0 msg=success
      [proxy:0:0@sp1.muskrat.local] forwarding command (cmd=put kvsname=kvs_22910_0 key=OFA_Init_fail value=1) upstream
      [mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=put kvsname=kvs_22910_0 key=OFA_Init_fail value=1
      [0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
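
The startup log shows Intel MPI opening both verbs devices, mlx5_1 and mlx5_0 (one per 25 Gb port, each reporting link_layer Ethernet in ibv_devinfo), right before OFA initialization fails. To see exactly which devices and ports are visible to a verbs-based fabric, the state and link layer of every port can be read straight from sysfs. This is a minimal sketch, assuming the standard kernel RDMA sysfs layout (/sys/class/infiniband/<device>/ports/<port>/state and link_layer); the function name list_rdma_ports is hypothetical, not part of OFED or Intel MPI.

```shell
#!/usr/bin/env bash
# Sketch: list every RDMA (verbs) device port with its state and link layer.
# Assumes the kernel exposes /sys/class/infiniband/<dev>/ports/<n>/{state,link_layer}.
# list_rdma_ports is a hypothetical helper; pass another root directory to test it.
list_rdma_ports() {
    local root="${1:-/sys/class/infiniband}"
    local port_dir dev port state link
    for port_dir in "$root"/*/ports/*; do
        # If no device exists the glob stays unexpanded, so skip non-directories
        [ -d "$port_dir" ] || continue
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")
        port=$(basename "$port_dir")
        state=$(cat "$port_dir/state" 2>/dev/null)
        link=$(cat "$port_dir/link_layer" 2>/dev/null)
        printf '%s port %s: %s, link_layer=%s\n' "$dev" "$port" "$state" "$link"
    done
}

list_rdma_ports "$@"
```

Run without arguments, it walks the real sysfs tree; on the machine above it should list one port each for mlx5_0 and mlx5_1, which is the "2 IB devices" Intel MPI reports finding.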
      

       

Ralf_U_

Did you ever get this to work? We would be quite interested in a solution.

Best, Ralf

Stephen_Behling

I found that I had to add:

export I_MPI_OFA_NUM_ADAPTERS=1
export I_MPI_OFA_ADAPTER_NAME="mlx5_0"
 

when using nodes with two IB adapters and I_MPI_FABRICS=OFA. DAPL had no problems, but OFA seems to get confused when more than one adapter is present.

I am using Intel MPI 2018.1.163
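
The workaround above can be wrapped in a small launch script so that OFA is always pinned to a single adapter. This is a sketch under stated assumptions: it uses only the two variables from this reply plus I_MPI_FABRICS=shm:ofa and I_MPI_DEBUG=6 from the original post, it defaults to mlx5_0 as the reply does, and the fix has only been reported for InfiniBand adapters on Intel MPI 2018 Update 1, not for RoCE ports on 2017 Update 3. The script name run_ofa.sh, the function configure_ofa_single_adapter and the OFA_ADAPTER override are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical run_ofa.sh): launch an MPI program over shm:ofa
# with OFA restricted to one adapter, as suggested in the reply above.
# Usage: ./run_ofa.sh -n 1 ./server
#        OFA_ADAPTER=mlx5_1 ./run_ofa.sh -n 1 ./server
# OFA_ADAPTER is a variable of this script only, not an Intel MPI setting.

configure_ofa_single_adapter() {
    local adapter="$1"
    export I_MPI_FABRICS=shm:ofa
    # Use exactly one HCA instead of opening both mlx5_0 and mlx5_1
    export I_MPI_OFA_NUM_ADAPTERS=1
    export I_MPI_OFA_ADAPTER_NAME="$adapter"
    # Print the "MPI startup()" lines showing which device and fabric are used
    export I_MPI_DEBUG=6
}

configure_ofa_single_adapter "${OFA_ADAPTER:-mlx5_0}"

# Only launch when mpirun arguments were given
if [ "$#" -gt 0 ]; then
    exec mpirun "$@"
fi
```

With I_MPI_DEBUG=6, the MPI startup() lines in the output show which IB device was opened, so it is easy to confirm whether the pinning took effect.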
