I cannot get Intel MPI to use the 'ofa' fabric with Mellanox OFED over ConnectX-4 Lx EN Ethernet cards, and I have exhausted every approach I know of. I'd appreciate any input that helps me get this working.
Relevant info:
- Operating System is CentOS 7.3
- Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405 (id: 17193)
- NIC is a single Mellanox ConnectX-4 Lx EN Ethernet-only card (RoCE v1 and v2 supported) with 2x25Gb ports
- lspci | grep Mell
    05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
- ibv_devinfo
    hca_id: mlx5_1
        transport:          InfiniBand (0)
        fw_ver:             14.18.2000
        node_guid:          248a:0703:008d:6237
        sys_image_guid:     248a:0703:008d:6236
        vendor_id:          0x02c9
        vendor_part_id:     4117
        hw_ver:             0x0
        board_id:           MT_2420110034
        phys_port_cnt:      1
        Device ports:
            port: 1
                state:          PORT_ACTIVE (4)
                max_mtu:        4096 (5)
                active_mtu:     1024 (3)
                sm_lid:         0
                port_lid:       0
                port_lmc:       0x00
                link_layer:     Ethernet
    hca_id: mlx5_0
        transport:          InfiniBand (0)
        fw_ver:             14.18.2000
        node_guid:          248a:0703:008d:6236
        sys_image_guid:     248a:0703:008d:6236
        vendor_id:          0x02c9
        vendor_part_id:     4117
        hw_ver:             0x0
        board_id:           MT_2420110034
        phys_port_cnt:      1
        Device ports:
            port: 1
                state:          PORT_ACTIVE (4)
                max_mtu:        4096 (5)
                active_mtu:     1024 (3)
                sm_lid:         0
                port_lid:       0
                port_lmc:       0x00
                link_layer:     Ethernet
- Mellanox OFED driver, output of 'ofed_info' is:
- MLNX_OFED_LINUX-4.0-2.0.0.1 (OFED-4.0-2.0.0):
- Verified RDMA is working via perftest utilities, e.g. ib_write_bw, ib_write_lat, etc.
- Specified that ofa fabric should be used via the environment variable I_MPI_FABRICS=shm:ofa
- Verified that using 'tcp' and 'dapl' fabrics works (had to manually add RoCE entries to the dat.conf to get dapl to work)
- $ env | grep I_MPI
    I_MPI_FABRICS=shm:ofa
    I_MPI_HYDRA_DEBUG=on
    I_MPI_DEBUG=6
    I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi
- Output when I try to run my code with 'ofa' specified as the Intel MPI fabric:
-
mpirun -n 1 ./server host: sp1.muskrat.local ================================================================================================== mpiexec options: ---------------- Base path: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/ Launcher: ssh Debug level: 1 Enable X: -1 Global environment: ------------------- I_MPI_PERHOST=allcores LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib MKLROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mkl MANPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/usr/local/share/man:/usr/share/man:/opt/ibutils/share/man I_MPI_DEBUG_HYDRA=0 XDG_SESSION_ID=216 HOSTNAME=sp1.muskrat.local SELINUX_ROLE_REQUESTED= IPPROOT=/opt/intel/psxe_runtime_2017.4.196/linux/ipp SHELL=/bin/bash TERM=xterm-256color HISTSIZE=1000 I_MPI_FABRICS=shm:ofa SSH_CLIENT=192.168.1.2 33072 22 
LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin SELINUX_USE_CURRENT_RANGE= SSH_TTY=/dev/pts/1 MIC_LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic USER=jrhemst LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:
*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45: I_MPI_MPIRUN=mpirun MIC_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic CPATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/include:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/include:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/include:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/include: NLSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/jrhemst/.local/bin:/home/jrhemst/bin MAIL=/var/spool/mail/jrhemst TBBROOT=/opt/intel/psxe_runtime_2017.4.196/linux/tbb PWD=/home/jrhemst/mpi_test I_MPI_HYDRA_DEBUG=on XMODIFIERS=@im=none EDITOR=vim LANG=en_US.UTF-8 
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles LOADEDMODULES= SELINUX_LEVEL_REQUESTED= DAALROOT=/opt/intel/psxe_runtime_2017.4.196/linux/daal HISTCONTROL=ignoredups HOME=/home/jrhemst SHLVL=2 I_MPI_DEBUG=6 LOGNAME=jrhemst SSH_CONNECTION=192.168.1.2 33072 192.168.1.113 22 CLASSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/daal.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar MODULESHOME=/usr/share/Modules LESSOPEN=||/usr/bin/lesspipe.sh %s XDG_RUNTIME_DIR=/run/user/1338400006 I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi BASH_FUNC_module()=() { eval `/usr/bin/modulecmd bash $*` } _=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/mpiexec.hydra Hydra internal environment: --------------------------- MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1 GFORTRAN_UNBUFFERED_PRECONNECTED=y I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8 DAPL_NETWORK_PROCESS_NUM=1 Intel(R) MPI Library specific variables: ---------------------------------------- I_MPI_PERHOST=allcores I_MPI_DEBUG_HYDRA=0 I_MPI_FABRICS=shm:ofa I_MPI_MPIRUN=mpirun I_MPI_HYDRA_DEBUG=on I_MPI_DEBUG=6 I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8 Proxy information: ********************* [1] proxy: sp1.muskrat.local (16 cores) Exec list: ./server (1 processes); ================================================================================================== [mpiexec@sp1.muskrat.local] Timeout set to -1 (-1 means infinite) [mpiexec@sp1.muskrat.local] Got a control port string of sp1.muskrat.local:33538 Proxy launch args: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/pmi_proxy --control-port sp1.muskrat.local:33538 --debug --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 
1079489000 --usize -2 --proxy-id Arguments being passed to proxy 0: --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname sp1.muskrat.local --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 1 --auto-cleanup 1 --pmi-kvsname kvs_22910_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 49 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib' 'MKLROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mkl' 'MANPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/man:/usr/local/share/man:/usr/share/man:/opt/ibutils/share/man' 'I_MPI_DEBUG_HYDRA=0' 'XDG_SESSION_ID=216' 'HOSTNAME=sp1.muskrat.local' 'SELINUX_ROLE_REQUESTED=' 'IPPROOT=/opt/intel/psxe_runtime_2017.4.196/linux/ipp' 'SHELL=/bin/bash' 'TERM=xterm-256color' 'HISTSIZE=1000' 'I_MPI_FABRICS=shm:ofa' 'SSH_CLIENT=192.168.1.2 33072 22' 
'LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/intel64/gcc4.1:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/lib/intel64:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin' 'SELINUX_USE_CURRENT_RANGE=' 'SSH_TTY=/dev/pts/1' 'MIC_LD_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/mic/lib:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic' 'USER=jrhemst' 'LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.sv
gz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:' 'I_MPI_MPIRUN=mpirun' 'MIC_LIBRARY_PATH=/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin_mic:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/lib/mic' 'CPATH=/opt/intel/psxe_runtime_2017.4.196/linux/daal/include:/opt/intel/psxe_runtime_2017.4.196/linux/mkl/include:/opt/intel/psxe_runtime_2017.4.196/linux/tbb/include:/opt/intel/psxe_runtime_2017.4.196/linux/ipp/include:' 'NLSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mkl/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N:/opt/intel/psxe_runtime_2017.4.196/linux/compiler/lib/intel64_lin/locale/%l_%t/%N' 'PATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin:/opt/intel/psxe_runtime_2017.4.196/linux/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/jrhemst/.local/bin:/home/jrhemst/bin' 'MAIL=/var/spool/mail/jrhemst' 'TBBROOT=/opt/intel/psxe_runtime_2017.4.196/linux/tbb' 'PWD=/home/jrhemst/mpi_test' 'I_MPI_HYDRA_DEBUG=on' 'XMODIFIERS=@im=none' 'EDITOR=vim' 'LANG=en_US.UTF-8' 
'MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles' 'LOADEDMODULES=' 'SELINUX_LEVEL_REQUESTED=' 'DAALROOT=/opt/intel/psxe_runtime_2017.4.196/linux/daal' 'HISTCONTROL=ignoredups' 'HOME=/home/jrhemst' 'SHLVL=2' 'I_MPI_DEBUG=6' 'LOGNAME=jrhemst' 'SSH_CONNECTION=192.168.1.2 33072 192.168.1.113 22' 'CLASSPATH=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/daal/lib/daal.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar:/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/mpi.jar' 'MODULESHOME=/usr/share/Modules' 'LESSOPEN=||/usr/bin/lesspipe.sh %s' 'XDG_RUNTIME_DIR=/run/user/1338400006' 'I_MPI_ROOT=/opt/intel/psxe_runtime_2017.4.196/linux/mpi' 'BASH_FUNC_module()=() { eval `/usr/bin/modulecmd bash $*` }' '_=/opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/mpiexec.hydra' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7e590000-3bc9-185f-6852-05000171c0a8' 'DAPL_NETWORK_PROCESS_NUM=1' --proxy-core-count 16 --mpi-cmd-env mpirun -n 1 ./server --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/jrhemst/mpi_test --exec-args 1 ./server [mpiexec@sp1.muskrat.local] Launch arguments: /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/bin/pmi_proxy --control-port sp1.muskrat.local:33538 --debug --pmi-connect alltoall --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1079489000 --usize -2 --proxy-id 0 [proxy:0:0@sp1.muskrat.local] Start PMI_proxy 0 [proxy:0:0@sp1.muskrat.local] STDIN will be redirected to 1 fd(s): 17 [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): init pmi_version=1 pmi_subversion=1 [proxy:0:0@sp1.muskrat.local] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_maxes [proxy:0:0@sp1.muskrat.local] PMI 
response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): barrier_in
[proxy:0:0@sp1.muskrat.local] forwarding command (cmd=barrier_in) upstream
[mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@sp1.muskrat.local] PMI response to fd 8 pid 12: cmd=barrier_out
[proxy:0:0@sp1.muskrat.local] PMI response: cmd=barrier_out
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_ranks2hosts
[proxy:0:0@sp1.muskrat.local] PMI response: put_ranks2hosts 26 1 17 sp1.muskrat.local 0,
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_appnum
[proxy:0:0@sp1.muskrat.local] PMI response: cmd=appnum appnum=0
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_my_kvsname
[proxy:0:0@sp1.muskrat.local] PMI response: cmd=my_kvsname kvsname=kvs_22910_0
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): get_my_kvsname
[proxy:0:0@sp1.muskrat.local] PMI response: cmd=my_kvsname kvsname=kvs_22910_0
[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 3 Build 20170405 (id: 17193)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation. All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): barrier_in
[proxy:0:0@sp1.muskrat.local] forwarding command (cmd=barrier_in) upstream
[mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@sp1.muskrat.local] PMI response to fd 8 pid 12: cmd=barrier_out
[proxy:0:0@sp1.muskrat.local] PMI response: cmd=barrier_out
[0] MPI startup(): Found 2 IB devices
[0] MPI startup(): Open 0 IB device: mlx5_1
[0] MPI startup(): Open 1 IB device: mlx5_0
[proxy:0:0@sp1.muskrat.local] got pmi command (from 12): put kvsname=kvs_22910_0 key=OFA_Init_fail value=1
[proxy:0:0@sp1.muskrat.local] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@sp1.muskrat.local] forwarding command (cmd=put kvsname=kvs_22910_0 key=OFA_Init_fail value=1) upstream
[mpiexec@sp1.muskrat.local] [pgid: 0] got PMI command: cmd=put kvsname=kvs_22910_0 key=OFA_Init_fail value=1
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
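Most of the output above is the Hydra environment dump. The lines that matter for the fabric problem are the `MPI startup()` messages and the `OFA_Init_fail` key. A minimal sketch for pulling just those out, assuming the debug output arrives one message per line as mpirun normally prints it; `filter_fabric_lines` is a hypothetical helper name, not part of Intel MPI:

```shell
#!/bin/sh
# Hypothetical helper: read Intel MPI debug output on stdin and keep only
# the library startup messages and any mention of the OFA_Init_fail key.
filter_fabric_lines() {
    grep -E 'MPI startup\(\)|OFA_Init_fail'
}

# Usage (assumes mpirun is on PATH and I_MPI_DEBUG is set as shown earlier):
#   mpirun -n 1 ./server 2>&1 | filter_fabric_lines
```

With the run shown here, the filtered output would end at the "ofa fabric is not available" message, right after both mlx5 devices are opened.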
2 Replies
Did you ever get this to work? We would be quite interested in a solution.
best, Ralf
I found that I had to add:
export I_MPI_OFA_NUM_ADAPTERS=1
export I_MPI_OFA_ADAPTER_NAME="mlx5_0"
when using nodes with two IB adapters and using I_MPI_FABRICS=OFA. While DAPL did not have problems, OFA seems to get confused.
I am using Intel MPI 2018.1.163
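Applied to the setup in the original post, that workaround might look like the sketch below. The two `I_MPI_OFA_*` variables are the ones quoted above and `I_MPI_FABRICS=shm:ofa` is taken from the original post; `list_hcas` and `configure_single_ofa_adapter` are hypothetical helper names introduced only for illustration. Whether pinning a single adapter also resolves the failure on RoCE-only ConnectX-4 Lx cards is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print every hca_id found in ibv_devinfo-style
# text on stdin, one name per line, in the order they are listed.
list_hcas() {
    awk '$1 == "hca_id:" { print $2 }'
}

# Hypothetical helper: restrict the OFA fabric to one named adapter.
# $1 is the adapter name, e.g. mlx5_0.
configure_single_ofa_adapter() {
    export I_MPI_FABRICS=shm:ofa          # fabric setting from the original post
    export I_MPI_OFA_NUM_ADAPTERS=1       # from the reply above
    export I_MPI_OFA_ADAPTER_NAME="$1"    # from the reply above
}

# Usage (assumes ibv_devinfo and mpirun are on PATH):
#   ibv_devinfo | list_hcas               # shows which names are available
#   configure_single_ofa_adapter mlx5_0
#   mpirun -n 1 ./server
```

Listing the adapters first avoids hard-coding a name that does not exist on a given node; on the system in the original post both mlx5_1 and mlx5_0 would be listed.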