Intel MPI fails with InfiniBand on new nodes of our cluster (Got FATAL event 3)

Hi,

We got new nodes on our cluster. On the 12 old nodes, Intel MPI (Intel Cluster Studio 2010, 2011, 2012) works without any problems. The 4 new nodes run exactly the same OS as the 12 old ones, with the same installation (node image) and the same hardware.

We have I_MPI_FABRICS=shm:ofa

If I start mpirun on the 12 old nodes, it works without problems.
If I try to start a parallel job that includes one of the new nodes, I get:

[bash]send desc error
[1] Abort: Got FATAL event 3 at line 861 in file ../../ofa_utility.c
[/bash]
If I start a local job on one of the new nodes, it works.
So the problem is linked to InfiniBand.

This is strange, because an Open MPI run over InfiniBand works on the new nodes.

If I use I_MPI_FABRICS=shm:dapl, jobs on the new nodes work.
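The comparison between the two fabric settings could be scripted roughly as follows. This is only a sketch: `old_node`, `new_node` and `./my_mpi_app` are placeholder names (assumptions, not taken from the cluster), and the application must be a real MPI program, since a plain `hostname` never initializes the fabric. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: run the same two-node MPI test once per fabric setting.
# OLD_NODE, NEW_NODE and APP are placeholders (assumptions).
OLD_NODE=${OLD_NODE:-old_node}
NEW_NODE=${NEW_NODE:-new_node}
APP=${APP:-./my_mpi_app}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

# Print the mpirun command line for one fabric setting ($1).
fabric_cmd() {
    echo "mpirun -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS $1 -n 1 -host $OLD_NODE $APP : -n 1 -host $NEW_NODE $APP"
}

for fabric in shm:ofa shm:dapl; do
    cmd=$(fabric_cmd "$fabric")
    echo "$cmd"
    if [ "$DRY_RUN" = "0" ]; then
        # Word splitting of $cmd is intentional here.
        $cmd || echo "FAILED with I_MPI_FABRICS=$fabric"
    fi
done
```

Running it once with two old nodes and once with an old and a new node shows whether the failure follows the fabric setting, the node, or both.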

Any ideas?

Best regards,
Guillaume

Moderator

Hi Guillaume,

What happens if you try to run with a non-parallel command?

[bash]mpirun -genv I_MPI_FABRICS shm:ofa -n 1 -host old_node hostname : -n 1 -host new_node hostname[/bash]
Also, on the parallel job, what is the output with -verbose and I_MPI_DEBUG=5?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Sorry for the delay...

I tried your test. It does not work:
- under PBS/Torque I get: "-host (or -ghost) and -machinefile are incompatible"

- in a terminal I get:
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
192.168.0.13 (n13.blabla)
192.168.0.14 (n14.blabla)

It does that on all the nodes, but the machine names are correct, so I don't understand.

Best regards
Moderator

Hi Guillaume,

The first error message is likely due to a lack of tight integration with Torque*. Could you please send me the output from running the same command with -verbose added? Are you able to ssh from an old node to a new node, or the reverse?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi,

- You mean --verbose, don't you? Here is the output with --verbose, run directly on n13:
[bash][16:20:41] denayer@n13 ~ $ mpirun --verbose -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm -n 1 -host n13 hostname : -n 1 -host n14 hostname
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
running mpdallexit on n13
LAUNCHED mpd on n13 via
RUNNING: mpd on n13
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
192.168.0.14 (n14.marvin)
[/bash]
Here is the output with --verbose from master:

[bash][16:20:17] denayer@master ~ $ mpirun --verbose -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm -n 1 -host n13 hostname : -n 1 -host n14 hostname
WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
running mpdallexit on master
LAUNCHED mpd on master via
RUNNING: mpd on master
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
192.168.0.13 (n13.marvin)
192.168.0.14 (n14.marvin)
[/bash]

- For your ssh question:
from master to n13: ok.
from master to n14: ok.
from n13 to master: ok
from n14 to master: ok
from n13 to n14: ok
from n14 to n13: ok
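The pairwise ssh checks listed above could be automated along these lines. It is a rough sketch: the host list defaults to the three machines named in this thread, and it assumes password-less ssh is expected between all of them. By default it only prints what it would check; set `DRY_RUN=0` to actually connect.

```shell
#!/bin/sh
# Sketch: check password-less ssh between every ordered pair of hosts.
# HOSTS defaults to the three machines named in the thread.
HOSTS=${HOSTS:-"master n13 n14"}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the checks

# Print every ordered pair "from to" with from != to.
host_pairs() {
    for from in $HOSTS; do
        for to in $HOSTS; do
            [ "$from" != "$to" ] && echo "$from $to"
        done
    done
    return 0
}

host_pairs | while read -r from to; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "would check: $from -> $to"
    # BatchMode makes ssh fail instead of prompting for a password;
    # -n keeps ssh from swallowing the loop's input.
    elif ssh -n -o BatchMode=yes "$from" ssh -o BatchMode=yes "$to" true; then
        echo "$from -> $to: ok"
    else
        echo "$from -> $to: FAILED"
    fi
done
```

Each pair is tested by hopping through the first host, so the script can be started from any machine that can reach all of them.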

Regards
Moderator

Hi Guillaume,

What is the value of I_MPI_PROCESS_MANAGER? Which version of the Intel MPI Library are you using?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

We have three different versions installed:
intel cluster toolkit 2010
intel cluster studio 2011
intel cluster studio 2012.

The errors above are with intel cluster studio 2011:

[13:44:33] denayer@master ~ $ mpirun -version
Intel MPI Library for Linux Version 4.0 Update 1
Build 20100910 Platform Intel 64 64-bit applications
Copyright (C) 2003-2010 Intel Corporation. All rights reserved


I_MPI_PROCESS_MANAGER has no value in my shell.
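A quick way to report this kind of setting is sketched below. The list of variables is only an illustrative selection of names that appear in this thread, and `show_var` is a hypothetical helper, not part of Intel MPI.

```shell
#!/bin/sh
# Sketch: report launcher-related variables, marking unset ones explicitly.
# show_var is a hypothetical helper; the variable list is illustrative.
show_var() {
    # ${NAME+x} expands to "x" only when NAME is set (even if empty).
    if eval "[ -n \"\${$1+x}\" ]"; then
        eval "echo \"$1=\$$1\""
    else
        echo "$1=(unset)"
    fi
}

for name in I_MPI_PROCESS_MANAGER I_MPI_FABRICS I_MPI_ROOT \
            I_MPI_MPD_RSH I_MPI_HYDRA_BOOTSTRAP_EXEC; do
    show_var "$name"
done
```

Running the same script on an old node and on a new node and comparing the two outputs makes differences in the environment easy to spot.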

Regards
Moderator

Hi Guillaume,

What happens with Intel Cluster Studio 2012 (which contains Intel MPI Library 4.0 Update 3)?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

With Intel Cluster Studio 2012:
[15:52:56] denayer@master ~ $ mpirun -version
Intel MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

your command works:
[bash][15:53:48] denayer@master ~ $ mpirun -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm -n 1 -host n13 hostname : -n 1 -host n14 hostname
n14
n13
[/bash]
with --verbose:

[bash][15:52:58] denayer@master ~ $ mpirun --verbose -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm -n 1 -host n13 hostname : -n 1 -host n14 hostname ================================================================================================== mpiexec options: ---------------- Base path: /opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/ Bootstrap server: ssh Debug level: 1 Enable X: -1 Global environment: ------------------- I_MPI_PERHOST=allcores MODULE_VERSION_STACK=3.2.5 MKLROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl MANPATH=/opt/intel/ics_2012/itac/8.0.3.007/man:/opt/intel/ics_2012/impi/4.0.3.008/man:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/man/en_US:/opt/intel/ics_2012/vtune_amplifier_xe_2011/man:/opt/modules/Modules/default/share/man:/opt/pbs/man:/opt/env-switcher/man:/usr/man:/usr/share/man:/usr/local/man:/usr/local/share/man:/usr/X11R6/man:/opt/c3-4/man HOSTNAME=master VT_MPI=impi4 I_MPI_PIN=0 INTEL_LICENSE_FILE=/opt/intel/licenses IPPROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp I_MPI_F77=ifort SHELL=/bin/bash TERM=xterm HISTSIZE=200000 I_MPI_FABRICS=shm:dapl SSH_CLIENT=139.11.215.121 5290 22 LIBRARY_PATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/../compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 CVSROOT=:ext:fhpout@laplace.lstm.uni-erlangen.de:/data/linux/proj_tape/LSTM/fhpdev MODULE_SHELL=sh FPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include SSH_TTY=/dev/pts/5 USER=denayer MODULE_OSCAR_USER=denayer 
LD_LIBRARY_PATH=/opt/intel/ics_2012/itac/8.0.3.007/itac/slib_impi4:/opt/intel/ics_2012/impi/4.0.3.008/intel64/lib:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/debugger/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mpirt/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/../compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21:/home/denayer/FSI_new/FSI/Software/carat20/libraries/rlog-1.4/lib/:/home/denayer/FSI_new/FSI/Software/carat20/libraries/atlas/lib/:/opt/maui/lib:/opt/tecplot/tec360_2010/lib LS_COLORS=no=00:fi=00:di=01;35:ln=01;36:pi=40;33:so=01;33:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35: ENV=/home/denayer/.bashrc CPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/include TMOUT=36000 MSM_PRODUCT=MSM NLSPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/debugger/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64/locale/en_US 
PATH=/opt/intel/ics_2012/itac/8.0.3.007/bin:/opt/intel/ics_2012/impi/4.0.3.008/intel64/bin:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/bin/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mpirt/bin/intel64:/opt/intel/ics_2012/vtune_amplifier_xe_2011/bin64:/usr/kerberos/bin:/opt/maui/bin:/opt/tecplot/tec360_2010/bin:/usr/local/bin:/bin:/usr/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/ansys_inc/shared_files/licensing/lic_admin:/opt/ansys_inc/v130/icemcfd/linux64_amd/bin:/opt/ansys_inc/v130/Framework/bin/Linux64:/opt/ansys_inc/v130/CFX/bin:/opt/c3-4/:/home/denayer/bin:.:/opt/gid/gid_9:/opt/matlab/r2011a/bin MAIL=/var/spool/mail/denayer MODULE_VERSION=3.2.5 VT_ADD_LIBS=-ldwarf -lelf -lvtunwind -lnsl -lm -ldl -lpthread I_MPI_TUNER_DATA_DIR=/opt/intel/ics_2012/impi/4.0.3.008/etc64/ TBBROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb PWD=/home/denayer _LMFILES_=/opt/modules/oscar-modulefiles/torque-oscar/2.1.10:/opt/env-switcher/share/env-switcher/ansys/ansys-13.0:/opt/env-switcher/share/env-switcher/tecplot/tec360-2010:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/maui/3.2.6:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/Modules/3.2.5/modulefiles/dot:/opt/env-switcher/share/env-switcher/tools/intel-vtune-2011:/opt/env-switcher/share/env-switcher/gid/gid-9.0.6:/opt/env-switcher/share/env-switcher/matlab/matlab-r2011a:/opt/env-switcher/share/env-switcher/compiler/intel-compiler-12.1:/opt/env-switcher/share/env-switcher/mpi/intel-cluster-toolkit-2012.0.032 CARAT_LIC_PATH=/home/denayer/FSI_new/FSI/Software/carat20/exe EDITOR=/usr/bin/emacs LANG=en_US.UTF-8 MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/Modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles: 
LOADEDMODULES=torque-oscar/2.1.10:ansys/ansys-13.0:tecplot/tec360-2010:switcher/1.0.13:default-manpath/1.0.1:maui/3.2.6:oscar-modules/1.0.5:dot:tools/intel-vtune-2011:gid/gid-9.0.6:matlab/matlab-r2011a:compiler/intel-compiler-12.1:mpi/intel-cluster-toolkit-2012.0.032 VT_LIB_DIR=/opt/intel/ics_2012/itac/8.0.3.007/itac/lib_impi4 I_MPI_F90=ifort MPIROOTDIR=/opt/intel/impi/4.0.1/intel64/lib I_MPI_CC=icc VT_ROOT=/opt/intel/ics_2012/itac/8.0.3.007 SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass HOME=/home/denayer SHLVL=2 I_MPI_HYDRA_BOOTSTRAP_EXEC=ssh I_MPI_CXX=icpc I_MPI_MPD_RSH=ssh MSM_HOME=/usr/local/MegaRAID Storage Manager FHPSYSTEM=INTEL64 VT_SLIB_DIR=/opt/intel/ics_2012/itac/8.0.3.007/itac/slib_impi4 I_MPI_FC=ifort LOGNAME=denayer CVS_RSH=ssh SSH_CONNECTION=139.11.215.121 5290 139.11.215.117 22 CLASSPATH=/opt/intel/ics_2012/itac/8.0.3.007/itac/lib_impi4 MODULESHOME=/opt/modules/Modules/3.2.5 CPRO_PATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233 LESSOPEN=|/usr/bin/lesspipe.sh %s CVSEDITOR=emacs FHPTARGET=parallel INCLUDE=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/include G_BROKEN_FILENAMES=1 I_MPI_ROOT=/opt/intel/ics_2012/impi/4.0.3.008 _=/opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/mpiexec.hydra User set environment: --------------------- I_MPI_DEBUG=5 I_MPI_FABRICS=shm Proxy information: ********************* Proxy ID: 1 ----------------- Proxy name: n13 Process count: 1 Start PID: 0 Proxy exec list: .................... Exec: hostname; Process count: 1 Proxy ID: 2 ----------------- Proxy name: n14 Process count: 1 Start PID: 1 Proxy exec list: .................... 
Exec: hostname; Process count: 1 ================================================================================================== [mpiexec@master] Timeout set to -1 (-1 means infinite) [mpiexec@master] Got a control port string of master:47174 Proxy launch args: /opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/pmi_proxy --control-port master:47174 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --bootstrap ssh --bootstrap-exec ssh --demux poll --pgid 0 --enable-stdin 1 --proxy-id [mpiexec@master] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1 Arguments being passed to proxy 0: --version 1.3 --interface-env-name MPICH_INTERFACE_HOSTNAME --hostname n13 --global-core-count 2 --global-process-count 2 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname kvs_21039_0 --pmi-process-mapping (vector,(0,2,1)) --binding mode=off --bindlib ipl --ckpoint-num -1 --global-inherited-env 70 'I_MPI_PERHOST=allcores' 'MODULE_VERSION_STACK=3.2.5' 'MKLROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl' 'MANPATH=/opt/intel/ics_2012/itac/8.0.3.007/man:/opt/intel/ics_2012/impi/4.0.3.008/man:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/man/en_US:/opt/intel/ics_2012/vtune_amplifier_xe_2011/man:/opt/modules/Modules/default/share/man:/opt/pbs/man:/opt/env-switcher/man:/usr/man:/usr/share/man:/usr/local/man:/usr/local/share/man:/usr/X11R6/man:/opt/c3-4/man' 'HOSTNAME=master' 'VT_MPI=impi4' 'I_MPI_PIN=0' 'INTEL_LICENSE_FILE=/opt/intel/licenses' 'IPPROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp' 'I_MPI_F77=ifort' 'SHELL=/bin/bash' 'TERM=xterm' 'HISTSIZE=200000' 'I_MPI_FABRICS=shm:dapl' 'SSH_CLIENT=139.11.215.121 5290 22' 
'LIBRARY_PATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/../compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21' 'CVSROOT=:ext:fhpout@laplace.lstm.uni-erlangen.de:/data/linux/proj_tape/LSTM/fhpdev' 'MODULE_SHELL=sh' 'FPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include' 'SSH_TTY=/dev/pts/5' 'USER=denayer' 'MODULE_OSCAR_USER=denayer' 'LD_LIBRARY_PATH=/opt/intel/ics_2012/itac/8.0.3.007/itac/slib_impi4:/opt/intel/ics_2012/impi/4.0.3.008/intel64/lib:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/debugger/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mpirt/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/../compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21:/home/denayer/FSI_new/FSI/Software/carat20/libraries/rlog-1.4/lib/:/home/denayer/FSI_new/FSI/Software/carat20/libraries/atlas/lib/:/opt/maui/lib:/opt/tecplot/tec360_2010/lib' 'LS_COLORS=no=00:fi=00:di=01;35:ln=01;36:pi=40;33:so=01;33:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:' 'ENV=/home/denayer/.bashrc' 
'CPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/include' 'TMOUT=36000' 'MSM_PRODUCT=MSM' 'NLSPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/debugger/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64/locale/en_US' 'PATH=/opt/intel/ics_2012/itac/8.0.3.007/bin:/opt/intel/ics_2012/impi/4.0.3.008/intel64/bin:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/bin/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mpirt/bin/intel64:/opt/intel/ics_2012/vtune_amplifier_xe_2011/bin64:/usr/kerberos/bin:/opt/maui/bin:/opt/tecplot/tec360_2010/bin:/usr/local/bin:/bin:/usr/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/ansys_inc/shared_files/licensing/lic_admin:/opt/ansys_inc/v130/icemcfd/linux64_amd/bin:/opt/ansys_inc/v130/Framework/bin/Linux64:/opt/ansys_inc/v130/CFX/bin:/opt/c3-4/:/home/denayer/bin:.:/opt/gid/gid_9:/opt/matlab/r2011a/bin' 'MAIL=/var/spool/mail/denayer' 'MODULE_VERSION=3.2.5' 'VT_ADD_LIBS=-ldwarf -lelf -lvtunwind -lnsl -lm -ldl -lpthread' 'I_MPI_TUNER_DATA_DIR=/opt/intel/ics_2012/impi/4.0.3.008/etc64/' 'TBBROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb' 'PWD=/home/denayer' 
'_LMFILES_=/opt/modules/oscar-modulefiles/torque-oscar/2.1.10:/opt/env-switcher/share/env-switcher/ansys/ansys-13.0:/opt/env-switcher/share/env-switcher/tecplot/tec360-2010:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/maui/3.2.6:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/Modules/3.2.5/modulefiles/dot:/opt/env-switcher/share/env-switcher/tools/intel-vtune-2011:/opt/env-switcher/share/env-switcher/gid/gid-9.0.6:/opt/env-switcher/share/env-switcher/matlab/matlab-r2011a:/opt/env-switcher/share/env-switcher/compiler/intel-compiler-12.1:/opt/env-switcher/share/env-switcher/mpi/intel-cluster-toolkit-2012.0.032' 'CARAT_LIC_PATH=/home/denayer/FSI_new/FSI/Software/carat20/exe' 'EDITOR=/usr/bin/emacs' 'LANG=en_US.UTF-8' 'MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/Modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles:' 'LOADEDMODULES=torque-oscar/2.1.10:ansys/ansys-13.0:tecplot/tec360-2010:switcher/1.0.13:default-manpath/1.0.1:maui/3.2.6:oscar-modules/1.0.5:dot:tools/intel-vtune-2011:gid/gid-9.0.6:matlab/matlab-r2011a:compiler/intel-compiler-12.1:mpi/intel-cluster-toolkit-2012.0.032' 'VT_LIB_DIR=/opt/intel/ics_2012/itac/8.0.3.007/itac/lib_impi4' 'I_MPI_F90=ifort' 'MPIROOTDIR=/opt/intel/impi/4.0.1/intel64/lib' 'I_MPI_CC=icc' 'VT_ROOT=/opt/intel/ics_2012/itac/8.0.3.007' 'SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass' 'HOME=/home/denayer' 'SHLVL=2' 'I_MPI_HYDRA_BOOTSTRAP_EXEC=ssh' 'I_MPI_CXX=icpc' 'I_MPI_MPD_RSH=ssh' 'MSM_HOME=/usr/local/MegaRAID Storage Manager' 'FHPSYSTEM=INTEL64' 'VT_SLIB_DIR=/opt/intel/ics_2012/itac/8.0.3.007/itac/slib_impi4' 'I_MPI_FC=ifort' 'LOGNAME=denayer' 'CVS_RSH=ssh' 'SSH_CONNECTION=139.11.215.121 5290 139.11.215.117 22' 'CLASSPATH=/opt/intel/ics_2012/itac/8.0.3.007/itac/lib_impi4' 'MODULESHOME=/opt/modules/Modules/3.2.5' 
'CPRO_PATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233' 'LESSOPEN=|/usr/bin/lesspipe.sh %s' 'CVSEDITOR=emacs' 'FHPTARGET=parallel' 'INCLUDE=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/include' 'G_BROKEN_FILENAMES=1' 'I_MPI_ROOT=/opt/intel/ics_2012/impi/4.0.3.008' '_=/opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/mpiexec.hydra' --global-user-env 2 'I_MPI_DEBUG=5' 'I_MPI_FABRICS=shm' --global-system-env 0 --start-pid 0 --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/denayer --exec-args 1 hostname [mpiexec@master] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1 Arguments being passed to proxy 1: --version 1.3 --interface-env-name MPICH_INTERFACE_HOSTNAME --hostname n14 --global-core-count 2 --global-process-count 2 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname kvs_21039_0 --pmi-process-mapping (vector,(0,2,1)) --binding mode=off --bindlib ipl --ckpoint-num -1 --global-inherited-env 70 'I_MPI_PERHOST=allcores' 'MODULE_VERSION_STACK=3.2.5' 'MKLROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl' 'MANPATH=/opt/intel/ics_2012/itac/8.0.3.007/man:/opt/intel/ics_2012/impi/4.0.3.008/man:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/man/en_US:/opt/intel/ics_2012/vtune_amplifier_xe_2011/man:/opt/modules/Modules/default/share/man:/opt/pbs/man:/opt/env-switcher/man:/usr/man:/usr/share/man:/usr/local/man:/usr/local/share/man:/usr/X11R6/man:/opt/c3-4/man' 'HOSTNAME=master' 'VT_MPI=impi4' 'I_MPI_PIN=0' 'INTEL_LICENSE_FILE=/opt/intel/licenses' 'IPPROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp' 'I_MPI_F77=ifort' 'SHELL=/bin/bash' 'TERM=xterm' 'HISTSIZE=200000' 'I_MPI_FABRICS=shm:dapl' 'SSH_CLIENT=139.11.215.121 5290 22' 
'LIBRARY_PATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/../compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21' 'CVSROOT=:ext:fhpout@laplace.lstm.uni-erlangen.de:/data/linux/proj_tape/LSTM/fhpdev' 'MODULE_SHELL=sh' 'FPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include' 'SSH_TTY=/dev/pts/5' 'USER=denayer' 'MODULE_OSCAR_USER=denayer' 'LD_LIBRARY_PATH=/opt/intel/ics_2012/itac/8.0.3.007/itac/slib_impi4:/opt/intel/ics_2012/impi/4.0.3.008/intel64/lib:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/debugger/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mpirt/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/../compiler/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21:/home/denayer/FSI_new/FSI/Software/carat20/libraries/rlog-1.4/lib/:/home/denayer/FSI_new/FSI/Software/carat20/libraries/atlas/lib/:/opt/maui/lib:/opt/tecplot/tec360_2010/lib' 'LS_COLORS=no=00:fi=00:di=01;35:ln=01;36:pi=40;33:so=01;33:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:' 'ENV=/home/denayer/.bashrc' 
'CPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb/include' 'TMOUT=36000' 'MSM_PRODUCT=MSM' 'NLSPATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/debugger/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/compiler/lib/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/lib/intel64/locale/en_US:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/lib/intel64/locale/en_US' 'PATH=/opt/intel/ics_2012/itac/8.0.3.007/bin:/opt/intel/ics_2012/impi/4.0.3.008/intel64/bin:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/bin/intel64:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mpirt/bin/intel64:/opt/intel/ics_2012/vtune_amplifier_xe_2011/bin64:/usr/kerberos/bin:/opt/maui/bin:/opt/tecplot/tec360_2010/bin:/usr/local/bin:/bin:/usr/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/ansys_inc/shared_files/licensing/lic_admin:/opt/ansys_inc/v130/icemcfd/linux64_amd/bin:/opt/ansys_inc/v130/Framework/bin/Linux64:/opt/ansys_inc/v130/CFX/bin:/opt/c3-4/:/home/denayer/bin:.:/opt/gid/gid_9:/opt/matlab/r2011a/bin' 'MAIL=/var/spool/mail/denayer' 'MODULE_VERSION=3.2.5' 'VT_ADD_LIBS=-ldwarf -lelf -lvtunwind -lnsl -lm -ldl -lpthread' 'I_MPI_TUNER_DATA_DIR=/opt/intel/ics_2012/impi/4.0.3.008/etc64/' 'TBBROOT=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/tbb' 'PWD=/home/denayer' 
'_LMFILES_=/opt/modules/oscar-modulefiles/torque-oscar/2.1.10:/opt/env-switcher/share/env-switcher/ansys/ansys-13.0:/opt/env-switcher/share/env-switcher/tecplot/tec360-2010:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/maui/3.2.6:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/Modules/3.2.5/modulefiles/dot:/opt/env-switcher/share/env-switcher/tools/intel-vtune-2011:/opt/env-switcher/share/env-switcher/gid/gid-9.0.6:/opt/env-switcher/share/env-switcher/matlab/matlab-r2011a:/opt/env-switcher/share/env-switcher/compiler/intel-compiler-12.1:/opt/env-switcher/share/env-switcher/mpi/intel-cluster-toolkit-2012.0.032' 'CARAT_LIC_PATH=/home/denayer/FSI_new/FSI/Software/carat20/exe' 'EDITOR=/usr/bin/emacs' 'LANG=en_US.UTF-8' 'MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/Modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles:' 'LOADEDMODULES=torque-oscar/2.1.10:ansys/ansys-13.0:tecplot/tec360-2010:switcher/1.0.13:default-manpath/1.0.1:maui/3.2.6:oscar-modules/1.0.5:dot:tools/intel-vtune-2011:gid/gid-9.0.6:matlab/matlab-r2011a:compiler/intel-compiler-12.1:mpi/intel-cluster-toolkit-2012.0.032' 'VT_LIB_DIR=/opt/intel/ics_2012/itac/8.0.3.007/itac/lib_impi4' 'I_MPI_F90=ifort' 'MPIROOTDIR=/opt/intel/impi/4.0.1/intel64/lib' 'I_MPI_CC=icc' 'VT_ROOT=/opt/intel/ics_2012/itac/8.0.3.007' 'SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass' 'HOME=/home/denayer' 'SHLVL=2' 'I_MPI_HYDRA_BOOTSTRAP_EXEC=ssh' 'I_MPI_CXX=icpc' 'I_MPI_MPD_RSH=ssh' 'MSM_HOME=/usr/local/MegaRAID Storage Manager' 'FHPSYSTEM=INTEL64' 'VT_SLIB_DIR=/opt/intel/ics_2012/itac/8.0.3.007/itac/slib_impi4' 'I_MPI_FC=ifort' 'LOGNAME=denayer' 'CVS_RSH=ssh' 'SSH_CONNECTION=139.11.215.121 5290 139.11.215.117 22' 'CLASSPATH=/opt/intel/ics_2012/itac/8.0.3.007/itac/lib_impi4' 'MODULESHOME=/opt/modules/Modules/3.2.5' 
'CPRO_PATH=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233' 'LESSOPEN=|/usr/bin/lesspipe.sh %s' 'CVSEDITOR=emacs' 'FHPTARGET=parallel' 'INCLUDE=/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/mkl/include:/opt/intel/ics_2012/composer_xe_2011_sp1.6.233/ipp/include' 'G_BROKEN_FILENAMES=1' 'I_MPI_ROOT=/opt/intel/ics_2012/impi/4.0.3.008' '_=/opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/mpiexec.hydra' --global-user-env 2 'I_MPI_DEBUG=5' 'I_MPI_FABRICS=shm' --global-system-env 0 --start-pid 1 --proxy-core-count 1 --exec --exec-appnum 1 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/denayer --exec-args 1 hostname
[mpiexec@master] Launch arguments: ssh -x -q n13 /opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/pmi_proxy --control-port master:47174 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --bootstrap ssh --bootstrap-exec ssh --demux poll --pgid 0 --enable-stdin 1 --proxy-id 0
[mpiexec@master] Launch arguments: ssh -x -q n14 /opt/intel/ics_2012/impi/4.0.3.008/intel64/bin/pmi_proxy --control-port master:47174 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --bootstrap ssh --bootstrap-exec ssh --demux poll --pgid 0 --enable-stdin 1 --proxy-id 1
[mpiexec@master] STDIN will be redirected to 1 fd(s): 7
[proxy:0:0@n13] Start PMI_proxy 0
[proxy:0:0@n13] STDIN will be redirected to 1 fd(s): 7
[proxy:0:1@n14] Start PMI_proxy 1
[proxy:0:0@n13] got crush from 4, 0
n13
[proxy:0:1@n14] got crush from 4, 0
n14
[/bash]
I repeated the tests with -genv I_MPI_FABRICS shm:ofa, and they work too.

Do you see anything useful for solving our original problem?

Thanks a lot
Moderator

Hi Guillaume,

Do you have a systemwide mpd.hosts file? Make certain it contains the old nodes and the new nodes.
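If such a file exists, a check like the following could confirm that it lists the expected nodes. This is a sketch under assumptions: it expects one host name per line, optionally followed by `:N`, it looks for `mpd.hosts` only in the current directory, and `missing_hosts` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list the hosts that are NOT present in a hosts file.
# Assumes one host name per line, optionally followed by ":N";
# lines starting with "#" are ignored.
# Usage: missing_hosts FILE HOST...
missing_hosts() {
    file=$1
    shift
    for host in "$@"; do
        # Strip trailing blanks and an optional ":N" suffix, then
        # look for an exact, literal match of the host name.
        if ! sed -e 's/[[:space:]]*$//' -e 's/:.*$//' "$file" \
                | grep -v '^#' | grep -qxF -- "$host"; then
            echo "$host"
        fi
    done
}

if [ -f mpd.hosts ]; then
    missing_hosts mpd.hosts n13 n14
else
    echo "no mpd.hosts in $(pwd)"
fi
```

Any host name printed by `missing_hosts` is absent from the file; no output means every requested host is listed.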

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

No, there is no mpd.hosts file. Both find and locate return no entries.

Where is this file normally located?

Regards
Moderator

Hi Guillaume,

Generally, there wouldn't be one; I just wanted to make certain. Back to the original error: did you get it with all versions of the Intel MPI Library?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools