Hi everyone,
This is very probably an installation issue with Parallel Studio 2016 (update 1) on my system, so here are the details:
I have installed Intel Parallel Studio 2016 (update 1) on my server: two sockets with Intel Xeon processors (Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz), running Ubuntu 12.04 (I know this is not a supported OS, at least according to the requirements check in the Parallel Studio installer).
The problem is really simple:
If I am logged in as a regular user and I use mpirun to run any program, the execution takes forever. Looking at htop, the cores in use seem to spend their time in "kernel" mode.
If I am logged in as root and use the same mpirun command, the program runs perfectly.
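One way to quantify the kernel-mode time, assuming the sysstat package is available (the PID below is just a placeholder for one of the MPI ranks):
pidstat -u -p <rank_pid> 1 5    # compare %usr vs %system for one rank over a few seconds
top -b -n 1 | grep Cpu          # the "sy" field shows overall time spent in kernel mode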
I tried a "violent" solution by changing the permission of the directory /opt/intel to chmod ugo+rwx
But this did not anything.
Thanks in advance for your help!
TL;DR: Intel mpirun runs normally as root, but as a regular user it is extremely slow.
Hi Jeremie,
Could you please provide more details about your hung MPI run - run it with the I_MPI_DEBUG=100 and I_MPI_HYDRA_DEBUG=1 variables set and provide the output (including the command line and environment variables).
Also try the following scenarios:
mpirun ... hostname
mpirun ... IMB-MPI1 pingpong
It may be an environment problem - check your SSH settings (SSH should be passwordless).
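For example, something along these lines (the -n values and the application name are only placeholders, and it is assumed that mpivars.sh has been sourced so that mpirun and IMB-MPI1 are on PATH):
I_MPI_DEBUG=100 I_MPI_HYDRA_DEBUG=1 mpirun -n 16 ./your_mpi_app 2>&1 | tee mpi_debug.log
mpirun -n 16 hostname            # should print the host name 16 times almost instantly
mpirun -n 2 IMB-MPI1 pingpong    # Intel MPI Benchmarks ping-pong test shipped with the library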
BTW you wrote:
If I am logged in as a regular user and I use mpirun to run any program, the execution takes forever.
So does the MPI run eventually finish?
EDIT: From the debug output it looks like an SSH problem (?). The thing is, I am using MPI on only one node, so I should not need SSH at all, right?
__
Just to be clear, I am not using SSH at all in my experiment:
I run MPI on a single machine only.
The program I am running is a kernel from the NAS Benchmarks.
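If the SSH launcher really were the issue, one thing I could try is forcing a purely local bootstrap (assuming the fork bootstrap is supported by this Intel MPI version - that is an assumption on my side):
I_MPI_HYDRA_BOOTSTRAP=fork mpirun -n 16 ./cg.C.16    # bypass the ssh launcher for a single-node run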
__
So far I have not waited for the scenario to finish, because it is really slow. (Of course, running as root the program finishes without any problem and with good performance.)
I left a kernel of the NAS Benchmarks running overnight and it is still not done, so I guess there is no point in waiting longer :)
__
Is it possible that the problem comes from the fact that I installed Intel Parallel Studio as root? (Anyway, I had no choice, since I wanted to install Intel Parallel Studio in /opt/intel.)
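As a rough sanity check for that hypothesis (nothing authoritative, just what I would try as the regular user):
ls -ld /opt/intel /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin
find /opt/intel ! -perm -004 2>/dev/null | head    # lists entries that are not world-readable, if any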
__
With the two environment variables I_MPI_DEBUG=100 and I_MPI_HYDRA_DEBUG=1 set, and running the following command:
/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun -n 16 ./cg.C.16
Here is the output:
host: lizhi ================================================================================================== mpiexec options: ---------------- Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/ Launcher: ssh Debug level: 1 Enable X: -1 Global environment: ------------------- I_MPI_PERHOST=allcores LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib LESSOPEN=| /usr/bin/lesspipe %s MAIL=/var/mail/jeremie SSH_CLIENT=172.16.0.157 51260 22 USER=jeremie LANGUAGE=en_US:en LC_TIME=en_US.UTF-8 SHLVL=1 OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi HOME=/home/jeremie XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453390831.9609-361631696 SSH_TTY=/dev/pts/4 LC_MONETARY=en_US.UTF-8 LOGNAME=jeremie _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun TERM=xterm PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36: SHELL=/bin/bash LC_NAME=en_US.UTF-8 LESSCLOSE=/usr/bin/lesspipe %s %s LC_MEASUREMENT=en_US.UTF-8 I_MPI_MPIRUN=mpirun LC_IDENTIFICATION=en_US.UTF-8 I_MPI_DEBUG=100 LC_ALL=en_US.UTF-8 PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin I_MPI_HYDRA_DEBUG=1 SSH_CONNECTION=172.16.0.157 51260 192.168.202.79 22 LC_NUMERIC=en_US.UTF-8 I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi LC_PAPER=en_US.UTF-8 MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man Hydra internal environment: --------------------------- MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1 GFORTRAN_UNBUFFERED_PRECONNECTED=y 
I_MPI_HYDRA_UUID=7e7a0000-2358-0eb4-e829-050001017f00 DAPL_NETWORK_PROCESS_NUM=16 Proxy information: ********************* [1] proxy: lizhi (16 cores) Exec list: ./cg.C.16 (16 processes); ================================================================================================== [mpiexec@lizhi] Timeout set to -1 (-1 means infinite) [mpiexec@lizhi] Got a control port string of lizhi:52655 Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52655 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 949760717 --usize -2 --proxy-id Arguments being passed to proxy 0: --version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 16 --auto-cleanup 1 --pmi-kvsname kvs_31358_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 37 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'MAIL=/var/mail/jeremie' 'SSH_CLIENT=172.16.0.157 51260 22' 'USER=jeremie' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi' 'HOME=/home/jeremie' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453390831.9609-361631696' 'SSH_TTY=/dev/pts/4' 'LC_MONETARY=en_US.UTF-8' 'LOGNAME=jeremie' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'SSH_CONNECTION=172.16.0.157 51260 192.168.202.79 22' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7e7a0000-2358-0eb4-e829-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=16' --proxy-core-count 16 --mpi-cmd-env mpirun -n 16 ./cg.C.16 --exec --exec-appnum 0 --exec-proc-count 16 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 1 ./cg.C.16 [mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52655 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 949760717 --usize -2 --proxy-id 0 [mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 [proxy:0:0@lizhi] Start PMI_proxy 0 [proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 [proxy:0:0@lizhi] got pmi command (from 12): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 14): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 16): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 12): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 14): get_maxes 
[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 21): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 27): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] got pmi command (from 16): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 21): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 24): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 30): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 16): barrier_in [proxy:0:0@lizhi] got pmi command (from 21): barrier_in [proxy:0:0@lizhi] got pmi command (from 24): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 27): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 36): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 39): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 42): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 45): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 48): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 51): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 54): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 57): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 24): barrier_in [proxy:0:0@lizhi] got pmi command (from 27): barrier_in [proxy:0:0@lizhi] got pmi command (from 30): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 33): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 36): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 39): get_maxes 
[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 42): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 45): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 48): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 51): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 54): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 30): barrier_in [proxy:0:0@lizhi] got pmi command (from 33): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 36): barrier_in [proxy:0:0@lizhi] got pmi command (from 39): barrier_in [proxy:0:0@lizhi] got pmi command (from 42): barrier_in [proxy:0:0@lizhi] got pmi command (from 45): barrier_in [proxy:0:0@lizhi] got pmi command (from 48): barrier_in [proxy:0:0@lizhi] got pmi command (from 51): barrier_in [proxy:0:0@lizhi] got pmi command (from 57): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 33): barrier_in [proxy:0:0@lizhi] got pmi command (from 54): barrier_in [proxy:0:0@lizhi] got pmi command (from 57): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 57: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 16): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 21): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 27): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 12): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 
14): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 16): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 21): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 24): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 30): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 33): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 36): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 39): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 42): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 45): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 48): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 51): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 54): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 57): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1 5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 16): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 21): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 24): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 27): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 30): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 33): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 36): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 39): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 42): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 45): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 48): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] 
got pmi command (from 51): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 54): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 16): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 21): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 24): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 27): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 30): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 33): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 36): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 39): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 42): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 45): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 48): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 51): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 57): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 16): barrier_in [proxy:0:0@lizhi] got pmi command (from 21): barrier_in [proxy:0:0@lizhi] got pmi command (from 24): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 27): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 30): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 33): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 36): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 39): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 42): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 45): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 48): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 51): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [proxy:0:0@lizhi] got pmi command (from 54): get_my_kvsname [proxy:0:0@lizhi] PMI 
response: cmd=my_kvsname kvsname=kvs_31358_0 [0] MPI startup(): Intel(R) MPI Library, Version 5.1.2 Build 20151015 (build id: 13147) [0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation. All rights reserved. [0] MPI startup(): Multi-threaded optimized library [0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [2] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [3] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_31358_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_31358_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_sUOBn3) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_31358_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_sUOBn3 [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] got pmi command (from 57): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [4] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream [proxy:0:0@lizhi] got pmi command (from 24): barrier_in [proxy:0:0@lizhi] got pmi command (from 27): barrier_in [proxy:0:0@lizhi] got pmi command (from 54): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [5] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi command (from 30): barrier_in [proxy:0:0@lizhi] got pmi command (from 57): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0 [6] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [7] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 33): barrier_in [proxy:0:0@lizhi] got pmi command (from 36): barrier_in [proxy:0:0@lizhi] got pmi command (from 39): barrier_in [proxy:0:0@lizhi] got pmi command (from 42): barrier_in [proxy:0:0@lizhi] got pmi command (from 45): barrier_in [proxy:0:0@lizhi] got pmi command (from 48): barrier_in [proxy:0:0@lizhi] got pmi command (from 51): barrier_in [8] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [9] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [10] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [11] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [12] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [13] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [14] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 54): barrier_in [15] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 57): barrier_in 
[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 57: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 14): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 16): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 21): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 24): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 27): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 30): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 33): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 36): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 39): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 42): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 45): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 48): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 51): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 54): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success 
value=/dev/shm/Intel_MPI_sUOBn3 [proxy:0:0@lizhi] got pmi command (from 57): get kvsname=kvs_31358_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3 [13] MPI startup(): shm data transfer mode [15] MPI startup(): shm data transfer mode [10] MPI startup(): shm data transfer mode [9] MPI startup(): shm data transfer mode [0] MPI startup(): shm data transfer mode [3] MPI startup(): shm data transfer mode [1] MPI startup(): shm data transfer mode [14] MPI startup(): shm data transfer mode [11] MPI startup(): shm data transfer mode [12] MPI startup(): shm data transfer mode [8] MPI startup(): shm data transfer mode [5] MPI startup(): shm data transfer mode [2] MPI startup(): shm data transfer mode [4] MPI startup(): shm data transfer mode [6] MPI startup(): shm data transfer mode [7] MPI startup(): shm data transfer mode [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_31358_0 key=P0-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 14): put kvsname=kvs_31358_0 key=P1-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 16): put kvsname=kvs_31358_0 key=P2-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 21): put kvsname=kvs_31358_0 key=P3-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 24): put kvsname=kvs_31358_0 key=P4-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 27): put kvsname=kvs_31358_0 key=P5-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 30): put kvsname=kvs_31358_0 key=P6-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 33): put kvsname=kvs_31358_0 key=P7-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 36): put kvsname=kvs_31358_0 key=P8-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 39): put kvsname=kvs_31358_0 key=P9-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 42): put kvsname=kvs_31358_0 key=P10-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 45): put kvsname=kvs_31358_0 key=P11-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 48): put kvsname=kvs_31358_0 key=P12-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 51): put kvsname=kvs_31358_0 key=P13-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 54): put kvsname=kvs_31358_0 key=P14-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 57): put kvsname=kvs_31358_0 key=P15-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P0-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P2-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P3-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] 
got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P4-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P5-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P6-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P7-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P8-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P9-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P10-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P11-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P12-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P13-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P14-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P15-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] got pmi command (from 16): barrier_in [proxy:0:0@lizhi] got pmi command (from 39): barrier_in [proxy:0:0@lizhi] got pmi command (from 21): barrier_in [proxy:0:0@lizhi] got pmi command (from 24): barrier_in [proxy:0:0@lizhi] got pmi command (from 27): barrier_in [proxy:0:0@lizhi] got pmi command (from 30): barrier_in [proxy:0:0@lizhi] got pmi command (from 33): barrier_in [proxy:0:0@lizhi] got pmi command (from 36): barrier_in [proxy:0:0@lizhi] got pmi command (from 42): barrier_in [proxy:0:0@lizhi] got pmi command (from 45): barrier_in [proxy:0:0@lizhi] got pmi command (from 48): barrier_in [proxy:0:0@lizhi] got pmi command (from 51): barrier_in [proxy:0:0@lizhi] got pmi command (from 54): barrier_in [proxy:0:0@lizhi] got pmi command (from 57): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 57: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out 
[proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [2] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [3] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [4] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [5] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [6] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [7] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [8] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [9] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [10] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [11] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [12] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [13] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [14] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [15] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [0] MPI startup(): Device_reset_idx=8 [0] MPI startup(): Allgather: 1: 1-413 & 0-2147483647 [0] MPI startup(): Allgather: 2: 414-676 & 0-2147483647 [0] MPI startup(): Allgather: 1: 677-3539 & 0-2147483647 [0] MPI startup(): Allgather: 3: 3540-29998 & 0-2147483647 [0] MPI startup(): Allgather: 4: 29999-44716 & 0-2147483647 [0] MPI startup(): Allgather: 3: 44717-113786 & 0-2147483647 [0] MPI startup(): Allgather: 4: 113787-158125 & 0-2147483647 [0] MPI startup(): Allgather: 3: 158126-567736 & 0-2147483647 [0] MPI startup(): Allgather: 4: 567737-1876335 & 0-2147483647 [0] MPI startup(): Allgather: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allgatherv: 1: 0-435 & 0-2147483647 [0] MPI startup(): Allgatherv: 2: 435-817 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 817-1933 & 0-2147483647 [0] MPI startup(): Allgatherv: 1: 1933-2147 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 2147-31752 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 31752-63760 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 63760-146441 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 146441-569451 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 569451-1578575 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 1578575-3583798 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allreduce: 6: 0-4 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 4-14 & 0-2147483647 [0] MPI startup(): Allreduce: 6: 14-24 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 24-4645 & 0-2147483647 [0] MPI startup(): Allreduce: 6: 4645-10518 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 10518-22173 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 22173-190389 & 
0-2147483647 [0] MPI startup(): Allreduce: 6: 190389-1404366 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 1404366-3122567 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoall: 1: 0-236 & 0-2147483647 [0] MPI startup(): Alltoall: 4: 237-530 & 0-2147483647 [0] MPI startup(): Alltoall: 2: 531-4590 & 0-2147483647 [0] MPI startup(): Alltoall: 4: 4591-35550 & 0-2147483647 [0] MPI startup(): Alltoall: 2: 35551-214258 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 214259-1177466 & 0-2147483647 [0] MPI startup(): Alltoall: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Barrier: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Bcast: 1: 1-3 & 0-2147483647 [0] MPI startup(): Bcast: 7: 4-584 & 0-2147483647 [0] MPI startup(): Bcast: 1: 585-3283 & 0-2147483647 [0] MPI startup(): Bcast: 7: 3284-3061726 & 0-2147483647 [0] MPI startup(): Bcast: 5: 0-2147483647 & 0-2147483647 [0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gather: 3: 1-2044 & 0-2147483647 [0] MPI startup(): Gather: 2: 2045-7606 & 0-2147483647 [0] MPI startup(): Gather: 3: 7607-525080 & 0-2147483647 [0] MPI startup(): Gather: 2: 525081-1147564 & 0-2147483647 [0] MPI startup(): Gather: 3: 1147565-3349096 & 0-2147483647 [0] MPI startup(): Gather: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 4: 0-6 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 5: 6-22 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 1: 22-614 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 3: 614-132951 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 2: 132951-523266 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 5: 523266-660854 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 2: 660854-2488736 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 5: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatter: 3: 1-5461 & 0-2147483647 [0] MPI startup(): Scatter: 2: 5462-8972 & 0-2147483647 [0] MPI startup(): Scatter: 3: 8973-361813 & 0-2147483647 [0] MPI startup(): Scatter: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647 [1] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [3] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [5] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [7] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [2] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [9] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [11] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [13] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [15] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [4] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [6] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [12] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) 
Fabric(intra=1 inter=1 flags=0x0) [14] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): Rank Pid Node name Pin cpu [0] MPI startup(): 0 31363 lizhi {0,16} [0] MPI startup(): 1 31364 lizhi {1,17} [0] MPI startup(): 2 31365 lizhi {2,18} [0] MPI startup(): 3 31366 lizhi {3,19} [0] MPI startup(): 4 31367 lizhi {4,20} [8] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [10] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): 5 31368 lizhi {5,21} [0] MPI startup(): 6 31369 lizhi {6,22} [0] MPI startup(): 7 31370 lizhi {7,23} [0] MPI startup(): 8 31371 lizhi {8,24} [0] MPI startup(): 9 31372 lizhi {9,25} [0] MPI startup(): 10 31373 lizhi {10,26} [0] MPI startup(): 11 31374 lizhi {11,27} [0] MPI startup(): 12 31375 lizhi {12,28} [0] MPI startup(): 13 31376 lizhi {13,29} [0] MPI startup(): 14 31377 lizhi {14,30} [0] MPI startup(): 15 31378 lizhi {15,31} [0] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): Topology split mode = 1 | rank | node | space=1 | 0 | 0 | | 1 | 0 | | 2 | 0 | | 3 | 0 | | 4 | 0 | | 5 | 0 | | 6 | 0 | | 7 | 0 | | 8 | 0 | | 9 | 0 | | 10 | 0 | | 11 | 0 | | 12 | 0 | | 13 | 0 | | 14 | 0 | | 15 | 0 | [0] MPI startup(): I_MPI_DEBUG=100 [0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) [0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_CACHES=3 [0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32 [0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520 [0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 [0] MPI startup(): I_MPI_INFO_C_NAME=Unknown [0] MPI startup(): I_MPI_INFO_DESC=1342177285 [0] MPI startup(): I_MPI_INFO_FLGB=0 [0] MPI startup(): I_MPI_INFO_FLGC=532603903 [0] MPI startup(): I_MPI_INFO_FLGCEXT=0 [0] MPI startup(): I_MPI_INFO_FLGD=-1075053569 [0] MPI startup(): I_MPI_INFO_FLGDEXT=0 [0] MPI startup(): I_MPI_INFO_LCPU=32 [0] MPI startup(): I_MPI_INFO_MODE=775 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2 [0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 [0] MPI startup(): I_MPI_INFO_SIGN=132823 [0] MPI startup(): I_MPI_INFO_STATE=0 [0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_VEND=1 [0] MPI startup(): I_MPI_PIN_INFO=x0,16 [0] MPI startup(): I_MPI_PIN_MAPPING=16:0 0,1 1,2 2,3 3,4 4,5 5,6 6,7 7,8 8,9 9,10 10,11 11,12 12,13 13,14 14,15 15 NAS Parallel Benchmarks 3.3 -- CG Benchmark Size: 150000 Iterations: 75 Number of active processes: 16 Number of nonzeroes per row: 15 Eigenvalue shift: .110E+03 [mpiexec@lizhi] Sending Ctrl-C to processes as requested [mpiexec@lizhi] Press Ctrl-C again to force abort
Thanks in advance for your help.
Hi Jeremie,
As far as I can see, your MPI application started successfully (SSH isn't used for such local runs) but it got stuck for some reason (maybe memory exhaustion - for example, getting stuck swapping). You said that it works fine under root, so I'd suspect different system limits for the user and root accounts - could you please compare 'ulimit -a' for both users and align them if necessary?
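For example, a minimal way to compare them (the file names are arbitrary):
ulimit -a > /tmp/ulimit_user.txt    # run in the regular user's shell
ulimit -a > /tmp/ulimit_root.txt    # run in the root shell
diff /tmp/ulimit_user.txt /tmp/ulimit_root.txt    # empty output means the limits match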
Thanks for your reply :)
Both 'ulimit -a' outputs seem to be the same:
jeremie@lizhi$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256960
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256960
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

root@lizhi:# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256960
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256960
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
root@lizhi:/home/jeremie/pgashpc/Code/NPB_UPC_C2_101_IntelBased/CG#
Hi Jeremie,
As far as I can see, your scenario got stuck inside the application itself, so I'd recommend simplifying the MPI run - try fewer processes (2, 4, 8) and/or check and vary the parameters of the MPI application (if any).
Also try the following test scenario under both the root and regular user accounts and provide the output for both cases:
I_MPI_DEBUG=100 I_MPI_HYDRA_DEBUG=1 mpirun -n 2 IMB-MPI1 pingpong
Another suggestion is to check the SELinux status and try to disable it - it could potentially be the reason for the user/root difference.
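For example (the SELinux tools may simply not be installed on Ubuntu, which typically ships AppArmor instead - in that case the first two commands are not applicable):
getenforce          # prints Enforcing/Permissive/Disabled if SELinux is present
sudo setenforce 0   # temporarily switch to permissive mode for a test run
sudo aa-status      # check AppArmor instead, if that is what the system uses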
I don't think I need to try a smaller number of processes, as I already tried the same program compiled for 2 processes and saw the same problem.
Here is the result of mpirun -n 2 IMB-MPI1 pingpong (with the correct environment variables, and running as root):
host: lizhi ================================================================================================== mpiexec options: ---------------- Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/ Launcher: ssh Debug level: 1 Enable X: -1 Global environment: ------------------- I_MPI_PERHOST=allcores LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib LESSOPEN=| /usr/bin/lesspipe %s SUDO_GID=1011 MAIL=/var/mail/root USER=root LANGUAGE=en_US:en LC_TIME=en_US.UTF-8 SHLVL=1 OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi HOME=/root XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462849.154656-758447237 LC_MONETARY=en_US.UTF-8 SUDO_UID=1011 LOGNAME=root _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun TERM=xterm USERNAME=root PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36: SUDO_COMMAND=/bin/su SHELL=/bin/bash LC_NAME=en_US.UTF-8 SUDO_USER=jeremie LESSCLOSE=/usr/bin/lesspipe %s %s LC_MEASUREMENT=en_US.UTF-8 I_MPI_MPIRUN=mpirun LC_IDENTIFICATION=en_US.UTF-8 I_MPI_DEBUG=100 LC_ALL=en_US.UTF-8 PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin I_MPI_HYDRA_DEBUG=1 LC_NUMERIC=en_US.UTF-8 I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi LC_PAPER=en_US.UTF-8 MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man Hydra internal environment: --------------------------- MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1 GFORTRAN_UNBUFFERED_PRECONNECTED=y I_MPI_HYDRA_UUID=00070000-a28b-7528-ed29-050001017f00 DAPL_NETWORK_PROCESS_NUM=2 Proxy information: ********************* [1] proxy: lizhi (16 cores) Exec list: IMB-MPI1 (2 processes); 
================================================================================================== [mpiexec@lizhi] Timeout set to -1 (-1 means infinite) [mpiexec@lizhi] Got a control port string of lizhi:33758 Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:33758 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1766992124 --usize -2 --proxy-id Arguments being passed to proxy 0: --version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_1792_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 39 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SUDO_GID=1011' 'MAIL=/var/mail/root' 'USER=root' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi' 'HOME=/root' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462849.154656-758447237' 'LC_MONETARY=en_US.UTF-8' 'SUDO_UID=1011' 'LOGNAME=root' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'USERNAME=root' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SUDO_COMMAND=/bin/su' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'SUDO_USER=jeremie' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 
'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=00070000-a28b-7528-ed29-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=2' --proxy-core-count 16 --mpi-cmd-env mpirun -n 2 IMB-MPI1 pingpong --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 2 IMB-MPI1 pingpong [mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:33758 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1766992124 --usize -2 --proxy-id 0 [mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 [proxy:0:0@lizhi] Start PMI_proxy 0 [proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 [proxy:0:0@lizhi] got pmi command (from 12): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 14): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 12): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 14): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1 5 lizhi 0,1, [proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1 5 lizhi 0,1, [proxy:0:0@lizhi] got pmi command (from 12): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 14): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0 [0] MPI startup(): Intel(R) MPI Library, Version 5.1.2 Build 20151015 (build id: 13147) [0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation. All rights reserved. 
[0] MPI startup(): Multi-threaded optimized library [0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_1792_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_5lgkMs [proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_1792_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_5lgkMs) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_1792_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_5lgkMs [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 14): get kvsname=kvs_1792_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_5lgkMs [0] MPI startup(): shm data transfer mode [1] MPI startup(): shm data transfer mode [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_1792_0 key=P0-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 14): put kvsname=kvs_1792_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1792_0 key=P0-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1792_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [0] MPI startup(): Device_reset_idx=8 [0] MPI startup(): Allgather: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 0-259847 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 0-1536 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 1536-2194 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 2194-34792 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 34792-121510 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 121510-145618 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 145618-668210 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 668210-1546854 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 1546854-2473237 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 0-117964 & 0-2147483647 [0] MPI startup(): Alltoall: 4: 117965-3131275 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647 [0] 
MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gather: 3: 1-921 & 0-2147483647 [0] MPI startup(): Gather: 1: 922-3027 & 0-2147483647 [0] MPI startup(): Gather: 3: 3028-5071 & 0-2147483647 [0] MPI startup(): Gather: 2: 5072-11117 & 0-2147483647 [0] MPI startup(): Gather: 1: 11118-86016 & 0-2147483647 [0] MPI startup(): Gather: 3: 86017-283989 & 0-2147483647 [0] MPI startup(): Gather: 1: 283990-664950 & 0-2147483647 [0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatter: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Rank Pid Node name Pin cpu [0] MPI startup(): 0 1797 lizhi {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23} [0] MPI startup(): 1 1798 lizhi {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31} [0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): Topology split mode = 1 | rank | node | space=1 | 0 | 0 | | 1 | 0 | [1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): I_MPI_DEBUG=100 [0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) [0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_CACHES=3 [0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32 [0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520 [0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 [0] MPI startup(): I_MPI_INFO_C_NAME=Unknown [0] MPI startup(): I_MPI_INFO_DESC=1342177285 [0] MPI startup(): I_MPI_INFO_FLGB=0 [0] MPI startup(): I_MPI_INFO_FLGC=532603903 [0] MPI startup(): I_MPI_INFO_FLGCEXT=0 [0] MPI startup(): I_MPI_INFO_FLGD=-1075053569 [0] MPI startup(): I_MPI_INFO_FLGDEXT=0 [0] MPI startup(): I_MPI_INFO_LCPU=32 [0] MPI startup(): I_MPI_INFO_MODE=775 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2 [0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 [0] MPI startup(): I_MPI_INFO_SIGN=132823 [0] MPI startup(): I_MPI_INFO_STATE=0 [0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_VEND=1 [0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 8 #------------------------------------------------------------ # Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part #------------------------------------------------------------ # Date : Fri Jan 22 15:37:26 2016 # Machine : x86_64 # System : Linux # Release : 3.13.0-35-generic # 
Version : #62~precise1-Ubuntu SMP Mon Aug 18 14:52:04 UTC 2014 # MPI Version : 3.0 # MPI Thread Environment: # New default behavior from Version 3.2 on: # the number of iterations per message size is cut down # dynamically when a certain run time (per message size sample) # is expected to be exceeded. Time limit is defined by variable # "SECS_PER_SAMPLE" (=> IMB_settings.h) # or through the flag => -time # Calling sequence was: # IMB-MPI1 pingpong # Minimum message length in bytes: 0 # Maximum message length in bytes: 4194304 # # MPI_Datatype : MPI_BYTE # MPI_Datatype for reductions : MPI_FLOAT # MPI_Op : MPI_SUM # # # List of Benchmarks to run: # PingPong #--------------------------------------------------- # Benchmarking PingPong # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.07 0.00 1 1000 0.94 1.02 2 1000 0.94 2.04 4 1000 0.94 4.07 8 1000 0.95 8.01 16 1000 0.95 16.11 32 1000 0.95 32.28 64 1000 1.06 57.61 128 1000 1.03 118.92 256 1000 1.05 232.17 512 1000 1.30 376.61 1024 1000 1.47 666.56 2048 1000 1.81 1080.24 4096 1000 2.86 1365.11 8192 1000 3.74 2091.13 16384 1000 7.76 2012.25 32768 1000 14.16 2206.84 65536 640 14.15 4415.99 131072 320 18.98 6585.50 262144 160 36.57 6836.40 524288 80 66.71 7494.85 1048576 40 107.70 9285.08 2097152 20 214.70 9315.24 4194304 10 450.55 8878.01 # All processes entering MPI_Finalize [proxy:0:0@lizhi] got pmi command (from 12): finalize [proxy:0:0@lizhi] PMI response: cmd=finalize_ack [proxy:0:0@lizhi] got pmi command (from 14): finalize [proxy:0:0@lizhi] PMI response: cmd=finalize_ack
Here is the result of mpirun -n 2 IMB-MPI1 pingpong (same debug variables, running as a regular user). This run stalls partway through the benchmark and I had to interrupt it with Ctrl-C:
host: lizhi ================================================================================================== mpiexec options: ---------------- Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/ Launcher: ssh Debug level: 1 Enable X: -1 Global environment: ------------------- I_MPI_PERHOST=allcores LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib LESSOPEN=| /usr/bin/lesspipe %s MAIL=/var/mail/jeremie SSH_CLIENT=172.16.0.157 54865 22 USER=jeremie LANGUAGE=en_US:en LC_TIME=en_US.UTF-8 SHLVL=1 OLDPWD=/home/jeremie/pgashpc/Code/NPB_UPC_C2_101_IntelBased/CG HOME=/home/jeremie XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462579.220452-307383088 SSH_TTY=/dev/pts/3 LC_MONETARY=en_US.UTF-8 LOGNAME=jeremie _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun TERM=xterm PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36: SHELL=/bin/bash LC_NAME=en_US.UTF-8 LESSCLOSE=/usr/bin/lesspipe %s %s LC_MEASUREMENT=en_US.UTF-8 I_MPI_MPIRUN=mpirun LC_IDENTIFICATION=en_US.UTF-8 I_MPI_DEBUG=100 LC_ALL=en_US.UTF-8 PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin I_MPI_HYDRA_DEBUG=1 SSH_CONNECTION=172.16.0.157 54865 192.168.202.79 22 LC_NUMERIC=en_US.UTF-8 I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi LC_PAPER=en_US.UTF-8 MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man Hydra internal environment: --------------------------- 
MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1 GFORTRAN_UNBUFFERED_PRECONNECTED=y I_MPI_HYDRA_UUID=2e070000-a28d-ef31-ed29-050001017f00 DAPL_NETWORK_PROCESS_NUM=2 Proxy information: ********************* [1] proxy: lizhi (16 cores) Exec list: IMB-MPI1 (2 processes); ================================================================================================== [mpiexec@lizhi] Timeout set to -1 (-1 means infinite) [mpiexec@lizhi] Got a control port string of lizhi:56831 Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:56831 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1724218219 --usize -2 --proxy-id Arguments being passed to proxy 0: --version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_1838_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 37 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'MAIL=/var/mail/jeremie' 'SSH_CLIENT=172.16.0.157 54865 22' 'USER=jeremie' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_UPC_C2_101_IntelBased/CG' 'HOME=/home/jeremie' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462579.220452-307383088' 'SSH_TTY=/dev/pts/3' 'LC_MONETARY=en_US.UTF-8' 'LOGNAME=jeremie' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'SSH_CONNECTION=172.16.0.157 54865 192.168.202.79 22' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=2e070000-a28d-ef31-ed29-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=2' --proxy-core-count 16 --mpi-cmd-env mpirun -n 2 IMB-MPI1 pingpong --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 2 IMB-MPI1 pingpong [mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:56831 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1724218219 --usize -2 --proxy-id 0 [mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 [proxy:0:0@lizhi] Start PMI_proxy 0 [proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 [proxy:0:0@lizhi] got pmi command (from 12): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 14): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 12): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 14): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi 
command (from 14): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1 5 lizhi 0,1, [proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1 5 lizhi 0,1, [proxy:0:0@lizhi] got pmi command (from 12): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 14): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0 [0] MPI startup(): Intel(R) MPI Library, Version 5.1.2 Build 20151015 (build id: 13147) [0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation. All rights reserved. [0] MPI startup(): Multi-threaded optimized library [0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_1838_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_o9ODg3 [proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_1838_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_o9ODg3) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_1838_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_o9ODg3 [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 14): get kvsname=kvs_1838_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_o9ODg3 [0] MPI startup(): shm data transfer mode [1] MPI startup(): shm data transfer mode [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_1838_0 key=P0-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 14): put kvsname=kvs_1838_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1838_0 key=P0-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1838_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] got pmi command 
(from 14): barrier_in [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [0] MPI startup(): Device_reset_idx=8 [0] MPI startup(): Allgather: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 0-259847 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 0-1536 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 1536-2194 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 2194-34792 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 34792-121510 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 121510-145618 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 145618-668210 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 668210-1546854 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 1546854-2473237 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 0-117964 & 0-2147483647 [0] MPI startup(): Alltoall: 4: 117965-3131275 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gather: 3: 1-921 & 0-2147483647 [0] MPI startup(): Gather: 1: 922-3027 & 0-2147483647 [0] MPI startup(): Gather: 3: 3028-5071 & 0-2147483647 [0] MPI startup(): Gather: 2: 5072-11117 & 0-2147483647 [0] MPI startup(): Gather: 1: 11118-86016 & 0-2147483647 [0] MPI startup(): Gather: 3: 86017-283989 & 0-2147483647 [0] MPI startup(): Gather: 1: 283990-664950 & 0-2147483647 [0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatter: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Rank Pid Node name Pin cpu [0] MPI startup(): 0 1843 lizhi {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23} [0] MPI startup(): 1 1844 lizhi {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31} [0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): Topology split mode = 1 | rank | node | space=1 | 0 | 0 | | 1 | 0 | [1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): I_MPI_DEBUG=100 [0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) [0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_CACHES=3 
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32 [0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520 [0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 [0] MPI startup(): I_MPI_INFO_C_NAME=Unknown [0] MPI startup(): I_MPI_INFO_DESC=1342177285 [0] MPI startup(): I_MPI_INFO_FLGB=0 [0] MPI startup(): I_MPI_INFO_FLGC=532603903 [0] MPI startup(): I_MPI_INFO_FLGCEXT=0 [0] MPI startup(): I_MPI_INFO_FLGD=-1075053569 [0] MPI startup(): I_MPI_INFO_FLGDEXT=0 [0] MPI startup(): I_MPI_INFO_LCPU=32 [0] MPI startup(): I_MPI_INFO_MODE=775 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2 [0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 [0] MPI startup(): I_MPI_INFO_SIGN=132823 [0] MPI startup(): I_MPI_INFO_STATE=0 [0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_VEND=1 [0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 8 #------------------------------------------------------------ # Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part #------------------------------------------------------------ # Date : Fri Jan 22 15:40:05 2016 # Machine : x86_64 # System : Linux # Release : 3.13.0-35-generic # Version : #62~precise1-Ubuntu SMP Mon Aug 18 14:52:04 UTC 2014 # MPI Version : 3.0 # MPI Thread Environment: # New default behavior from Version 3.2 on: # the number of iterations per message size is cut down # dynamically when a certain run time (per message size sample) # is expected to be exceeded. Time limit is defined by variable # "SECS_PER_SAMPLE" (=> IMB_settings.h) # or through the flag => -time # Calling sequence was: # IMB-MPI1 pingpong # Minimum message length in bytes: 0 # Maximum message length in bytes: 4194304 # # MPI_Datatype : MPI_BYTE # MPI_Datatype for reductions : MPI_FLOAT # MPI_Op : MPI_SUM # # # List of Benchmarks to run: # PingPong #--------------------------------------------------- # Benchmarking PingPong # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.14 0.00 1 1000 1.02 0.93 2 1000 1.02 1.87 4 1000 1.02 3.74 8 1000 0.91 8.41 16 1000 0.91 16.74 32 1000 0.91 33.57 64 1000 0.98 62.48 128 1000 1.02 120.03 256 1000 1.00 244.27 512 1000 1.24 394.42 1024 1000 1.43 683.64 2048 1000 1.80 1088.06 4096 1000 2.54 1539.13 8192 1000 4.19 1864.10 16384 1000 7.51 2081.66 32768 1000 14.34 2178.76 [mpiexec@lizhi] Sending Ctrl-C to processes as requested [mpiexec@lizhi] Press Ctrl-C again to force abort
Regarding SELinux, I do not think it is running on the system I am using (Ubuntu 12.04).
The directory /selinux exists, but it is completely empty.
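For what it is worth, on systems where the SELinux user-space tools happen to be installed (they may not be on a stock Ubuntu 12.04), the current mode can be checked directly with:
getenforce   # prints Enforcing, Permissive or Disabled
sestatus     # more detailed status report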
Hi Jeremie,
Do I correctly understand that the IMB pingpong run hangs in the 2nd scenario (under the regular user)? If yes, could you please try the same scenario with the I_MPI_SHM_LMT=shm variable?
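For reference, one way to pass the variable (assuming a bash shell) is either to export it or to hand it to mpirun with -genv, for example:
export I_MPI_SHM_LMT=shm
mpirun -n 2 IMB-MPI1 pingpong
or, equivalently:
mpirun -genv I_MPI_SHM_LMT shm -n 2 IMB-MPI1 pingpong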
Thanks again for your help.
Here is the result of mpirun -n 2 IMB-MPI1 pingpong
using these variables: I_MPI_SHM_LMT=shm I_MPI_DEBUG=100 I_MPI_HYDRA_DEBUG=1
run as a regular user (not root). This time the benchmark runs to completion:
host: lizhi ================================================================================================== mpiexec options: ---------------- Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/ Launcher: ssh Debug level: 1 Enable X: -1 Global environment: ------------------- I_MPI_PERHOST=allcores LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib LESSOPEN=| /usr/bin/lesspipe %s MAIL=/var/mail/jeremie SSH_CLIENT=172.16.0.157 40740 22 I_MPI_SHM_LMT=shm USER=jeremie LANGUAGE=en_US:en LC_TIME=en_US.UTF-8 SHLVL=1 OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi HOME=/home/jeremie XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453803019.76569-485305227 SSH_TTY=/dev/pts/1 LC_MONETARY=en_US.UTF-8 LOGNAME=jeremie _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun TERM=xterm PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36: SHELL=/bin/bash LC_NAME=en_US.UTF-8 LESSCLOSE=/usr/bin/lesspipe %s %s LC_MEASUREMENT=en_US.UTF-8 I_MPI_MPIRUN=mpirun LC_IDENTIFICATION=en_US.UTF-8 I_MPI_DEBUG=100 LC_ALL=en_US.UTF-8 PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin I_MPI_HYDRA_DEBUG=1 SSH_CONNECTION=172.16.0.157 40740 192.168.202.79 22 LC_NUMERIC=en_US.UTF-8 I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi LC_PAPER=en_US.UTF-8 MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man Hydra internal environment: --------------------------- MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1 GFORTRAN_UNBUFFERED_PRECONNECTED=y 
I_MPI_HYDRA_UUID=7b3a0000-b96e-5e0a-3d2a-050001017f00 DAPL_NETWORK_PROCESS_NUM=2 Proxy information: ********************* [1] proxy: lizhi (16 cores) Exec list: IMB-MPI1 (2 processes); ================================================================================================== [mpiexec@lizhi] Timeout set to -1 (-1 means infinite) [mpiexec@lizhi] Got a control port string of lizhi:52626 Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52626 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 76075566 --usize -2 --proxy-id Arguments being passed to proxy 0: --version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_14971_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 38 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'MAIL=/var/mail/jeremie' 'SSH_CLIENT=172.16.0.157 40740 22' 'I_MPI_SHM_LMT=shm' 'USER=jeremie' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi' 'HOME=/home/jeremie' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453803019.76569-485305227' 'SSH_TTY=/dev/pts/1' 'LC_MONETARY=en_US.UTF-8' 'LOGNAME=jeremie' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'SSH_CONNECTION=172.16.0.157 40740 192.168.202.79 22' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7b3a0000-b96e-5e0a-3d2a-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=2' --proxy-core-count 16 --mpi-cmd-env mpirun -n 2 IMB-MPI1 pingpong --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 2 IMB-MPI1 pingpong [mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52626 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 76075566 --usize -2 --proxy-id 0 [mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 [proxy:0:0@lizhi] Start PMI_proxy 0 [proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 [proxy:0:0@lizhi] got pmi command (from 12): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 14): init pmi_version=1 pmi_subversion=1 [proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@lizhi] got pmi command (from 12): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 14): get_maxes [proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024 [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] got pmi 
command (from 14): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1 5 lizhi 0,1, [proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts [proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1 5 lizhi 0,1, [proxy:0:0@lizhi] got pmi command (from 12): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 14): get_appnum [proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0 [proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0 [proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname [proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0 [0] MPI startup(): Intel(R) MPI Library, Version 5.1.2 Build 20151015 (build id: 13147) [0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation. All rights reserved. [0] MPI startup(): Multi-threaded optimized library [0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_14971_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_M2v37K [proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_14971_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_M2v37K) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_14971_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_M2v37K [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream [proxy:0:0@lizhi] got pmi command (from 12): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] got pmi command (from 14): get kvsname=kvs_14971_0 key=sharedFilename[0] [proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_M2v37K [proxy:0:0@lizhi] got pmi command (from 12): put kvsname=kvs_14971_0 key=P0-businesscard-0 value=fabrics_list#shm$ [proxy:0:0@lizhi] got pmi command (from 14): put kvsname=kvs_14971_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_14971_0 key=P0-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_14971_0 key=P1-businesscard-0 value=fabrics_list#shm$ [mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success [0] MPI startup(): shm data transfer mode [1] MPI startup(): shm data transfer mode [proxy:0:0@lizhi] got pmi 
command (from 12): barrier_in [proxy:0:0@lizhi] got pmi command (from 14): barrier_in [proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream [mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in [mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [proxy:0:0@lizhi] PMI response: cmd=barrier_out [0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8 [0] MPI startup(): Device_reset_idx=8 [0] MPI startup(): Allgather: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allgatherv: 3: 0-259847 & 0-2147483647 [0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 0-1536 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 1536-2194 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 2194-34792 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 34792-121510 & 0-2147483647 [0] MPI startup(): Allreduce: 1: 121510-145618 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 145618-668210 & 0-2147483647 [0] MPI startup(): Allreduce: 7: 668210-1546854 & 0-2147483647 [0] MPI startup(): Allreduce: 4: 1546854-2473237 & 0-2147483647 [0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 0-117964 & 0-2147483647 [0] MPI startup(): Alltoall: 4: 117965-3131275 & 0-2147483647 [0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gather: 3: 1-921 & 0-2147483647 [0] MPI startup(): Gather: 1: 922-3027 & 0-2147483647 [0] MPI startup(): Gather: 3: 3028-5071 & 0-2147483647 [0] MPI startup(): Gather: 2: 5072-11117 & 0-2147483647 [0] MPI startup(): Gather: 1: 11118-86016 & 0-2147483647 [0] MPI startup(): Gather: 3: 86017-283989 & 0-2147483647 [0] MPI startup(): Gather: 1: 283990-664950 & 0-2147483647 [0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647 [0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2147483647 [0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647 [0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatter: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647 [0] MPI startup(): Rank Pid Node name Pin cpu [0] MPI startup(): 0 14976 lizhi {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23} [0] MPI startup(): 1 14977 lizhi {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31} [0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): Topology split mode = 1 | rank | node | space=1 | 0 | 0 | | 1 | 0 | [1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0) [0] MPI startup(): I_MPI_DEBUG=100 [0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) [0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): 
I_MPI_INFO_CACHES=3 [0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32 [0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520 [0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 [0] MPI startup(): I_MPI_INFO_C_NAME=Unknown [0] MPI startup(): I_MPI_INFO_DESC=1342177285 [0] MPI startup(): I_MPI_INFO_FLGB=0 [0] MPI startup(): I_MPI_INFO_FLGC=532603903 [0] MPI startup(): I_MPI_INFO_FLGCEXT=0 [0] MPI startup(): I_MPI_INFO_FLGD=-1075053569 [0] MPI startup(): I_MPI_INFO_FLGDEXT=0 [0] MPI startup(): I_MPI_INFO_LCPU=32 [0] MPI startup(): I_MPI_INFO_MODE=775 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10 [0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2 [0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 [0] MPI startup(): I_MPI_INFO_SIGN=132823 [0] MPI startup(): I_MPI_INFO_STATE=0 [0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 [0] MPI startup(): I_MPI_INFO_VEND=1 [0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23 [0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 8 [0] MPI startup(): I_MPI_SHM_LMT=shm #------------------------------------------------------------ # Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part #------------------------------------------------------------ # Date : Tue Jan 26 14:55:38 2016 # Machine : x86_64 # System : Linux # Release : 3.13.0-35-generic # Version : #62~precise1-Ubuntu SMP Mon Aug 18 14:52:04 UTC 2014 # MPI Version : 3.0 # MPI Thread Environment: # New default behavior from Version 3.2 on: # the number of iterations per message size is cut down # dynamically when a certain run time (per message size sample) # is expected to be exceeded. Time limit is defined by variable # "SECS_PER_SAMPLE" (=> IMB_settings.h) # or through the flag => -time # Calling sequence was: # IMB-MPI1 pingpong # Minimum message length in bytes: 0 # Maximum message length in bytes: 4194304 # # MPI_Datatype : MPI_BYTE # MPI_Datatype for reductions : MPI_FLOAT # MPI_Op : MPI_SUM # # # List of Benchmarks to run: # PingPong #--------------------------------------------------- # Benchmarking PingPong # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.13 0.00 1 1000 1.21 0.79 2 1000 0.92 2.07 4 1000 0.92 4.13 8 1000 0.92 8.29 16 1000 0.91 16.73 32 1000 0.92 33.08 64 1000 1.00 60.97 128 1000 1.00 121.59 256 1000 1.04 235.21 512 1000 1.29 377.65 1024 1000 1.44 680.29 2048 1000 1.81 1079.10 4096 1000 2.70 1445.43 8192 1000 4.05 1930.71 16384 1000 7.64 2045.95 32768 1000 14.11 2214.35 65536 640 17.21 3630.76 131072 320 31.05 4025.58 262144 160 49.35 5065.51 524288 80 78.83 6342.63 1048576 40 153.96 6494.99 2097152 20 305.65 6543.38 4194304 10 606.39 6596.37 # All processes entering MPI_Finalize [proxy:0:0@lizhi] got pmi command (from 14): finalize [proxy:0:0@lizhi] PMI response: cmd=finalize_ack [proxy:0:0@lizhi] got pmi command (from 12): finalize [proxy:0:0@lizhi] PMI response: cmd=finalize_ack
Hi Jeremie,
It looks like I_MPI_SHM_LMT=shm has helped. Could you please try this workaround for the initial scenario as well?
I've reproduced the problem on our cluster - it looks like an Ubuntu*-specific issue. I'll submit an internal ticket for it. Thanks for reporting this.
Hi All,
Since we are running into exactly the same issue discussed in this thread, I'd like to ask whether there was any additional outcome from the internal ticket that Artem created in January (post #11), or whether I should now open a ticket with Intel Premier Support about this.
We are currently seeing this issue with the latest Ubuntu release, 16.10 (16.04 shows it as well), using Intel MPI 5.1.3.180.
Should one then set this environment variable by default on any Ubuntu system, or what is Intel's advice?
Also, for jobs spanning multiple hosts, what is the situation there?
Thanks and best regards,
Frank
Hi Artem (Intel),
What was the outcome of the ticket that you created (mentioned in post #12 above)?
Could you please elaborate on this with respect to my questions in post #13 above?
Thanks and best regards,
Frank
Hi Frank,
Sorry for the late answer.
You can find some recipes for the issue in the Intel® MPI Library Release Notes:
- On some Linux* distributions, the Intel(R) MPI Library will fail for non-root users due to security limitations. This has been seen on Ubuntu* 12.04, and could impact other distributions and versions as well. Two workarounds have been identified for this issue:
  o Enable ptrace for non-root users with:
    echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
  o Or, revert the Intel(R) MPI Library to an earlier shared memory mechanism, which is not impacted, by setting:
    I_MPI_SHM_LMT=shm
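If the ptrace setting should also survive a reboot, one possible way (assuming a distribution that reads /etc/sysctl.d/, as Ubuntu does; the file name below is only a suggestion) is to persist it via sysctl:
echo "kernel.yama.ptrace_scope = 0" | sudo tee /etc/sysctl.d/10-ptrace.conf
sudo sysctl -p /etc/sysctl.d/10-ptrace.conf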
We have encountered the same issue and can confirm that enabling ptrace for non-root users works around the problem.
Since we provide an application to users who do not necessarily have the permission to enable ptrace system-wide, we have identified a third workaround. From within the application one can disable the ptrace restriction for the calling process by calling
prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
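A minimal sketch of how this could look, assuming a Linux system whose <sys/prctl.h> exposes PR_SET_PTRACER and PR_SET_PTRACER_ANY, and assuming the call is placed before MPI_Init so the relaxed setting is in effect before the ranks start exchanging data (an illustration only, not a drop-in for any particular application):

#include <sys/prctl.h>   /* prctl(), PR_SET_PTRACER, PR_SET_PTRACER_ANY */
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Allow any process (for example the sibling MPI rank on the same node)
       to ptrace-attach to this one; each rank executes this call itself. */
    prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);

    MPI_Init(&argc, &argv);
    /* ... application code ... */
    MPI_Finalize();
    return 0;
}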
I can add that this issue appears on Arch Linux as well. The workarounds mentioned here solve it.
%uname -a
Linux epsilon 4.8.10-1-ARCH #1 SMP PREEMPT Mon Nov 21 11:55:43 CET 2016 x86_64 GNU/Linux
impi Version 2017 Update 1 Build 2016101