
MPI having bad performance in user mode, runs perfectly in root

Hi everyone,

This is most likely an installation issue with Parallel Studio 2016 (update 1) on my system, so here are the details:

I have installed Intel Parallel Studio 2016 (update 1) on my server: two sockets with Intel Xeon processors (Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz), running Ubuntu 12.04 (I know this is not a supported OS; at least, that is what the requirements check in the Parallel Studio installer reports).

The problem is really simple:

If I am logged in as a regular user and use mpirun to run any program, the execution takes forever. Looking at htop, the cores in use seem to spend their time in "kernel" mode.

If I am logged in as root and use the same mpirun command, the program runs perfectly.
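One way to see what that kernel time actually is (a sketch; `<PID>` is a placeholder for one of the slow ranks picked from htop):

```shell
# Summarize which syscalls one slow rank spends its time in.
# Replace <PID> with the PID of an MPI rank taken from htop:
strace -c -p <PID>
# Alternatively, if perf is installed, sample the whole system:
perf top
```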

I tried a brute-force fix by opening up the permissions on the /opt/intel directory with chmod ugo+rwx, but that changed nothing.

Thanks in advance for your help!

TL;DR: Intel mpirun runs normally as root, but under a regular user it is extremely slow.

15 Replies
Employee

Hi Jeremie,

Could you please provide more details about your hung MPI run? Try running it with the I_MPI_DEBUG=100 and I_MPI_HYDRA_DEBUG=1 variables set, and provide the output (along with the command line and environment variables).

Also try the following scenarios:
mpirun ... hostname
mpirun ... IMB-MPI1 pingpong

It may be an environment problem: check your SSH settings (SSH should be passwordless).
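To rule the SSH settings in or out quickly (a sketch; `localhost` stands in for your node name):

```shell
# Hydra launches its proxy over ssh by default, even on one node.
# This should return immediately, with no password prompt:
ssh -o BatchMode=yes localhost true && echo "passwordless ssh OK"

# If it prompts or fails, create a key (only if none exists) and install it:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id localhost
```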

BTW you wrote:

If I am logged in as a regular user and if I use mpirun to run any program...The program execution takes forever.

So does the MPI scenario eventually finish?


EDIT: From the debug output it looks like it is an SSH problem (?). The thing is, I am using MPI on only one node, so I should not need SSH at all, right?
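One quick experiment under that assumption (a sketch; the fork bootstrap should be available in this Intel MPI version's Hydra, but treat that as an assumption):

```shell
# Tell Hydra to fork the proxy locally instead of launching it
# over ssh; only valid for single-node runs:
export I_MPI_HYDRA_BOOTSTRAP=fork
mpirun -n 16 ./cg.C.16
```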

__

Just to be clear, I am not using SSH at all in my setup:

I run MPI on only one machine.

The program I am running is a kernel from the NAS Benchmarks.

__

So far I have not waited for the scenario to finish, because it is really slow. (Running as root, of course, the program finishes without any problem, and with good performance.)

I left a kernel of the NAS Benchmark running overnight; it is still not done, so I guess there is no point in waiting longer :)

__

Is it possible that the problem comes from the fact that I installed Intel Parallel Studio as root? (I had no real choice, since I wanted to install it in /opt/intel.)
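A less drastic check than chmod'ing the whole tree (a sketch; the install path is copied from the mpirun command used in this post):

```shell
# A root-owned install is normal; what matters is that a regular
# user can read and traverse everything under the tree.
ls -ld /opt/intel
# List anything in the install tree that is not world-readable
# (ideally this prints nothing):
find /opt/intel/compilers_and_libraries_2016.1.150 ! -perm -o+r 2>/dev/null | head
```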

__

With the two environment variables I_MPI_DEBUG=100 and I_MPI_HYDRA_DEBUG=1 set, and running the following command:

/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun -n 16 ./cg.C.16

Here is the output:

host: lizhi

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/
  Launcher: ssh
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    I_MPI_PERHOST=allcores
    LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
    LESSOPEN=| /usr/bin/lesspipe %s
    MAIL=/var/mail/jeremie
    SSH_CLIENT=172.16.0.157 51260 22
    USER=jeremie
    LANGUAGE=en_US:en
    LC_TIME=en_US.UTF-8
    SHLVL=1
    OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi
    HOME=/home/jeremie
    XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453390831.9609-361631696
    SSH_TTY=/dev/pts/4
    LC_MONETARY=en_US.UTF-8
    LOGNAME=jeremie
    _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun
    TERM=xterm
    PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LANG=en_US.UTF-8
    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
    SHELL=/bin/bash
    LC_NAME=en_US.UTF-8
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LC_MEASUREMENT=en_US.UTF-8
    I_MPI_MPIRUN=mpirun
    LC_IDENTIFICATION=en_US.UTF-8
    I_MPI_DEBUG=100
    LC_ALL=en_US.UTF-8
    PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin
    I_MPI_HYDRA_DEBUG=1
    SSH_CONNECTION=172.16.0.157 51260 192.168.202.79 22
    LC_NUMERIC=en_US.UTF-8
    I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
    LC_PAPER=en_US.UTF-8
    MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man

  Hydra internal environment:
  ---------------------------
    MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
    GFORTRAN_UNBUFFERED_PRECONNECTED=y
    I_MPI_HYDRA_UUID=7e7a0000-2358-0eb4-e829-050001017f00
    DAPL_NETWORK_PROCESS_NUM=16


    Proxy information:
    *********************
      [1] proxy: lizhi (16 cores)
      Exec list: ./cg.C.16 (16 processes); 


==================================================================================================

[mpiexec@lizhi] Timeout set to -1 (-1 means infinite)
[mpiexec@lizhi] Got a control port string of lizhi:52655

Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52655 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 949760717 --usize -2 --proxy-id 

Arguments being passed to proxy 0:
--version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 16 --auto-cleanup 1 --pmi-kvsname kvs_31358_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 37 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'MAIL=/var/mail/jeremie' 'SSH_CLIENT=172.16.0.157 51260 22' 'USER=jeremie' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi' 'HOME=/home/jeremie' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453390831.9609-361631696' 'SSH_TTY=/dev/pts/4' 'LC_MONETARY=en_US.UTF-8' 'LOGNAME=jeremie' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'SSH_CONNECTION=172.16.0.157 51260 192.168.202.79 22' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man' --global-user-env 0 --global-system-env 4 
'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7e7a0000-2358-0eb4-e829-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=16' --proxy-core-count 16 --mpi-cmd-env mpirun -n 16 ./cg.C.16  --exec --exec-appnum 0 --exec-proc-count 16 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 1 ./cg.C.16 

[mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52655 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 949760717 --usize -2 --proxy-id 0 
[mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 
[proxy:0:0@lizhi] Start PMI_proxy 0
[proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 
[proxy:0:0@lizhi] got pmi command (from 12): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 14): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 16): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 12): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 14): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 21): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 27): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] got pmi command (from 16): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 21): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 24): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 30): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 16): barrier_in

[proxy:0:0@lizhi] got pmi command (from 21): barrier_in

[proxy:0:0@lizhi] got pmi command (from 24): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 27): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 36): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 39): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 42): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 45): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 48): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 51): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 54): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 57): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 24): barrier_in

[proxy:0:0@lizhi] got pmi command (from 27): barrier_in

[proxy:0:0@lizhi] got pmi command (from 30): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 33): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 36): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 39): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 42): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 45): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 48): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 51): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 54): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 30): barrier_in

[proxy:0:0@lizhi] got pmi command (from 33): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 36): barrier_in

[proxy:0:0@lizhi] got pmi command (from 39): barrier_in

[proxy:0:0@lizhi] got pmi command (from 42): barrier_in

[proxy:0:0@lizhi] got pmi command (from 45): barrier_in

[proxy:0:0@lizhi] got pmi command (from 48): barrier_in

[proxy:0:0@lizhi] got pmi command (from 51): barrier_in

[proxy:0:0@lizhi] got pmi command (from 57): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 33): barrier_in

[proxy:0:0@lizhi] got pmi command (from 54): barrier_in

[proxy:0:0@lizhi] got pmi command (from 57): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 57: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 16): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 21): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 27): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 12): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 14): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 16): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 21): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 24): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 30): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 33): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 36): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 39): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 42): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 45): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 48): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 51): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 54): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 57): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 49 1
5 lizhi 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, 
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 16): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 21): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 24): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 27): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 30): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 33): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 36): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 39): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 42): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 45): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 48): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 51): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 54): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 16): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 21): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 24): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 27): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 30): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 33): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 36): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 39): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 42): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 45): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 48): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 51): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 57): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 16): barrier_in

[proxy:0:0@lizhi] got pmi command (from 21): barrier_in

[proxy:0:0@lizhi] got pmi command (from 24): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 27): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 30): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 33): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 36): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 39): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 42): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 45): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 48): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 51): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[proxy:0:0@lizhi] got pmi command (from 54): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[0] MPI startup(): Intel(R) MPI Library, Version 5.1.2  Build 20151015 (build id: 13147)
[0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[2] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[3] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_31358_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_sUOBn3 
[proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_31358_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_sUOBn3) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_31358_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_sUOBn3
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success
[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] got pmi command (from 57): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[4] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream
[proxy:0:0@lizhi] got pmi command (from 24): barrier_in

[proxy:0:0@lizhi] got pmi command (from 27): barrier_in

[proxy:0:0@lizhi] got pmi command (from 54): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[5] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 30): barrier_in

[proxy:0:0@lizhi] got pmi command (from 57): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_31358_0
[6] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[7] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 33): barrier_in

[proxy:0:0@lizhi] got pmi command (from 36): barrier_in

[proxy:0:0@lizhi] got pmi command (from 39): barrier_in

[proxy:0:0@lizhi] got pmi command (from 42): barrier_in

[proxy:0:0@lizhi] got pmi command (from 45): barrier_in

[proxy:0:0@lizhi] got pmi command (from 48): barrier_in

[proxy:0:0@lizhi] got pmi command (from 51): barrier_in

[8] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[9] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[10] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[11] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[12] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[13] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[14] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 54): barrier_in

[15] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 57): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 57: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 14): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 16): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 21): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 24): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 27): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 30): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 33): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 36): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 39): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 42): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 45): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 48): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 51): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 54): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[proxy:0:0@lizhi] got pmi command (from 57): get
kvsname=kvs_31358_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_sUOBn3
[13] MPI startup(): shm data transfer mode
[15] MPI startup(): shm data transfer mode
[10] MPI startup(): shm data transfer mode
[9] MPI startup(): shm data transfer mode
[0] MPI startup(): shm data transfer mode
[3] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[14] MPI startup(): shm data transfer mode
[11] MPI startup(): shm data transfer mode
[12] MPI startup(): shm data transfer mode
[8] MPI startup(): shm data transfer mode
[5] MPI startup(): shm data transfer mode
[2] MPI startup(): shm data transfer mode
[4] MPI startup(): shm data transfer mode
[6] MPI startup(): shm data transfer mode
[7] MPI startup(): shm data transfer mode
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_31358_0 key=P0-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 14): put
kvsname=kvs_31358_0 key=P1-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 16): put
kvsname=kvs_31358_0 key=P2-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 21): put
kvsname=kvs_31358_0 key=P3-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 24): put
kvsname=kvs_31358_0 key=P4-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 27): put
kvsname=kvs_31358_0 key=P5-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 30): put
kvsname=kvs_31358_0 key=P6-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 33): put
kvsname=kvs_31358_0 key=P7-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 36): put
kvsname=kvs_31358_0 key=P8-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 39): put
kvsname=kvs_31358_0 key=P9-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 42): put
kvsname=kvs_31358_0 key=P10-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 45): put
kvsname=kvs_31358_0 key=P11-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 48): put
kvsname=kvs_31358_0 key=P12-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 51): put
kvsname=kvs_31358_0 key=P13-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 54): put
kvsname=kvs_31358_0 key=P14-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 57): put
kvsname=kvs_31358_0 key=P15-businesscard-0 value=fabrics_list#shm$ 
[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P0-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P1-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P2-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P3-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P4-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P5-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P6-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P7-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P8-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P9-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P10-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P11-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P12-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P13-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P14-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_31358_0 key=P15-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] got pmi command (from 16): barrier_in

[proxy:0:0@lizhi] got pmi command (from 39): barrier_in

[proxy:0:0@lizhi] got pmi command (from 21): barrier_in

[proxy:0:0@lizhi] got pmi command (from 24): barrier_in

[proxy:0:0@lizhi] got pmi command (from 27): barrier_in

[proxy:0:0@lizhi] got pmi command (from 30): barrier_in

[proxy:0:0@lizhi] got pmi command (from 33): barrier_in

[proxy:0:0@lizhi] got pmi command (from 36): barrier_in

[proxy:0:0@lizhi] got pmi command (from 42): barrier_in

[proxy:0:0@lizhi] got pmi command (from 45): barrier_in

[proxy:0:0@lizhi] got pmi command (from 48): barrier_in

[proxy:0:0@lizhi] got pmi command (from 51): barrier_in

[proxy:0:0@lizhi] got pmi command (from 54): barrier_in

[proxy:0:0@lizhi] got pmi command (from 57): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 57: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[2] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[3] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[4] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[5] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[6] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[7] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[8] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[9] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[10] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[11] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[12] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[13] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[14] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[15] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 1: 1-413 & 0-2147483647
[0] MPI startup(): Allgather: 2: 414-676 & 0-2147483647
[0] MPI startup(): Allgather: 1: 677-3539 & 0-2147483647
[0] MPI startup(): Allgather: 3: 3540-29998 & 0-2147483647
[0] MPI startup(): Allgather: 4: 29999-44716 & 0-2147483647
[0] MPI startup(): Allgather: 3: 44717-113786 & 0-2147483647
[0] MPI startup(): Allgather: 4: 113787-158125 & 0-2147483647
[0] MPI startup(): Allgather: 3: 158126-567736 & 0-2147483647
[0] MPI startup(): Allgather: 4: 567737-1876335 & 0-2147483647
[0] MPI startup(): Allgather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 1: 0-435 & 0-2147483647
[0] MPI startup(): Allgatherv: 2: 435-817 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 817-1933 & 0-2147483647
[0] MPI startup(): Allgatherv: 1: 1933-2147 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 2147-31752 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 31752-63760 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 63760-146441 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 146441-569451 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 569451-1578575 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 1578575-3583798 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 6: 0-4 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 4-14 & 0-2147483647
[0] MPI startup(): Allreduce: 6: 14-24 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 24-4645 & 0-2147483647
[0] MPI startup(): Allreduce: 6: 4645-10518 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 10518-22173 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 22173-190389 & 0-2147483647
[0] MPI startup(): Allreduce: 6: 190389-1404366 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 1404366-3122567 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 1: 0-236 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 237-530 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 531-4590 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 4591-35550 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 35551-214258 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 214259-1177466 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 1: 1-3 & 0-2147483647
[0] MPI startup(): Bcast: 7: 4-584 & 0-2147483647
[0] MPI startup(): Bcast: 1: 585-3283 & 0-2147483647
[0] MPI startup(): Bcast: 7: 3284-3061726 & 0-2147483647
[0] MPI startup(): Bcast: 5: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 1-2044 & 0-2147483647
[0] MPI startup(): Gather: 2: 2045-7606 & 0-2147483647
[0] MPI startup(): Gather: 3: 7607-525080 & 0-2147483647
[0] MPI startup(): Gather: 2: 525081-1147564 & 0-2147483647
[0] MPI startup(): Gather: 3: 1147565-3349096 & 0-2147483647
[0] MPI startup(): Gather: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-6 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 5: 6-22 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 22-614 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 3: 614-132951 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 132951-523266 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 5: 523266-660854 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 660854-2488736 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 5: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 3: 1-5461 & 0-2147483647
[0] MPI startup(): Scatter: 2: 5462-8972 & 0-2147483647
[0] MPI startup(): Scatter: 3: 8973-361813 & 0-2147483647
[0] MPI startup(): Scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[3] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[5] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[7] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[2] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[9] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[11] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[13] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[15] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[4] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[6] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[12] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[14] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       31363    lizhi      {0,16}
[0] MPI startup(): 1       31364    lizhi      {1,17}
[0] MPI startup(): 2       31365    lizhi      {2,18}
[0] MPI startup(): 3       31366    lizhi      {3,19}
[0] MPI startup(): 4       31367    lizhi      {4,20}
[8] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[10] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): 5       31368    lizhi      {5,21}
[0] MPI startup(): 6       31369    lizhi      {6,22}
[0] MPI startup(): 7       31370    lizhi      {7,23}
[0] MPI startup(): 8       31371    lizhi      {8,24}
[0] MPI startup(): 9       31372    lizhi      {9,25}
[0] MPI startup(): 10      31373    lizhi      {10,26}
[0] MPI startup(): 11      31374    lizhi      {11,27}
[0] MPI startup(): 12      31375    lizhi      {12,28}
[0] MPI startup(): 13      31376    lizhi      {13,29}
[0] MPI startup(): 14      31377    lizhi      {14,30}
[0] MPI startup(): 15      31378    lizhi      {15,31}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=4 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Topology split mode = 1

| rank | node | space=1
|  0  |  0  |
|  1  |  0  |
|  2  |  0  |
|  3  |  0  |
|  4  |  0  |
|  5  |  0  |
|  6  |  0  |
|  7  |  0  |
|  8  |  0  |
|  9  |  0  |
|  10  |  0  |
|  11  |  0  |
|  12  |  0  |
|  13  |  0  |
|  14  |  0  |
|  15  |  0  |
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) 
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
[0] MPI startup(): I_MPI_INFO_C_NAME=Unknown
[0] MPI startup(): I_MPI_INFO_DESC=1342177285
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=532603903
[0] MPI startup(): I_MPI_INFO_FLGCEXT=0
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_FLGDEXT=0
[0] MPI startup(): I_MPI_INFO_LCPU=32
[0] MPI startup(): I_MPI_INFO_MODE=775
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 
[0] MPI startup(): I_MPI_INFO_SIGN=132823
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,16
[0] MPI startup(): I_MPI_PIN_MAPPING=16:0 0,1 1,2 2,3 3,4 4,5 5,6 6,7 7,8 8,9 9,10 10,11 11,12 12,13 13,14 14,15 15


 NAS Parallel Benchmarks 3.3 -- CG Benchmark

 Size:     150000
 Iterations:    75
 Number of active processes:    16
 Number of nonzeroes per row:       15
 Eigenvalue shift: .110E+03
[mpiexec@lizhi] Sending Ctrl-C to processes as requested
[mpiexec@lizhi] Press Ctrl-C again to force abort

Thanks in advance for your help.

Employee

Hi Jeremie,

As far as I can see, your MPI application started successfully (SSH isn't used for such single-node runs), but it got stuck for some reason - possibly memory exhaustion, for example being pushed into swap. Since you said it works fine under root, I'd suspect different system limits for the user and root accounts - could you please compare 'ulimit -a' for both users and align the limits if necessary?
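The comparison can be scripted rather than eyeballed; a minimal sketch (the /tmp file names are arbitrary examples, and the root half is assumed to be run from a separate root shell):

```shell
# Capture the current user's limits; run the commented line from a root shell.
ulimit -a > /tmp/ulimit_user.txt
# ulimit -a > /tmp/ulimit_root.txt    # run this same command as root

# Once both files exist, any differing limit shows up as a diff line.
if [ -f /tmp/ulimit_user.txt ] && [ -f /tmp/ulimit_root.txt ]; then
    diff /tmp/ulimit_user.txt /tmp/ulimit_root.txt && echo "limits identical"
fi
```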


Thanks for your reply :)

Both 'ulimit -a' outputs seem to be the same:

jeremie@lizhi$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256960
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256960
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited




root@lizhi:# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256960
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256960
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
root@lizhi:/home/jeremie/pgashpc/Code/NPB_UPC_C2_101_IntelBased/CG# 


Employee

Hi Jeremie,

As far as I can see, your scenario got stuck inside the application itself, so I'd recommend simplifying the MPI run - try fewer processes (2, 4, 8) and/or check and vary the parameters of the MPI application (if any).

Also try the following test scenario under root/user accounts and provide the output for both cases:
I_MPI_DEBUG=100 I_MPI_HYDRA_DEBUG=1 mpirun -n 2 IMB-MPI1 pingpong

Another suggestion is to check the SELinux status and try disabling it - it could potentially explain the user/root difference.
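A quick way to check the security-module status from the command line; note this is only a sketch, and stock Ubuntu 12.04 normally ships AppArmor rather than SELinux, so both are worth inspecting (the /tmp file name is an arbitrary example):

```shell
# Record what security modules are active so the result can be compared
# between the user and root environments.
{
  # getenforce prints Enforcing/Permissive/Disabled when SELinux tools exist.
  if command -v getenforce >/dev/null 2>&1; then
      getenforce
  else
      echo "SELinux tools not installed"
  fi
  # The apparmor directory under securityfs exists when the module is loaded.
  if [ -d /sys/kernel/security/apparmor ]; then
      echo "AppArmor loaded (inspect profiles with 'sudo aa-status')"
  else
      echo "AppArmor not loaded"
  fi
} > /tmp/mac_status.txt
cat /tmp/mac_status.txt
```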


I don't think reducing the number of processes is necessary - I already tried the same program compiled for 2 cores and hit the same problem.

Here is the output of mpirun -n 2 IMB-MPI1 pingpong (with the debug environment variables set, running as root):

host: lizhi

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/
  Launcher: ssh
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    I_MPI_PERHOST=allcores
    LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
    LESSOPEN=| /usr/bin/lesspipe %s
    SUDO_GID=1011
    MAIL=/var/mail/root
    USER=root
    LANGUAGE=en_US:en
    LC_TIME=en_US.UTF-8
    SHLVL=1
    OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi
    HOME=/root
    XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462849.154656-758447237
    LC_MONETARY=en_US.UTF-8
    SUDO_UID=1011
    LOGNAME=root
    _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun
    TERM=xterm
    USERNAME=root
    PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LANG=en_US.UTF-8
    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
    SUDO_COMMAND=/bin/su
    SHELL=/bin/bash
    LC_NAME=en_US.UTF-8
    SUDO_USER=jeremie
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LC_MEASUREMENT=en_US.UTF-8
    I_MPI_MPIRUN=mpirun
    LC_IDENTIFICATION=en_US.UTF-8
    I_MPI_DEBUG=100
    LC_ALL=en_US.UTF-8
    PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin
    I_MPI_HYDRA_DEBUG=1
    LC_NUMERIC=en_US.UTF-8
    I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
    LC_PAPER=en_US.UTF-8
    MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man

  Hydra internal environment:
  ---------------------------
    MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
    GFORTRAN_UNBUFFERED_PRECONNECTED=y
    I_MPI_HYDRA_UUID=00070000-a28b-7528-ed29-050001017f00
    DAPL_NETWORK_PROCESS_NUM=2


    Proxy information:
    *********************
      [1] proxy: lizhi (16 cores)
      Exec list: IMB-MPI1 (2 processes); 


==================================================================================================

[mpiexec@lizhi] Timeout set to -1 (-1 means infinite)
[mpiexec@lizhi] Got a control port string of lizhi:33758

Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:33758 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1766992124 --usize -2 --proxy-id 

Arguments being passed to proxy 0:
--version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_1792_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 39 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SUDO_GID=1011' 'MAIL=/var/mail/root' 'USER=root' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi' 'HOME=/root' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462849.154656-758447237' 'LC_MONETARY=en_US.UTF-8' 'SUDO_UID=1011' 'LOGNAME=root' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'USERNAME=root' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SUDO_COMMAND=/bin/su' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'SUDO_USER=jeremie' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man' --global-user-env 0 --global-system-env 4 'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 
'I_MPI_HYDRA_UUID=00070000-a28b-7528-ed29-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=2' --proxy-core-count 16 --mpi-cmd-env mpirun -n 2 IMB-MPI1 pingpong  --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 2 IMB-MPI1 pingpong 

[mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:33758 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1766992124 --usize -2 --proxy-id 0 
[mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 
[proxy:0:0@lizhi] Start PMI_proxy 0
[proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 
[proxy:0:0@lizhi] got pmi command (from 12): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 14): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 12): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 14): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1
5 lizhi 0,1, 
[proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1
5 lizhi 0,1, 
[proxy:0:0@lizhi] got pmi command (from 12): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 14): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1792_0
[0] MPI startup(): Intel(R) MPI Library, Version 5.1.2  Build 20151015 (build id: 13147)
[0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_1792_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_5lgkMs 
[proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_1792_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_5lgkMs) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_1792_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_5lgkMs
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success
[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 14): get
kvsname=kvs_1792_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_5lgkMs
[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_1792_0 key=P0-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 14): put
kvsname=kvs_1792_0 key=P1-businesscard-0 value=fabrics_list#shm$ 
[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1792_0 key=P0-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1792_0 key=P1-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-259847 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 0-1536 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 1536-2194 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 2194-34792 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 34792-121510 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 121510-145618 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 145618-668210 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 668210-1546854 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 1546854-2473237 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-117964 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 117965-3131275 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 1-921 & 0-2147483647
[0] MPI startup(): Gather: 1: 922-3027 & 0-2147483647
[0] MPI startup(): Gather: 3: 3028-5071 & 0-2147483647
[0] MPI startup(): Gather: 2: 5072-11117 & 0-2147483647
[0] MPI startup(): Gather: 1: 11118-86016 & 0-2147483647
[0] MPI startup(): Gather: 3: 86017-283989 & 0-2147483647
[0] MPI startup(): Gather: 1: 283990-664950 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       1797     lizhi      {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23}
[0] MPI startup(): 1       1798     lizhi      {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Topology split mode = 1

| rank | node | space=1
|  0  |  0  |
|  1  |  0  |
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) 
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
[0] MPI startup(): I_MPI_INFO_C_NAME=Unknown
[0] MPI startup(): I_MPI_INFO_DESC=1342177285
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=532603903
[0] MPI startup(): I_MPI_INFO_FLGCEXT=0
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_FLGDEXT=0
[0] MPI startup(): I_MPI_INFO_LCPU=32
[0] MPI startup(): I_MPI_INFO_MODE=775
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 
[0] MPI startup(): I_MPI_INFO_SIGN=132823
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 8
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Jan 22 15:37:26 2016
# Machine               : x86_64
# System                : Linux
# Release               : 3.13.0-35-generic
# Version               : #62~precise1-Ubuntu SMP Mon Aug 18 14:52:04 UTC 2014
# MPI Version           : 3.0
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# IMB-MPI1 pingpong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.07         0.00
            1         1000         0.94         1.02
            2         1000         0.94         2.04
            4         1000         0.94         4.07
            8         1000         0.95         8.01
           16         1000         0.95        16.11
           32         1000         0.95        32.28
           64         1000         1.06        57.61
          128         1000         1.03       118.92
          256         1000         1.05       232.17
          512         1000         1.30       376.61
         1024         1000         1.47       666.56
         2048         1000         1.81      1080.24
         4096         1000         2.86      1365.11
         8192         1000         3.74      2091.13
        16384         1000         7.76      2012.25
        32768         1000        14.16      2206.84
        65536          640        14.15      4415.99
       131072          320        18.98      6585.50
       262144          160        36.57      6836.40
       524288           80        66.71      7494.85
      1048576           40       107.70      9285.08
      2097152           20       214.70      9315.24
      4194304           10       450.55      8878.01


# All processes entering MPI_Finalize

[proxy:0:0@lizhi] got pmi command (from 12): finalize

[proxy:0:0@lizhi] PMI response: cmd=finalize_ack
[proxy:0:0@lizhi] got pmi command (from 14): finalize

[proxy:0:0@lizhi] PMI response: cmd=finalize_ack

Here is the result of mpirun -n 2 IMB-MPI1 pingpong (with the requested debug variables set, running as a regular user). As you can see at the end, it hangs at the 32768-byte message and I had to Ctrl-C it:

host: lizhi

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/
  Launcher: ssh
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    I_MPI_PERHOST=allcores
    LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
    LESSOPEN=| /usr/bin/lesspipe %s
    MAIL=/var/mail/jeremie
    SSH_CLIENT=172.16.0.157 54865 22
    USER=jeremie
    LANGUAGE=en_US:en
    LC_TIME=en_US.UTF-8
    SHLVL=1
    OLDPWD=/home/jeremie/pgashpc/Code/NPB_UPC_C2_101_IntelBased/CG
    HOME=/home/jeremie
    XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462579.220452-307383088
    SSH_TTY=/dev/pts/3
    LC_MONETARY=en_US.UTF-8
    LOGNAME=jeremie
    _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun
    TERM=xterm
    PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LANG=en_US.UTF-8
    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
    SHELL=/bin/bash
    LC_NAME=en_US.UTF-8
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LC_MEASUREMENT=en_US.UTF-8
    I_MPI_MPIRUN=mpirun
    LC_IDENTIFICATION=en_US.UTF-8
    I_MPI_DEBUG=100
    LC_ALL=en_US.UTF-8
    PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin
    I_MPI_HYDRA_DEBUG=1
    SSH_CONNECTION=172.16.0.157 54865 192.168.202.79 22
    LC_NUMERIC=en_US.UTF-8
    I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
    LC_PAPER=en_US.UTF-8
    MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man

  Hydra internal environment:
  ---------------------------
    MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
    GFORTRAN_UNBUFFERED_PRECONNECTED=y
    I_MPI_HYDRA_UUID=2e070000-a28d-ef31-ed29-050001017f00
    DAPL_NETWORK_PROCESS_NUM=2


    Proxy information:
    *********************
      [1] proxy: lizhi (16 cores)
      Exec list: IMB-MPI1 (2 processes); 


==================================================================================================

[mpiexec@lizhi] Timeout set to -1 (-1 means infinite)
[mpiexec@lizhi] Got a control port string of lizhi:56831

Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:56831 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1724218219 --usize -2 --proxy-id 

Arguments being passed to proxy 0:
--version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_1838_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 37 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'MAIL=/var/mail/jeremie' 'SSH_CLIENT=172.16.0.157 54865 22' 'USER=jeremie' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_UPC_C2_101_IntelBased/CG' 'HOME=/home/jeremie' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453462579.220452-307383088' 'SSH_TTY=/dev/pts/3' 'LC_MONETARY=en_US.UTF-8' 'LOGNAME=jeremie' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'SSH_CONNECTION=172.16.0.157 54865 192.168.202.79 22' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man' --global-user-env 0 --global-system-env 4 
'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=2e070000-a28d-ef31-ed29-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=2' --proxy-core-count 16 --mpi-cmd-env mpirun -n 2 IMB-MPI1 pingpong  --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 2 IMB-MPI1 pingpong 

[mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:56831 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1724218219 --usize -2 --proxy-id 0 
[mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 
[proxy:0:0@lizhi] Start PMI_proxy 0
[proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 
[proxy:0:0@lizhi] got pmi command (from 12): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 14): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 12): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 14): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1
5 lizhi 0,1, 
[proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1
5 lizhi 0,1, 
[proxy:0:0@lizhi] got pmi command (from 12): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 14): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_1838_0
[0] MPI startup(): Intel(R) MPI Library, Version 5.1.2  Build 20151015 (build id: 13147)
[0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_1838_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_o9ODg3 
[proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_1838_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_o9ODg3) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_1838_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_o9ODg3
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success
[proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 14): get
kvsname=kvs_1838_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_o9ODg3
[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_1838_0 key=P0-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 14): put
kvsname=kvs_1838_0 key=P1-businesscard-0 value=fabrics_list#shm$ 
[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1838_0 key=P0-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_1838_0 key=P1-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-259847 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 0-1536 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 1536-2194 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 2194-34792 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 34792-121510 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 121510-145618 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 145618-668210 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 668210-1546854 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 1546854-2473237 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-117964 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 117965-3131275 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 1-921 & 0-2147483647
[0] MPI startup(): Gather: 1: 922-3027 & 0-2147483647
[0] MPI startup(): Gather: 3: 3028-5071 & 0-2147483647
[0] MPI startup(): Gather: 2: 5072-11117 & 0-2147483647
[0] MPI startup(): Gather: 1: 11118-86016 & 0-2147483647
[0] MPI startup(): Gather: 3: 86017-283989 & 0-2147483647
[0] MPI startup(): Gather: 1: 283990-664950 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       1843     lizhi      {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23}
[0] MPI startup(): 1       1844     lizhi      {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Topology split mode = 1

| rank | node | space=1
|  0  |  0  |
|  1  |  0  |
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) 
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
[0] MPI startup(): I_MPI_INFO_C_NAME=Unknown
[0] MPI startup(): I_MPI_INFO_DESC=1342177285
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=532603903
[0] MPI startup(): I_MPI_INFO_FLGCEXT=0
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_FLGDEXT=0
[0] MPI startup(): I_MPI_INFO_LCPU=32
[0] MPI startup(): I_MPI_INFO_MODE=775
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 
[0] MPI startup(): I_MPI_INFO_SIGN=132823
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 8
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Jan 22 15:40:05 2016
# Machine               : x86_64
# System                : Linux
# Release               : 3.13.0-35-generic
# Version               : #62~precise1-Ubuntu SMP Mon Aug 18 14:52:04 UTC 2014
# MPI Version           : 3.0
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# IMB-MPI1 pingpong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.14         0.00
            1         1000         1.02         0.93
            2         1000         1.02         1.87
            4         1000         1.02         3.74
            8         1000         0.91         8.41
           16         1000         0.91        16.74
           32         1000         0.91        33.57
           64         1000         0.98        62.48
          128         1000         1.02       120.03
          256         1000         1.00       244.27
          512         1000         1.24       394.42
         1024         1000         1.43       683.64
         2048         1000         1.80      1088.06
         4096         1000         2.54      1539.13
         8192         1000         4.19      1864.10
        16384         1000         7.51      2081.66
        32768         1000        14.34      2178.76
[mpiexec@lizhi] Sending Ctrl-C to processes as requested
[mpiexec@lizhi] Press Ctrl-C again to force abort
 

Regarding SELinux: I do not think it is running on the system I am using (Ubuntu 12.04).

The directory /selinux exists, but it is completely empty.
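In case it helps, this is roughly how I checked (a generic sketch, not Intel-specific; getenforce may not be installed on Ubuntu 12.04, so it falls back to looking at the SELinux pseudo-filesystem):

```shell
# Check whether SELinux is actually active.
# When enabled, its pseudo-filesystem is mounted at /selinux (older
# distributions) or /sys/fs/selinux, and an "enforce" file exists there.
if command -v getenforce >/dev/null 2>&1; then
    status=$(getenforce)               # Enforcing / Permissive / Disabled
elif [ -e /selinux/enforce ]; then
    status=$(cat /selinux/enforce)     # 1 = enforcing, 0 = permissive
elif [ -e /sys/fs/selinux/enforce ]; then
    status=$(cat /sys/fs/selinux/enforce)
else
    status="disabled (no SELinux filesystem mounted)"
fi
echo "SELinux status: $status"
```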

Employee

Hi Jeremie,

Do I understand correctly that IMB pingpong hangs in the second scenario (run as the regular user)? If so, could you please try the same scenario with the I_MPI_SHM_LMT=shm variable set?
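In other words, something like this (the same command line as before, just with the extra variable; a command sketch to run on your node, not sample output):

```shell
# Re-run the pingpong scenario, forcing the shared-memory
# large-message transfer path, with the debug variables from before:
export I_MPI_SHM_LMT=shm
export I_MPI_DEBUG=100
export I_MPI_HYDRA_DEBUG=1
mpirun -n 2 IMB-MPI1 pingpong
```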


Thanks again for your help:

Here is the result of mpirun -n 2 IMB-MPI1 pingpong

using these variables: I_MPI_SHM_LMT=shm I_MPI_DEBUG=100 I_MPI_HYDRA_DEBUG=1

run as a regular user (not root):

 

 

host: lizhi

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/
  Launcher: ssh
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    I_MPI_PERHOST=allcores
    LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
    LESSOPEN=| /usr/bin/lesspipe %s
    MAIL=/var/mail/jeremie
    SSH_CLIENT=172.16.0.157 40740 22
    I_MPI_SHM_LMT=shm
    USER=jeremie
    LANGUAGE=en_US:en
    LC_TIME=en_US.UTF-8
    SHLVL=1
    OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi
    HOME=/home/jeremie
    XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453803019.76569-485305227
    SSH_TTY=/dev/pts/1
    LC_MONETARY=en_US.UTF-8
    LOGNAME=jeremie
    _=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun
    TERM=xterm
    PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LANG=en_US.UTF-8
    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
    SHELL=/bin/bash
    LC_NAME=en_US.UTF-8
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LC_MEASUREMENT=en_US.UTF-8
    I_MPI_MPIRUN=mpirun
    LC_IDENTIFICATION=en_US.UTF-8
    I_MPI_DEBUG=100
    LC_ALL=en_US.UTF-8
    PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin
    I_MPI_HYDRA_DEBUG=1
    SSH_CONNECTION=172.16.0.157 40740 192.168.202.79 22
    LC_NUMERIC=en_US.UTF-8
    I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
    LC_PAPER=en_US.UTF-8
    MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man

  Hydra internal environment:
  ---------------------------
    MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
    GFORTRAN_UNBUFFERED_PRECONNECTED=y
    I_MPI_HYDRA_UUID=7b3a0000-b96e-5e0a-3d2a-050001017f00
    DAPL_NETWORK_PROCESS_NUM=2


    Proxy information:
    *********************
      [1] proxy: lizhi (16 cores)
      Exec list: IMB-MPI1 (2 processes); 


==================================================================================================

[mpiexec@lizhi] Timeout set to -1 (-1 means infinite)
[mpiexec@lizhi] Got a control port string of lizhi:52626

Proxy launch args: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52626 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 76075566 --usize -2 --proxy-id 

Arguments being passed to proxy 0:
--version 3.1.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname lizhi --global-core-map 0,16,16 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_14971_0 --pmi-process-mapping (vector,(0,1,16)) --topolib ipl --ckpointlib blcr --ckpoint-prefix /tmp --ckpoint-preserve 1 --ckpoint off --ckpoint-num -1 --global-inherited-env 38 'I_MPI_PERHOST=allcores' 'LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/usr/lib/jdk1.7.0/jre/lib/amd64/server/:/usr/local/cuda-7.0/lib64:/usr/local/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib' 'LESSOPEN=| /usr/bin/lesspipe %s' 'MAIL=/var/mail/jeremie' 'SSH_CLIENT=172.16.0.157 40740 22' 'I_MPI_SHM_LMT=shm' 'USER=jeremie' 'LANGUAGE=en_US:en' 'LC_TIME=en_US.UTF-8' 'SHLVL=1' 'OLDPWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi' 'HOME=/home/jeremie' 'XDG_SESSION_COOKIE=e6534fbbf1939771344e386e0000016f-1453803019.76569-485305227' 'SSH_TTY=/dev/pts/1' 'LC_MONETARY=en_US.UTF-8' 'LOGNAME=jeremie' '_=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpirun' 'TERM=xterm' 'PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/cuda-7.0/bin:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/bin:/home/jeremie/pgashpc/EnergyManagement/IntelPCM' 'LC_ADDRESS=en_US.UTF-8' 'LC_TELEPHONE=en_US.UTF-8' 'LANG=en_US.UTF-8' 
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:' 'SHELL=/bin/bash' 'LC_NAME=en_US.UTF-8' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'LC_MEASUREMENT=en_US.UTF-8' 'I_MPI_MPIRUN=mpirun' 'LC_IDENTIFICATION=en_US.UTF-8' 'I_MPI_DEBUG=100' 'LC_ALL=en_US.UTF-8' 'PWD=/home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin' 'I_MPI_HYDRA_DEBUG=1' 'SSH_CONNECTION=172.16.0.157 40740 192.168.202.79 22' 'LC_NUMERIC=en_US.UTF-8' 'I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi' 'LC_PAPER=en_US.UTF-8' 'MANPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/home/jeremie/pgashpc/Compilers/UPCCompilerLizhi/man' --global-user-env 0 --global-system-env 4 
'MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1' 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' 'I_MPI_HYDRA_UUID=7b3a0000-b96e-5e0a-3d2a-050001017f00' 'DAPL_NETWORK_PROCESS_NUM=2' --proxy-core-count 16 --mpi-cmd-env mpirun -n 2 IMB-MPI1 pingpong  --exec --exec-appnum 0 --exec-proc-count 2 --exec-local-env 0 --exec-wdir /home/jeremie/pgashpc/Code/NPB_Original/c2_Intel-mpi/bin --exec-args 2 IMB-MPI1 pingpong 

[mpiexec@lizhi] Launch arguments: /opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/pmi_proxy --control-port lizhi:52626 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk user --launcher ssh --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 76075566 --usize -2 --proxy-id 0 
[mpiexec@lizhi] STDIN will be redirected to 1 fd(s): 11 
[proxy:0:0@lizhi] Start PMI_proxy 0
[proxy:0:0@lizhi] STDIN will be redirected to 1 fd(s): 17 
[proxy:0:0@lizhi] got pmi command (from 12): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 14): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0@lizhi] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@lizhi] got pmi command (from 12): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 14): get_maxes

[proxy:0:0@lizhi] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 12): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1
5 lizhi 0,1, 
[proxy:0:0@lizhi] got pmi command (from 14): get_ranks2hosts

[proxy:0:0@lizhi] PMI response: put_ranks2hosts 15 1
5 lizhi 0,1, 
[proxy:0:0@lizhi] got pmi command (from 12): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 14): get_appnum

[proxy:0:0@lizhi] PMI response: cmd=appnum appnum=0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0
[proxy:0:0@lizhi] got pmi command (from 12): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0
[proxy:0:0@lizhi] got pmi command (from 14): get_my_kvsname

[proxy:0:0@lizhi] PMI response: cmd=my_kvsname kvsname=kvs_14971_0
[0] MPI startup(): Intel(R) MPI Library, Version 5.1.2  Build 20151015 (build id: 13147)
[0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[1] MPID_nem_impi_create_numa_nodes_map(): Fetching extra numa information from /etc/ofed-mic.map
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_14971_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_M2v37K 
[proxy:0:0@lizhi] forwarding command (cmd=put kvsname=kvs_14971_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_M2v37K) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=put kvsname=kvs_14971_0 key=sharedFilename[0] value=/dev/shm/Intel_MPI_M2v37K
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=put_result rc=0 msg=success
[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] we don't understand the response put_result; forwarding downstream
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 12: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] got pmi command (from 14): get
kvsname=kvs_14971_0 key=sharedFilename[0] 
[proxy:0:0@lizhi] PMI response: cmd=get_result rc=0 msg=success value=/dev/shm/Intel_MPI_M2v37K
[proxy:0:0@lizhi] got pmi command (from 12): put
kvsname=kvs_14971_0 key=P0-businesscard-0 value=fabrics_list#shm$ 
[proxy:0:0@lizhi] got pmi command (from 14): put
kvsname=kvs_14971_0 key=P1-businesscard-0 value=fabrics_list#shm$ 
[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_14971_0 key=P0-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[mpiexec@lizhi] [pgid: 0] got aggregated PMI command (part of it): cmd=put kvsname=kvs_14971_0 key=P1-businesscard-0 value=fabrics_list#shm$
[mpiexec@lizhi] reply: cmd=put_result rc=0 msg=success

[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[proxy:0:0@lizhi] got pmi command (from 12): barrier_in

[proxy:0:0@lizhi] got pmi command (from 14): barrier_in

[proxy:0:0@lizhi] forwarding command (cmd=barrier_in) upstream
[mpiexec@lizhi] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec@lizhi] PMI response to fd 8 pid 14: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[proxy:0:0@lizhi] PMI response: cmd=barrier_out
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-259847 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 0-1536 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 1536-2194 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 2194-34792 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 34792-121510 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 121510-145618 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 145618-668210 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 668210-1546854 & 0-2147483647
[0] MPI startup(): Allreduce: 4: 1546854-2473237 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-117964 & 0-2147483647
[0] MPI startup(): Alltoall: 4: 117965-3131275 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 1-921 & 0-2147483647
[0] MPI startup(): Gather: 1: 922-3027 & 0-2147483647
[0] MPI startup(): Gather: 3: 3028-5071 & 0-2147483647
[0] MPI startup(): Gather: 2: 5072-11117 & 0-2147483647
[0] MPI startup(): Gather: 1: 11118-86016 & 0-2147483647
[0] MPI startup(): Gather: 3: 86017-283989 & 0-2147483647
[0] MPI startup(): Gather: 1: 283990-664950 & 0-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 0-6 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       14976    lizhi      {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23}
[0] MPI startup(): 1       14977    lizhi      {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): Topology split mode = 1

| rank | node | space=1
|  0  |  0  |
|  1  |  0  |
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=1 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R) 
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,32
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,20971520
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7
[0] MPI startup(): I_MPI_INFO_C_NAME=Unknown
[0] MPI startup(): I_MPI_INFO_DESC=1342177285
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=532603903
[0] MPI startup(): I_MPI_INFO_FLGCEXT=0
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_FLGDEXT=0
[0] MPI startup(): I_MPI_INFO_LCPU=32
[0] MPI startup(): I_MPI_INFO_MODE=775
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_SERIAL=E5-2650 0 
[0] MPI startup(): I_MPI_INFO_SIGN=132823
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
[0] MPI startup(): I_MPI_PIN_MAPPING=2:0 0,1 8
[0] MPI startup(): I_MPI_SHM_LMT=shm
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Tue Jan 26 14:55:38 2016
# Machine               : x86_64
# System                : Linux
# Release               : 3.13.0-35-generic
# Version               : #62~precise1-Ubuntu SMP Mon Aug 18 14:52:04 UTC 2014
# MPI Version           : 3.0
# MPI Thread Environment: 

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down 
# dynamically when a certain run time (per message size sample) 
# is expected to be exceeded. Time limit is defined by variable 
# "SECS_PER_SAMPLE" (=> IMB_settings.h) 
# or through the flag => -time 
  


# Calling sequence was: 

# IMB-MPI1 pingpong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.13         0.00
            1         1000         1.21         0.79
            2         1000         0.92         2.07
            4         1000         0.92         4.13
            8         1000         0.92         8.29
           16         1000         0.91        16.73
           32         1000         0.92        33.08
           64         1000         1.00        60.97
          128         1000         1.00       121.59
          256         1000         1.04       235.21
          512         1000         1.29       377.65
         1024         1000         1.44       680.29
         2048         1000         1.81      1079.10
         4096         1000         2.70      1445.43
         8192         1000         4.05      1930.71
        16384         1000         7.64      2045.95
        32768         1000        14.11      2214.35
        65536          640        17.21      3630.76
       131072          320        31.05      4025.58
       262144          160        49.35      5065.51
       524288           80        78.83      6342.63
      1048576           40       153.96      6494.99
      2097152           20       305.65      6543.38
      4194304           10       606.39      6596.37


# All processes entering MPI_Finalize

[proxy:0:0@lizhi] got pmi command (from 14): finalize

[proxy:0:0@lizhi] PMI response: cmd=finalize_ack
[proxy:0:0@lizhi] got pmi command (from 12): finalize

[proxy:0:0@lizhi] PMI response: cmd=finalize_ack

 

Employee

Hi Jeremie,

It looks like I_MPI_SHM_LMT=shm has helped. Could you please try this workaround for your initial scenario?

I've reproduced the problem on our cluster; it looks like it is specific to Ubuntu*. I'll submit an internal ticket for this problem. Thanks for reporting it.


Hi All,

since we are running into exactly the same issue as discussed in this thread, I'd like to ask whether there was any additional outcome from the internal ticket that Artem created in January (post #11)? Or should I now also create a ticket in Intel Premier Support about this?

We are currently experiencing this issue with the latest Ubuntu release, 16.10 (16.04 shows it as well), using Intel MPI 5.1.3.180.

Should one set this environment variable by default on any Ubuntu system, or what is Intel's advice?
I am also wondering about jobs that span multiple hosts - what is the situation there?

Thanks and best regards,
Frank


Hi Artem (Intel),

what was the outcome of the ticket that you created (mentioned in post #12 above)?
Could you please elaborate on this with respect to my questions in post #13 above?

Thanks and best regards,
Frank

Employee

Hi Frank,

Sorry for the late answer.
You can find some recipes for the issue in the Intel® MPI Library Release Notes:

    - On some Linux* distributions, the Intel(R) MPI Library will fail for non-root
      users due to security limitations.
      This has been seen on Ubuntu* 12.04, and could impact other
      distributions and versions as well.  Two workarounds have been identified
      for this issue.
      o Enable ptrace for non-root users with:
        echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
      o Or, revert the Intel(R) MPI Library to an earlier shared memory
        mechanism which is not impacted by setting:
        I_MPI_SHM_LMT=shm
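Note that the `echo 0 | sudo tee ...` form of the first workaround only lasts until the next reboot. A sketch of how the relaxed setting could be made persistent through sysctl (run as root; the file path matches the fragment Ubuntu normally ships its Yama default in, but this is an assumption - any `*.conf` file under `/etc/sysctl.d/` would work):

```shell
# Apply immediately (equivalent to the echo | tee form above):
sysctl -w kernel.yama.ptrace_scope=0

# Persist across reboots: 10-ptrace.conf is where Ubuntu keeps its
# Yama default, so overwrite it (or drop a later-sorting file):
echo 'kernel.yama.ptrace_scope = 0' > /etc/sysctl.d/10-ptrace.conf

# Reload every sysctl configuration file to confirm the setting sticks:
sysctl --system
```

This is a system-wide configuration change, so the second workaround (I_MPI_SHM_LMT=shm) remains the option when you cannot touch the host configuration.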

Beginner

We have encountered the same issue and can confirm that enabling ptrace for non-root users works around the problem.

Since we provide an application for users that do not necessarily have the permissions to enable ptrace, we have identified a third workaround. From within the application, one can disable the ptrace restriction for that process by calling

prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);

 

Beginner

I can add that this issue appears with Arch Linux as well. The workarounds mentioned here solve the issue.

% uname -a

Linux epsilon 4.8.10-1-ARCH #1 SMP PREEMPT Mon Nov 21 11:55:43 CET 2016 x86_64 GNU/Linux

 

impi Version 2017 Update 1 Build 2016101
