Gary_L_
Beginner

Cannot launch mpiexec.hydra on Knights Landing processors

Hello,

I'm having difficulty running Intel MPI (compiler version 2016.3.210) on a Knights Landing node (dual-socket). The example below runs the Linpack benchmark with the default input file, so it only requires 4 cores for now.

[glawson@rulfo ~]$ mpd
rulfo_43354: mpd_uncaught_except_tb handling:
  <type 'exceptions.AttributeError'>: 'MPD' object has no attribute 'myIP'
    /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpd  1677  run
        myIP=self.myIP,
    /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpd  3676  <module>
        mpd.run()

I receive the same error message if I specify the host with '-host', either as the hostname "ruflo" or as "localhost". I also receive this error if I attempt to use 'mpd' or 'mpdboot'.

My /etc/hosts file is as follows:

[glawson@rulfo knl]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.16.0.1 ruflo

And I can obtain the host's name using the 'hostname' command. 
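One quick check at this point is to look up the name printed by 'hostname' in /etc/hosts. The following is a minimal sketch, assuming the file is exactly as pasted above; the helper name hosts_lookup is hypothetical and not part of Intel MPI. Note that the pasted file lists "ruflo" while the shell prompt shows "rulfo", so a lookup of the actual hostname would find no entry.

```shell
#!/usr/bin/env bash
# hosts_lookup FILE NAME (hypothetical helper): print the address of the
# first entry in a hosts-format FILE that lists NAME as hostname or alias.
# Returns non-zero if NAME is not listed. Matching is case-sensitive.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                      # drop trailing comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; found = 1; exit } }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage: hosts_lookup /etc/hosts "$(hostname)" || echo "hostname not listed"
```

This only inspects the file; it does not consult DNS or any other name service the system may be configured to use.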

I've searched for this problem in the MPI developers guide and installation guides, but I cannot find this error anywhere. Can someone please point me in the right direction for troubleshooting this problem?

Thanks
Gary

Here is the same example with MPI Hydra Debug enabled:

[glawson@rulfo knl]$ export I_MPI_HYDRA_DEBUG=on
[glawson@rulfo knl]$ mpiexec.hydra -np 4 ./xhpl
host: rulfo

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/
  Launcher: ssh
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    I_MPI_PERHOST=allcores
    LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64::/opt/compilers_and_libraries_2016.3.210/linux/mpi/intel64/lib:/opt/compilers_and_libraries_2016.3.210/linux/mpi/mic/lib
    XDG_SESSION_ID=1
    HOSTNAME=rulfo
    SELINUX_ROLE_REQUESTED=
    TERM=xterm
    SHELL=/bin/bash
    HISTSIZE=1000
    SSH_CLIENT=10.0.1.1 59078 22
    SELINUX_USE_CURRENT_RANGE=
    OLDPWD=/home/glawson
    SSH_TTY=/dev/pts/0
    USER=glawson
    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
    MAIL=/var/spool/mail/glawson
    PATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin:/opt/intel/compilers_and_libraries_2016.3.210/linux/bin/intel64/:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/glawson/.local/bin:/home/glawson/bin
    I_MPI_HYDRA_DEBUG=on
    PWD=/home/glawson/hpl/bin/knl
    LANG=en_US.UTF-8
    SELINUX_LEVEL_REQUESTED=
    HISTCONTROL=ignoredups
    SHLVL=1
    HOME=/home/glawson
    LOGNAME=glawson
    SSH_CONNECTION=10.0.1.1 59078 10.0.1.2 22
    LESSOPEN=||/usr/bin/lesspipe.sh %s
    XDG_RUNTIME_DIR=/run/user/1000
    _=/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpiexec.hydra

  Hydra internal environment:
  ---------------------------
    MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
    GFORTRAN_UNBUFFERED_PRECONNECTED=y
    I_MPI_HYDRA_UUID=501c0000-baff-7a5e-5339-050000000000
    IPATH_NO_BACKTRACE=1
    DAPL_NETWORK_PROCESS_NUM=4


    Proxy information:
    *********************
      [1] proxy: rulfo (64 cores)
      Exec list: ./xhpl (4 processes); 


==================================================================================================

[mpiexec@rulfo] Timeout set to -1 (-1 means infinite)
[mpiexec@rulfo] HYDU_getfullhostname (../../utils/others/others.c:146): getaddrinfo error (hostname: rulfo, error: Temporary failure in name resolution)
[mpiexec@rulfo] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1094): unable to get local hostname
[mpiexec@rulfo] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:353): unable to create PMI port
[mpiexec@rulfo] main (../../ui/mpich/mpiexec.c:1106): process manager returned error launching processes
[glawson@rulfo knl]$

 

4 Replies
Artem_R_Intel1
Employee

Hi Gary,

MPD is a deprecated MPI process manager - it's recommended to use Hydra.

Regarding the Hydra error, it looks like it is caused by your specific network settings. What is the output of 'hostname -f' and 'hostname -i'?
Could you please try to run the following scenarios:
mpiexec.hydra -localhost ruflo -np 4 ./xhpl
mpiexec.hydra -localhost 127.16.0.1 -np 4 ./xhpl
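
The Hydra log reports a getaddrinfo failure for the local hostname, so the same lookup can be tried outside of MPI. Below is a minimal sketch, assuming a glibc system where 'getent ahosts' is available; the helper name name_resolves is hypothetical.

```shell
#!/usr/bin/env bash
# name_resolves NAME (hypothetical helper): succeed if the system resolver
# can resolve NAME. `getent ahosts` goes through getaddrinfo, the same call
# that fails in the Hydra log.
name_resolves() {
    getent ahosts "$1" > /dev/null 2>&1
}

# Report on the local hostname and on localhost for comparison.
for name in "$(hostname)" localhost; do
    if name_resolves "$name"; then
        echo "$name: resolves"
    else
        echo "$name: does NOT resolve"
    fi
done
```

If the local hostname does not resolve here, mpiexec.hydra is likely to hit the same error unless '-localhost' is given.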

Gary_L_
Beginner

Greetings Artem,

Thank you for your reply. I just realized that the first "example" showed the wrong output (from mpd instead of mpiexec.hydra). I had also read that MPD is no longer used, and I do use Hydra for all of my calls. Sorry for the confusion there. I tried the commands you suggested:

[glawson@rulfo knl]$ hostname -f
hostname: Temporary failure in name resolution

[glawson@rulfo knl]$ hostname -i
hostname: Temporary failure in name resolution

And when I ran 'mpiexec.hydra -localhost ruflo -np 4 ./xhpl', the code ran as expected (I won't display the output here). It seems I should have used the '-localhost' option rather than '-host', although it is still odd that mpiexec.hydra will not execute without the '-localhost' option.

For my network, the NIC IP is set statically as 10.0.0.2, so that it may be connected to other local nodes. I assume this is the culprit behind the problems with Intel MPI, but I'm not sure how to correct the problem. 

Gary

 

 

Artem_R_Intel1
Employee

Hi Gary,

You can adjust /etc/hosts so that the '-localhost' option is no longer needed. Afterwards, 'hostname -f' should return the correct hostname.

Try this one:

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 rulfo
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
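
The edit above can also be scripted. This is a minimal sketch, assuming the 127.0.0.1 line has no trailing comment; the helper name add_loopback_alias is hypothetical. It prints the modified file to stdout so the result can be reviewed before replacing /etc/hosts as root.

```shell
#!/usr/bin/env bash
# add_loopback_alias FILE NAME (hypothetical helper): print FILE with NAME
# appended to the 127.0.0.1 line, unless that line already lists NAME.
# All other lines are printed unchanged.
add_loopback_alias() {
    awk -v name="$2" '
        $1 == "127.0.0.1" {
            present = 0
            for (i = 2; i <= NF; i++) if ($i == name) present = 1
            if (!present) $0 = $0 " " name      # keep original spacing
        }
        { print }
    ' "$1"
}

# Usage: add_loopback_alias /etc/hosts "$(hostname)" > /tmp/hosts.new
```

Running it a second time on its own output changes nothing, so it is safe to repeat.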


Gary_L_
Beginner

Artem,

Thank you again. Adding rulfo to the 127.0.0.1 line in /etc/hosts worked, and I can now call mpiexec.hydra normally.

Gary
