Hello,
I'm having some difficulty running Intel MPI (compiler version 2016.3.210) on a dual-socket Knights Landing node. The example below runs the Linpack benchmark with the default input file, so it only requires 4 cores for now.
[glawson@rulfo ~]$ mpd
rulfo_43354: mpd_uncaught_except_tb handling:
<type 'exceptions.AttributeError'>: 'MPD' object has no attribute 'myIP'
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpd 1677 run
myIP=self.myIP,
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpd 3676 <module>
mpd.run()
I receive the same error message if I declare the host with '-host', either as "ruflo" (the hostname) or as "localhost". I also receive this error if I attempt to use 'mpd' or 'mpdboot'.
My /etc/hosts file is as follows:
[glawson@rulfo knl]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.16.0.1 ruflo
And I can obtain the host's name using the 'hostname' command.
I've searched for this problem in the MPI developers guide and installation guides, but I cannot find this error anywhere. Can someone please point me in the right direction for troubleshooting this problem?
Thanks
Gary
Here is the same example with MPI Hydra Debug enabled:
[glawson@rulfo knl]$ export I_MPI_HYDRA_DEBUG=on
[glawson@rulfo knl]$ mpiexec.hydra -np 4 ./xhpl
host: rulfo

==================================================================================================
mpiexec options:
----------------
Base path: /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/
Launcher: ssh
Debug level: 1
Enable X: -1

Global environment:
-------------------
I_MPI_PERHOST=allcores
LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.3.210/linux/compiler/lib/intel64::/opt/compilers_and_libraries_2016.3.210/linux/mpi/intel64/lib:/opt/compilers_and_libraries_2016.3.210/linux/mpi/mic/lib
XDG_SESSION_ID=1
HOSTNAME=rulfo
SELINUX_ROLE_REQUESTED=
TERM=xterm
SHELL=/bin/bash
HISTSIZE=1000
SSH_CLIENT=10.0.1.1 59078 22
SELINUX_USE_CURRENT_RANGE=
OLDPWD=/home/glawson
SSH_TTY=/dev/pts/0
USER=glawson
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
MAIL=/var/spool/mail/glawson
PATH=/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin:/opt/intel/compilers_and_libraries_2016.3.210/linux/bin/intel64/:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/glawson/.local/bin:/home/glawson/bin
I_MPI_HYDRA_DEBUG=on
PWD=/home/glawson/hpl/bin/knl
LANG=en_US.UTF-8
SELINUX_LEVEL_REQUESTED=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/glawson
LOGNAME=glawson
SSH_CONNECTION=10.0.1.1 59078 10.0.1.2 22
LESSOPEN=||/usr/bin/lesspipe.sh %s
XDG_RUNTIME_DIR=/run/user/1000
_=/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpiexec.hydra

Hydra internal environment:
---------------------------
MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
GFORTRAN_UNBUFFERED_PRECONNECTED=y
I_MPI_HYDRA_UUID=501c0000-baff-7a5e-5339-050000000000
IPATH_NO_BACKTRACE=1
DAPL_NETWORK_PROCESS_NUM=4

Proxy information:
*********************
[1] proxy: rulfo (64 cores)
Exec list: ./xhpl (4 processes);

==================================================================================================

[mpiexec@rulfo] Timeout set to -1 (-1 means infinite)
[mpiexec@rulfo] HYDU_getfullhostname (../../utils/others/others.c:146): getaddrinfo error (hostname: rulfo, error: Temporary failure in name resolution)
[mpiexec@rulfo] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1094): unable to get local hostname
[mpiexec@rulfo] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:353): unable to create PMI port
[mpiexec@rulfo] main (../../ui/mpich/mpiexec.c:1106): process manager returned error launching processes
[glawson@rulfo knl]$
Hi Gary,
MPD is a deprecated MPI process manager; Hydra is the recommended one.
Regarding the Hydra error, it looks like it is caused by your specific network settings. What is the output of 'hostname -f' and 'hostname -i'?
Could you please try to run the following scenarios:
mpiexec.hydra -localhost ruflo -np 4 ./xhpl
mpiexec.hydra -localhost 127.16.0.1 -np 4 ./xhpl
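Since the Hydra log above fails inside getaddrinfo, it can also help to check whether the node's own name resolves at all before changing launcher options. The following is only a minimal sketch: `check_name_resolution` is a hypothetical helper name, and it assumes a glibc system where `getent ahosts` is available (that database goes through getaddrinfo).

```shell
#!/bin/bash
# Hypothetical helper (not part of Intel MPI): report whether a name
# resolves through the system resolver. Assumes glibc's `getent ahosts`.
check_name_resolution() {
    # Default to this machine's own hostname when no argument is given.
    local name="${1:-$(hostname)}"
    if getent ahosts "$name" >/dev/null 2>&1; then
        echo "OK: $name resolves"
    else
        echo "FAIL: $name does not resolve"
        return 1
    fi
}

# Example (run manually):
#   check_name_resolution            # checks the output of `hostname`
#   check_name_resolution localhost
```

If the check fails for the node's own hostname, mpiexec.hydra is likely to fail the same way unless '-localhost' is given.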
Greetings Artem,
Thank you for your reply. I just realized the first "example" showed the wrong output (from mpd instead of the Linpack run). I also read that MPD is no longer used, and I do use Hydra for all of my calls. Sorry for the confusion there. I tried the commands you suggested:
[glawson@rulfo knl]$ hostname -f
hostname: Temporary failure in name resolution
[glawson@rulfo knl]$ hostname -i
hostname: Temporary failure in name resolution
And when I ran 'mpiexec.hydra -localhost ruflo -np 4 ./xhpl', the code ran as expected (I won't display the output here). It seems I should have used the '-localhost' option rather than '-host', although it is still odd that mpiexec.hydra will not execute without the '-localhost' option.
For my network, the NIC IP is set statically as 10.0.0.2, so that it may be connected to other local nodes. I assume this is the culprit behind the problems with Intel MPI, but I'm not sure how to correct the problem.
Gary
Hi Gary,
You can adjust /etc/hosts so that the '-localhost' option is no longer needed; 'hostname -f' should then return the correct hostname.
Try this one:
$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 rulfo
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
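On a typical Linux setup the resolver consults /etc/hosts, so the suggestion above amounts to making sure the node's own name appears on one of its lines. A rough way to check that is sketched below. `hosts_lookup` is a hypothetical helper name, not a system tool, and it only reads a hosts-format file rather than querying the full resolver. Note that the file posted at the start of the thread lists 'ruflo' while the shell prompt shows 'rulfo'; a lookup like this makes that kind of mismatch easy to spot.

```shell
#!/bin/bash
# Hypothetical helper: print every address that a hosts-format file
# maps to the given name. Returns non-zero if the name is not listed.
hosts_lookup() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }      # drop comments; awk re-splits the fields
        NF < 2 { next }         # skip blank or address-only lines
        {
            # Field 1 is the address; the remaining fields are names.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1 }
        }
        END { exit !found }
    ' "$file"
}

# Example (run manually):
#   hosts_lookup "$(hostname)" || echo "hostname is not listed in /etc/hosts"
```

With the original file from this thread, looking up 'rulfo' would find nothing; with the file suggested above, it would print 127.0.0.1.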
Artem,
Thank you again, adding rulfo to the list of localhosts worked. Now I may call mpiexec.hydra normally.
Gary
