Intel® MPI Library

mpdallexit: cannot connect to local mpd

camiyu917gmail_com
I get the following error when I execute "mpirun -r ssh -f mpd.hosts -n 2 ./testcpp":

==================
mpiexec_cluster-master (mpiexec 841): no msg recvd from mpd when expecting ack of request. Please examine the /tmp/mpd2.logfile_user log file on each node of the ring.
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_user_090519.111321_4345); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
==================

The log file contents follow:
====================================================
======== mpd2.logfile_cluster-master_user_090519.110901_4130 =======
====================================================
logfile for mpd with pid 4166
cluster-master_45346: mpd_uncaught_except_tb handling:
exceptions.IndexError: list index out of range
/opt/intel/impi/3.2.0.011/bin64/mpd.py 132 pin_Join_list
list.append(l1+l2+l3)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 421 pin_CpuList
ordids = pin_Join_list(info['pack_id'],info['core_id'],info['thread_id'],space)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 2535 run_one_cli
self.PinList = pin_CpuList(gl_envvars, self.PinCase, self.PinSpace,self.CpuInfo,len(self.RanksToBeRunHere))
/opt/intel/impi/3.2.0.011/bin64/mpd.py 2369 do_mpdrun
rv = self.run_one_cli(lorank,msg)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 1605 handle_console_input
self.do_mpdrun(msg)
/opt/intel/impi/3.2.0.011/bin64/mpdlib.py 613 handle_active_streams
handler(stream,*args)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 1262 runmainloop
rv = self.streamHandler.handle_active_streams(timeout=8.0)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 1231 run
self.runmainloop()
/opt/intel/impi/3.2.0.011/bin64/mpd.py 2762 ?
mpd.run()
====================================================

====================================================
========== mpd2.logfile_user_090519.110350_3668 ===============
====================================================
logfile for mpd with pid 3704
cluster-master_37151: mpd_uncaught_except_tb handling:
exceptions.IndexError: list index out of range
/opt/intel/impi/3.2.0.011/bin64/mpd.py 132 pin_Join_list
list.append(l1+l2+l3)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 421 pin_CpuList
ordids = pin_Join_list(info['pack_id'],info['core_id'],info['thread_id'],space)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 2535 run_one_cli
self.PinList = pin_CpuList(gl_envvars, self.PinCase, self.PinSpace,self.CpuInfo,len(self.RanksToBeRunHere))
/opt/intel/impi/3.2.0.011/bin64/mpd.py 2369 do_mpdrun
rv = self.run_one_cli(lorank,msg)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 1605 handle_console_input
self.do_mpdrun(msg)
/opt/intel/impi/3.2.0.011/bin64/mpdlib.py 613 handle_active_streams
handler(stream,*args)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 1262 runmainloop
rv = self.streamHandler.handle_active_streams(timeout=8.0)
/opt/intel/impi/3.2.0.011/bin64/mpd.py 1231 run
self.runmainloop()
/opt/intel/impi/3.2.0.011/bin64/mpd.py 2762 ?
mpd.run()
====================================================

OS: SuSE Linux Enterprise 11


I have set the environment variable I_MPI_CPUINFO="/proc/cpuinfo",
and running "mpdboot -n 2 -f ./mpd.hosts" completes successfully.
Please help me. Thanks.
Dmitry_K_Intel2
Employee
I get the following error when I execute "mpirun -r ssh -f mpd.hosts -n 2 ./testcpp"


Hi there,
It seems your problem is related to the ssh connection. (By default, the Intel MPI Library uses rsh.)
You should configure authentication so that you can log in to all servers without a password.

To set up public key authentication, follow these steps:

1. Public key generation

local> ssh-keygen -t dsa -f .ssh/id_dsa

When asked for a passphrase, leave it empty by just pressing Enter.

Two new files, id_dsa and id_dsa.pub, are created in the .ssh directory. The latter is the public part.

2. Public key distribution to remote nodes

Go to the .ssh directory. Copy the public key to the remote machine.

local> cd .ssh
local> scp id_dsa.pub user@remote:~/.ssh/id_dsa.pub

Log in to the remote machine and go to its .ssh directory.

local> ssh user@remote
remote> cd .ssh

Add the client's public key to the known public keys on the remote server.

remote> cat id_dsa.pub >> authorized_keys2
remote> chmod 640 authorized_keys2
remote> rm id_dsa.pub
remote> exit

The next time you log in to the remote server, no password will be requested.

Note that ssh setup depends on the ssh client distribution.
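
Once the keys are in place, it can help to confirm that passwordless ssh really works for every host before running mpdboot or mpirun. Below is a minimal sketch (not an Intel MPI tool; the mpd.hosts file name and one-host-per-line format are assumptions) that performs this check:

    # check_ssh.py - minimal sketch (not part of Intel MPI): verify passwordless
    # ssh to every host listed in mpd.hosts before running mpdboot/mpirun.
    # The "mpd.hosts" file name is an assumption; adjust the path as needed.
    import subprocess
    import sys

    def read_hosts(hostfile):
        with open(hostfile) as f:
            return [line.strip() for line in f
                    if line.strip() and not line.startswith("#")]

    def main():
        failed = []
        for host in read_hosts("mpd.hosts"):
            # BatchMode=yes makes ssh exit with an error instead of prompting for
            # a password, so a non-zero return code means passwordless login failed.
            rc = subprocess.call(["ssh", "-o", "BatchMode=yes",
                                  "-o", "ConnectTimeout=5", host, "true"])
            if rc != 0:
                failed.append(host)
        if failed:
            print("passwordless ssh NOT working for: " + ", ".join(failed))
            sys.exit(1)
        print("passwordless ssh works for all hosts in mpd.hosts")

    if __name__ == "__main__":
        main()

Run it from the node where you start mpdboot; any host it reports still needs the key setup above.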

Hope this helps.
Best wishes.
Dmitry

camiyu917gmail_com
Hello Dmitry:

Thanks for your help.

My cluster has just two nodes (cluster-master & cluster-slave1).
Now I can ssh from the slave to the master without a password, and from the master to the slave without a password.

But I still get the same problem.


Thank you very much for your help.
Gergana_S_Intel
Employee

Hello camiyu917,

Thanks for the output from the mpd logfile. Which version of the Intel MPI Library do you have installed? Is it Intel MPI Library 3.2, which was released November 2008? (This information is available in the mpisupport.txt file in the installation directory).

If yes, this might be a known incompatibility between the Intel MPI Library and the latest version of OpenSuSE 11.1 (similar to SLES 11). More details are available in a related forum thread.
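
To give a rough idea of what goes wrong (this is only an illustrative sketch, not the actual mpd.py code): the traceback fails while joining the pack_id/core_id/thread_id lists built from cpuinfo, and if /proc/cpuinfo on a given kernel does not report a field such as "core id" for every processor entry, those lists end up with different lengths, and indexing them produces exactly this kind of IndexError:

    # cpuinfo_pins.py - illustrative sketch only (NOT the actual mpd.py code):
    # shows how building a pinning list from /proc/cpuinfo can raise IndexError
    # when fields such as "physical id" or "core id" are missing from some
    # processor entries, as can happen on certain kernels or CPU configurations.
    def parse_cpuinfo(path="/proc/cpuinfo"):
        pack_ids, core_ids = [], []
        with open(path) as f:
            for line in f:
                if ":" not in line:
                    continue
                key, value = [part.strip() for part in line.split(":", 1)]
                if key == "physical id":
                    pack_ids.append(value)
                elif key == "core id":
                    core_ids.append(value)
        return pack_ids, core_ids

    def join_ids(pack_ids, core_ids):
        joined = []
        # If "core id" is omitted for some processors, core_ids is shorter than
        # pack_ids and core_ids[i] raises "IndexError: list index out of range",
        # the same class of failure shown in the mpd2.logfile traceback above.
        for i in range(len(pack_ids)):
            joined.append(pack_ids[i] + ":" + core_ids[i])
        return joined

    if __name__ == "__main__":
        packs, cores = parse_cpuinfo()
        print(join_ids(packs, cores))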

You have two options to resolve this issue:

  • Upgrade to the latest Intel MPI Library 3.2 Update 1, available for download from the Intel Registration Center.
  • Set the environment variable I_MPI_CPUINFO to proc. You can do so by running:
    export I_MPI_CPUINFO=proc

Let us know if either of those options work out for you.

Regards,
~Gergana

camiyu917gmail_com

  • Upgrade to the latest Intel MPI Library 3.2 Update 1, available for download from the Intel Registration Center.
  • Set the environment variable I_MPI_CPUINFO to proc. You can do so by running:
    export I_MPI_CPUINFO=proc


I installed the Intel Cluster Toolkit Compiler Edition for Linux 3.2.020.

The problem is solved; it was caused by the cpuinfo environment variable.
After executing "export I_MPI_CPUINFO=proc", everything works.

Now I can run Intel MPI on SuSE Linux Enterprise 11.

Thank you very much for your help!


Gergana_S_Intel
Employee
Now I can run Intel MPI on SuSE Linux Enterprise 11.

Thank you very much for your help!

You are very welcome! I'm glad things worked out for you. There will be an update to the Intel Cluster Toolkit Compiler Edition 3.2 package sometime over the summer, which will include a fix for this issue. If you're interested, keep an eye on the forums; we'll make an announcement when the new version is out.

Regards,
~Gergana