Here is the information I got:
yukai@hc-abs:/home_sas/yukai => mpdboot -d -v -r ssh -f mpd.hosts -n 7
debug: starting
running mpdallexit on hc-abs
LAUNCHED mpd on hc-abs via
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/3.2.2.006/bin64/mpd.py --ncpus=1 --myhost=hc-abs -e -d -s 7
debug: mpd on hc-abs on port 40529
RUNNING: mpd on hc-abs
debug: info for running mpd: {'ip': '', 'ncpus': 1, 'list_port': 40529, 'entry_port': '', 'host': 'hc-abs', 'entry_host': '', 'ifhn': ''}
LAUNCHED mpd on n10 via hc-abs
debug: launch cmd= ssh -x -n n10 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME HOSTTYPE=$HOSTTYPE MACHTYPE=$MACHTYPE OSTYPE=$OSTYPE /opt/intel/impi/3.2.2.006/bin64/mpd.py -h hc-abs -p 40529 --ifhn=192.168.0.160 --ncpus=1 --myhost=n10 --myip=192.168.0.160 -e -d -s 7
LAUNCHED mpd on n11 via hc-abs
debug: launch cmd= ssh -x -n n11 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME HOSTTYPE=$HOSTTYPE MACHTYPE=$MACHTYPE OSTYPE=$OSTYPE /opt/intel/impi/3.2.2.006/bin64/mpd.py -h hc-abs -p 40529 --ifhn=192.168.0.161 --ncpus=1 --myhost=n11 --myip=192.168.0.161 -e -d -s 7
LAUNCHED mpd on n12 via hc-abs
debug: launch cmd= ssh -x -n n12 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME HOSTTYPE=$HOSTTYPE MACHTYPE=$MACHTYPE OSTYPE=$OSTYPE /opt/intel/impi/3.2.2.006/bin64/mpd.py -h hc-abs -p 40529 --ifhn=192.168.0.162 --ncpus=1 --myhost=n12 --myip=192.168.0.162 -e -d -s 7
LAUNCHED mpd on n13 via hc-abs
debug: launch cmd= ssh -x -n n13 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME HOSTTYPE=$HOSTTYPE MACHTYPE=$MACHTYPE OSTYPE=$OSTYPE /opt/intel/impi/3.2.2.006/bin64/mpd.py -h hc-abs -p 40529 --ifhn=192.168.0.163 --ncpus=1 --myhost=n13 --myip=192.168.0.163 -e -d -s 7
debug: mpd on n10 on port 32896
mpdboot_hc-abs (handle_mpd_output 886): failed to ping mpd on n10; received output={}
I am sure ssh works perfectly (passwordless).
mpd.hosts:
n10
n11
n12
n13
n14
n15
n16
mpirun works fine on each node individually.
yukai@hc-abs:/home_sas/yukai => cpuinfo
Intel Xeon Processor (Intel64 Dunnington)
===== Processor composition =====
Processors(CPUs) : 16
Packages(sockets) : 4
Cores per package : 4
Threads per core : 1
===== Processor identification =====
Processor Thread Id. Core Id. Package Id.
0 0 0 0
1 0 0 1
2 0 0 2
3 0 0 3
4 0 2 0
5 0 2 1
6 0 2 2
7 0 2 3
8 0 1 0
9 0 1 1
10 0 1 2
11 0 1 3
12 0 3 0
13 0 3 1
14 0 3 2
15 0 3 3
===== Placement on packages =====
Package Id. Core Id. Processors
0 0,2,1,3 0,4,8,12
1 0,2,1,3 1,5,9,13
2 0,2,1,3 2,6,10,14
3 0,2,1,3 3,7,11,15
===== Cache sharing =====
Cache Size Processors
L1 32 KB no sharing
L2 3 MB (0,8)(1,9)(2,10)(3,11)(4,12)(5,13)(6,14)(7,15)
L3 8 MB (0,4,8,12)(1,5,9,13)(2,6,10,14)(3,7,11,15)
/etc/hosts looks fine.
Any help and suggestions will be greatly appreciated!
Problem solved!
Now I have a question about mpd.hosts.
The ring doesn't work if I don't put the head node on the first line.
The question is how I can avoid this, because I don't want to use the head node (I'd like to leave it for the other system programs).
Can I either start the ring without the head node, or submit jobs only to specified nodes in the ring?
Hi,
Have you tried the '-nolocal' option?
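A minimal sketch of what that might look like, assuming the option is accepted by mpdboot in this version and using the host names from the mpd.hosts above (please check 'mpdboot --help' to confirm the exact option before relying on it):
yukai@hc-abs:/home_sas/yukai => mpdboot -d -v -r ssh -f mpd.hosts -n 7 -nolocal
With that, the ring should consist only of n10-n16, with no mpd started on the head node hc-abs.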
Regards!
Dmitry
Hi tamuer,
Could you please tell me how to resolve that? I'm having the same problem.
Thanks,
Tuan
Hi Tuan,
Could you clarify what problem you have?
What library version do you use?
Could you post the commands and error messages here and I'll try to help you.
Regards!
Dmitry
Hi Tuan, I just did what Dmitry told me. He is a wonderful expert.
Thanks, Tamuer.
What was the fix? I've just begun experiencing the problem on a cluster that was working previously. Thanks.
Hi Daniel,
Could you clarify what the problem is? What version of the Intel MPI Library do you use?
Usually there are some log files in the /tmp directory. Try 'ls /tmp | grep mpd'.
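It can also help to run the same check on the remote nodes over ssh, for example (nodename being a placeholder for one of your compute nodes):
ssh nodename 'ls /tmp | grep mpd'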
Please provide as much information as possible and I'll try to help you.
Regards!
Dmitry
Hi,
I'm using ICT 3.2.2. Everything works fine on all nodes except two of them. The install is on a shared filesystem. The logfile is empty. If I run with -d I get:
[root@test1 ~]# mpdboot -n 2 -r ssh -f machines -d
debug: starting
running mpdallexit on test1
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/3.2.2.006/bin64/mpd.py --ncpus=1 --myhost=test1 -e -d -s 2
debug: mpd on test1 on port 37556
debug: info for running mpd: {'ip': '10.11.178.192', 'ncpus': 1, 'list_port': 37556, 'entry_port': '', 'host': 'test1', 'entry_host': '', 'ifhn': ''}
debug: launch cmd= ssh -x -n test2 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/3.2.2.006/bin64/mpd.py -h test1 -p 37556 --ifhn=10.11.179.27 --ncpus=1 --myhost=test2 --myip=10.11.179.27 -e -d -s 2
debug: mpd on test2 on port 50042
mpdboot_test1 (handle_mpd_output 886): failed to ping mpd on test2; received output={}
Daniel,
Could you check that you can log in without entering a password (or passphrase) from test1 to test2 and vice versa?
[root@test1 ~] ssh test1
A passwordless ssh connection is one of the requirements.
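A quick sketch for checking both directions explicitly (running a remote 'hostname' makes any password or passphrase prompt immediately obvious):
[root@test1 ~] ssh test2 hostname
[root@test2 ~] ssh test1 hostname
Both commands should print the remote host name without asking for anything.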
Regards!
Dmitry
Passwordless ssh is working properly.
Daniel,
It looks like there are some limitations on the network ports. Do you use a firewall? Or maybe some ports are restricted? Could you please check with your system administrator?
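As a quick check, assuming iptables and netcat (nc) are available on the nodes, something like the following can show whether the mpd port is reachable (the port changes on every run, so take it from the most recent 'mpdboot -d' output, e.g. 50042 above):
[root@test1 ~] iptables -L -n
[root@test1 ~] ssh test2 iptables -L -n
[root@test1 ~] nc -zv test2 50042
If the last command is refused or times out while the mpd is still running on test2, a firewall or network restriction is the likely cause.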
Regards!
Dmitry
Hi Dmitry,
No firewall rules are defined and SELinux is disabled. I can use ibping to ping between the machines and get replies. Still, I cannot create a ring. MPDs can start locally. Passwordless ssh works perfectly. Authentication is from the same NIS server as all the other nodes in the cluster that do work. It is an odd problem, IMHO! Any more suggestions? Thanks,
Dan
Hi Daniel,
Let's compare ssh versions! I'm using:
[root@cluster1002 ~]$ ssh -V
ssh: Reflection for Secure IT 6.1.2.1 (build 3005) on x86_64-redhat-linux-gnu (64-bit)
Could you check for mpd processes on both nodes?
[root@cluster1002 ~] ps ux
[root@cluster1002 ~] ssh test2 -x ps ux
If there is an mpd process, please kill it.
[root@cluster1002 ~] echo test1 > mpd.hosts
[root@cluster1002 ~] echo test2 >> mpd.hosts
[root@cluster1002 ~] mpdboot -r ssh -n 2 -d
Check the ring:
[root@cluster1002 ~] mpdtrace
If there is no ring, let's try to create it by hand:
[root@test1 ~] env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/3.2.2.006/bin64/mpd.py --ncpus=1 --myhost=test1 -e -d -s 2
You'll get a port number (port_number), which will be used in the next command:
[root@test1 ~] ssh -x -n test2 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/3.2.2.006/bin64/mpd.py -h test1 -p port_number --ifhn=10.11.179.27 --ncpus=1 --myhost=test2 --myip=10.11.179.27 -e -d -s 2
If ssh works correctly, a new mpd ring will be created:
[root@test1 ~] mpdtrace
test1
test2
If it doesn't work, it means you have some issues with your configuration. If it works, send me the output - probably your ssh outputs information in a different format.
Regards!
Dmitry